URL:
  <https://savannah.gnu.org/bugs/?67734>

                 Summary: [troff] ban C0 controls and Latin-1 supplement characters from use in identifiers
                   Group: GNU roff
               Submitter: gbranden
               Submitted: Tue 25 Nov 2025 04:55:20 PM UTC
                Category: Core
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Tue 25 Nov 2025 04:55:20 PM UTC By: G. Branden Robinson <gbranden>
* The set of C0 control characters that are usable in input to name
identifiers differs between AT&T _troff_ and GNU _troff_, and always has.  For
that matter, Heirloom Doctools _troff_ differs from both.


$ for c in '\002' '\003' '\004' '\005' '\006' '\007'; do printf '.nr r'$c' 42\n.tm \\n(r'$c'\n' | dwb nroff >/dev/null; done
0
0
0
0
0
[you also get a terminal beep with DWB, apparently to stderr since stdout is
discarded]
$ for c in '\002' '\003' '\004' '\005' '\006' '\007'; do printf '.nr r'$c' 42\n.tm \\n(r'$c'\n' | 9 nroff >/dev/null; done
42
42
42
42
42
$ for c in '\002' '\003' '\004' '\005' '\006' '\007'; do printf '.nr r'$c' 42\n.tm \\n(r'$c'\n' | heirloom nroff >/dev/null; done
42
42

42
42
42
$ for c in '\002' '\003' '\004' '\005' '\006' '\007'; do printf '.nr r'$c' 42\n.tm \\n(r'$c'\n' | groff -z; done
42
42
42
42
42
42
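
Because the control characters are invisible in the commands above, it can
help to dump the exact bytes being piped into each formatter.  The following
is only a sketch: `gen_input` is a hypothetical helper name, not part of any
troff distribution, and it builds the same two-line document that the loops
above generate.

```shell
# gen_input is a hypothetical helper (not part of groff, DWB, Plan 9, or
# Heirloom Doctools).  It emits the two-line test document used above for
# one octal escape such as '\002'.
gen_input() {
    # The escape is spliced into the printf format string, as in the
    # loops above, so printf itself converts it to a raw byte.
    printf ".nr r$1 42\n.tm \\\\n(r$1\n"
}

# Show the bytes as hex so the control character is visible.
gen_input '\002' | od -An -tx1
```

The `02` bytes in the dump are the control character that forms the second
character of the register name.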


* Use of Latin-1 Supplement code points (encoded in 8 bits) to name
identifiers is also not portable, and moreover erects a barrier to GNU
_troff_'s planned acceptance of UTF-8-encoded input.


$ printf '.nr \311l 57\n.tm \\n(\311l\n' | dwb nroff >/dev/null
57
$ printf '.nr \311l 57\n.tm \\n(\311l\n' | 9 nroff >/dev/null
$ printf '.nr \311l 57\n.tm \\n(\311l\n' | heirloom nroff >/dev/null

$ printf '.nr \311l 57\n.tm \\n(\311l\n' | groff -z
57
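
One likely aspect of the UTF-8 conflict is visible at the byte level.  The
sketch below assumes an `iconv` that supports ISO-8859-1 and UTF-8 is
installed.  It converts the single byte `\311` used above (É in Latin-1) to
UTF-8, where the same character takes two bytes; a lone `\311` followed by `l`
is therefore not a valid UTF-8 sequence.

```shell
# Convert the Latin-1 byte 0xC9 (octal 311) to UTF-8 and dump it as hex.
# Assumes iconv(1) is available and knows both encodings.
printf '\311' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```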


Explicitly drop support for these code points in identifiers to clear the way
for bug #40720.  (I have no plans to withdraw support for code points 2-7 as
input characters; many legacy documents use them as escape sequence
delimiters.)
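
To illustrate the two byte classes named in this report, here is a rough
sketch of a check on a candidate identifier.  `has_banned_byte` is a
hypothetical helper, not a groff feature, and it says nothing about how GNU
_troff_ itself would validate names; it only tests for bytes in the C0 range
(octal 001-037) or the 8-bit range (octal 200-377).

```shell
# has_banned_byte is a hypothetical helper, not part of groff.  It
# succeeds (exit status 0) if its argument contains a C0 control
# (octal 001-037) or an 8-bit byte (octal 200-377).
has_banned_byte() {
    # Delete every byte from space through DEL; anything left over is
    # a C0 control or an 8-bit byte.  LC_ALL=C makes tr work on bytes.
    leftover=$(printf '%s' "$1" | LC_ALL=C tr -d '\040-\177' | wc -c)
    [ "$leftover" -gt 0 ]
}

if has_banned_byte "$(printf 'r\002')"; then
    echo "r^B: contains a banned byte"
fi
if ! has_banned_byte "rl"; then
    echo "rl: clean"
fi
```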







