[bug #68133] [troff] nail down semantics of character translations involving non-characters

G. Branden Robinson Thu, 30 Apr 2026 22:17:17 -0700

Follow-up Comment #8, bug #68133 (group groff):

[comment #7 comment #7:]
> [comment #6 comment #6:]
>> Have you seen bug #68132?
>> 
>> That's an example of a translation that "historically worked"--except in
>> groff for the last 37 years.
> 
> Yes, that would be another category.
> 
>> Semantic clarity, consistency, and corresponding reductions in the
>> number of corner cases that have to be documented, tested, and
>> maintained...while we trade most of the latter advantages away
>> again for things we support only in compatibility mode, such
>> features are at least firewalled off.
> 
> You know the code and test suite better than I do, of course.  But on the
> surface, "we have to maintain/test additional corner cases A, B, and C"
> sounds like less work than "we have to maintain/test that corner cases A, B,
> and C have result D under condition E and result F under condition G."


But how do I know, today, how many corner cases I'm going to need?  And how
are people even going to know that corner cases are being exercised when
they're formatting a few hundred pages of legacy Unix documents and nothing
_dramatically_ bad happens (like a formatter crash or major document
truncation), without carefully going over it all with the formatted output in
one hand, the *roff source in the other, and substantial *roff expertise in
the brain between?

If I make GNU _troff_ reject as a `tr` operand anything that doesn't have
"charinfo" (see below), then I can have it spew a diagnostic when an oddball
is encountered.  The corner case can then be studied, and if it's a case of
historical substance, we can _then_ write support for it in compatibility mode
and unit-test it.
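
To make that idea concrete, here is a minimal sketch in Python (not GNU
_troff_'s actual C++ code) of a translation table that refuses any operand
lacking a character record and reports it instead.  The names `has_charinfo`,
`add_translation`, and `translate`, the wording of the warning, and the tiny
sets of "known" characters are all hypothetical, chosen only for illustration;
the toy makes no attempt to match the real formatter's full operand rules.

```python
# Hypothetical stand-ins for the formatter's notion of "has charinfo".
# Printable ASCII (excluding space) plays the role of ordinary characters;
# the special-character set is an illustrative subset, not a real list.
ORDINARY = {chr(c) for c in range(0x21, 0x7F)}
SPECIAL = {r"\(em", r"\[em]", r"\(bu"}


def has_charinfo(operand):
    """Return True if this toy model treats the operand as a character."""
    return operand in ORDINARY or operand in SPECIAL


def add_translation(table, source, target, diagnostics):
    """Record source -> target only if both operands are characters.

    On rejection, append a warning to `diagnostics`, leave `table`
    untouched, and return False so the caller can study the corner case.
    """
    for operand in (source, target):
        if not has_charinfo(operand):
            diagnostics.append(
                f"warning: ignoring translation involving {operand!r}: "
                "not a character"
            )
            return False
    table[source] = target
    return True


def translate(table, text):
    """Apply the table to text, one ordinary input character at a time."""
    return "".join(table.get(ch, ch) for ch in text)
```

In this sketch, asking for `o` to be translated to `\fB` leaves the table empty
and records one warning, whereas translating `o` to `0` succeeds and turns
"Hello, world!" into "Hell0, w0rld!".  The point is only that the oddball
becomes visible rather than silently producing something.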

That way _thought_ is given to the corner cases, instead of the absence of
thought.  "Are we translating a character to a font selection escape sequence?
Does the formatter understand what we're trying to do?  Does it comply?  Does
the result work?  Who knows?"

>> Not just under the hood.  Our Texinfo manual is at pains to
>> distinguish characters from glyphs
> 
> Perhaps my terminology was sloppy.  Characters are input and glyphs are
> output, which is not what I was trying to get at.

No, we inescapably hit a trichotomy.

Yes, GNU _troff_ reads 8-bit bytes, and every possible one of the 256 values
is accounted for and handled in one way or another.

And yes, GNU _troff_ spits out "glyphs" that an output device looks up in its
font descriptions, or accesses by index.
[https://man7.org/linux/man-pages/man5/groff_out.5.html The "c", "C", and "N"
trout commands handle these.]

But in between--internally to the formatter--ahh, that's a different story.
Every unique _character_, whether ordinary, special, or indexed, has an associated
[https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/charinfo.h?h=1.24.1
charinfo object].
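
As a rough illustration of that middle layer, here is a small Python sketch
(again, not the formatter's real C++ data structures).  The `Character` class,
its field names, and the `glyph_command` helper are hypothetical; the only
thing taken from the discussion above is that ordinary, special, and indexed
characters are distinct kinds, and that the "c", "C", and "N" output commands
handle the resulting glyphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """Toy stand-in for an internal character record (hypothetical)."""
    kind: str  # "ordinary", "special", or "indexed"
    name: str  # e.g. "a", "em", or "65"


def glyph_command(ch):
    """Return a (command, argument) pair for the output command that
    would carry this character's glyph in this simplified model."""
    commands = {"ordinary": "c", "special": "C", "indexed": "N"}
    return (commands[ch.kind], ch.name)
```

Something that has no such record (a font selection or type size escape
sequence, say) has no entry in this model at all, which is the distinction
at issue for `tr` operands.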

> My intended point was it's easier for a user to conceptualize "I can
> translate a character to another character represented by an escape sequence"
> than "I can translate a character to some but not to other characters
> represented by an escape sequence."

I'm sure it is, but no _troff_ has ever been **that** simple.


$ printf '.tr o\\fB\nHello, world!\n' | dwb nroff | cat -s
Hell , w rld!

$ printf '.tr o\\fB\nHello, world!\n' | solaris10 nroff | cat -s
Hell , w rld!

$ printf '.tr o\\s+2\nHello, world!\n' | solaris10 nroff | cat -s
Hell , w rld!

$ printf '.tr o\\x@2p@\nHello, world!\n' | solaris10 nroff | cat -s
Hell, wrld!

$ printf '.tr o\\x@2p@\nHello, world!\n' | dwb nroff | cat -s
Hell, wrld!

$ printf '.tr o\\x@2p@\nHello, world!\n' | 9 nroff | cat -s
Hell, wrld!

$ printf '.tr o\\x@2p@\nHello, world!\n' | heirloom nroff | cat -s
Hell\, w\rld!


When you translate a character to an escape sequence that does not represent a
character, you might get an unbreakable space, you might get nothing, or you
might get a backslash.  And things "that are ignored in nroff mode" might not
be so ignored after all.

>> Why do you suppose that a user of a typesetting system is likely to be
>> able to achieve a high level of skill in that system without a clear
>> idea of what its definition of a "character" is?
> 
> I expect Dennis Ritchie was a pretty skilled troffer and still used a syntax
> you're advocating to disallow.

I expect he was.  But I'll bet that Kernighan was even more skilled,
especially by the time he'd finished refactoring Ossanna troff into
device-independent troff.  (See CSTR #97.)

I think some _troff_ features came about not by deliberate design, but as
deliberate hacks that arose as users explored the input space.

And those hacks that Ritchie employed in *roff documents, I expect to support,
if it's not more than a slight headache to do (modulated by the impact on
rendered output).

But only in compatibility mode.

There's no need for the coming generations of *roff users (insert laughter or
tears here) to become conversant with old hacks where GNU _troff_, or even
AT&T _troff_ itself, offers a workable alternative.
  
>> If I get this stuff hammered out the way I envision, that use case will
>> be served as follows.
> ...
>> Why is the foregoing insufficient, especially for something that has
>> never worked before?
> 
> (for groff values of "never")

Fully conceded.
 
> The foregoing is sufficient, but seems a little extra hoop-jumpy than
> necessary.  Granted, jumping through hoops is a price of admission to many
> aspects of roff, but maybe that price can be reduced sometimes.

I hope I've thrown some light on the conceptual clarity and consistency we can
enjoy if we stick to the principle that _features_ are more important for
composability than _syntax_.

[1] See footnote 4 of
<https://lists.gnu.org/archive/html/groff/2026-03/msg00039.html>.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?68133>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
