[bug #67711] [troff] `pchar` request reports stale character properties

G. Branden Robinson Mon, 17 Nov 2025 10:02:23 -0800

Update of bug #67711 (group groff):

                  Status:                    None => Need Info
             Assigned to:                    None => barx


    _______________________________________________________

Follow-up Comment #3:

[comment #1 comment #1:]
> pchar is telling the truth, because your input file doesn't change any
> properties of ":" itself.  It creates a class, then sets a flag on that
> class.

I disagree with this interpretation.

As I understand the code, a character class has only two properties (besides
its name and a memory address):

* It has a list (STL vector) of character code point ranges.  Any range can be
a singleton; that is, a lone code point--this is represented by having the
starting code point equal the ending one.  (Internally, an STL pair represents
the range's endpoints.)

* It has a list (STL vector) of nested classes, which are references to other
character class objects.  (Strictly, they're pointers to `charinfo` objects.
More strictly, I suspect they could be C++ references rather than pointers;
references cannot be null or rebound to another object, which avoids some
kinds of memory management problems.  GNU _troff_ offers no mechanism for
precision editing of a character class once it is defined.  You can't append
to or delete from its ranges or nested classes, or modify elements thereof.
If you want to modify the class, you must clobber it and recreate it anew
_ex nihilo_.)
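
As a rough illustration of the two properties listed above, here is a minimal
C++ sketch.  It is not the declaration in `charinfo.h`: the type name
`char_class` and the function `contains` are hypothetical, and only the member
names `ranges` and `nested_classes` echo names mentioned in this comment.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the class-related part of a charinfo object.
// The real declarations in charinfo.h may differ.
struct char_class {
  // Inclusive code point ranges; a singleton has first == second.
  std::vector<std::pair<int, int>> ranges;
  // Other classes nested in this one (non-owning pointers).
  std::vector<const char_class *> nested_classes;
};

// Report whether code point `c` belongs to `cls`, either directly or
// through a nested class.  Assumes the nesting graph has no cycles.
bool contains(const char_class &cls, int c)
{
  for (const auto &r : cls.ranges)
    if (r.first <= c && c <= r.second)
      return true;
  for (const char_class *nested : cls.nested_classes)
    if (nested != nullptr && contains(*nested, c))
      return true;
  return false;
}
```

Under this sketch a class carries nothing but membership information; it has
no flags or other character properties of its own.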

Objects of the `charinfo` class are used to represent both individual
characters (be they ordinary, special, or indexed), and character classes.  I
suppose this is for convenience and/or to avoid the invention of new syntax
for dereferencing classes.

I think a good mental model of characters vs. character classes is a C/C++
`union` type.  A `charinfo` object has *either* the list of properties
associated with a "character", which we now dump with the `pchar` request,
or the shorter list of properties associated with a "character class"--but
never both, with the lone exception that both have names (represented by the
_groff_ type `symbol`).

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/charinfo.h?h=1.23.0#n28

Having said that, I'll make a concession.

* The foregoing conceptual model is unclear both in documentation and in the
type definition.  For example, a nested `union` is not used.  This is either a
telling fact against my interpretation, or an implementation oversight back in
the day.  A `union` is the way C and C++ represent a "variant record" when one
wants to be economical with memory.
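
To make the "variant record" idea concrete, here is a minimal sketch of the
conceptual model, not of the actual `charinfo` type.  It swaps in C++17
`std::variant` for a raw `union`, because members such as `std::vector` are
not trivially constructible.  Every name in it (`character_props`,
`class_props`, `char_entity`, `is_class`) is hypothetical, and the property
lists are deliberately abbreviated.

```cpp
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, abbreviated property sets; the real charinfo has more.
struct character_props {
  unsigned int flags = 0;    // as set by the `cflags` request
  int hyphenation_code = 0;  // as set by the `hcode` request
};

struct class_props {
  std::vector<std::pair<int, int>> ranges;  // inclusive code point ranges
  std::vector<std::string> nested_classes;  // names of nested classes
};

// A named entity that is either a character or a class, never both.
struct char_entity {
  std::string name;  // the one property both kinds share
  std::variant<character_props, class_props> props;
};

// True if the entity currently holds class properties.
bool is_class(const char_entity &e)
{
  return std::holds_alternative<class_props>(e.props);
}
```

In this model, asking a class for its flags is a type error rather than a
question with a stale answer, which is the distinction at issue in this
ticket.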

> The usefulness of using a class in this manner is that you can add new flags
> to a character without clobbering whatever flags the character currently has
> set (which the language provides no way to query).

What evidence do you have to support this interpretation?  Specifically, where
in existing _groff_ macrology do you see this property leveraged?

[comment #2 comment #2:]
> [comment #1 comment #1:]
>> The usefulness of using a class in this manner is ...
> 
> I had suspected this was Werner's motivation for adding the .class request
> (in 2010 in [http://git.savannah.gnu.org/cgit/groff.git/commit/?id=1cb8dd7bd
> commit 1cb8dd7bd])--and it may be part of the motivation--but the stronger
> motivation appears to be summarized by the sentence he added to the manual
> (unchanged since then) about its usefulness for East Asian languages.

As I understand the East Asian language application, which appears to be the
only use case _groff_ actually manifests, the purpose of a character class is
to avoid the major tedium of writing long, or many, `cflags` requests to
assign character flags to the large sets of characters in CJK scripts.  It
also avoids the long macro files, and the time required to interpret them,
that such requests would entail.

I see no evidence _in the GNU_ troff _codebase_ to support your interpretation
that properties of character classes non-destructively override or layer over
properties of the individual characters that are members of the class.

All of that said, it appears to me that GNU _troff_'s behavior is perfectly
consistent with **neither** my model of character classes nor yours.  For
example, my model is not supported by the exhibit in comment #0 of this
ticket.  I see no logic in the implementation of the `cflags` request that
checks whether any of its second or later arguments is a character class and,
if so, recursively iterates through that class's `ranges` and
`nested_classes`, applying the flags given in the first argument.  (I've
started working on implementing this.)  And I think, though I'm not sure,
that your model is not consistent with the behavior you're seeing in bug
#67571, which would explain why you're surprised by that behavior and I'm
not.

I suspect the feature is buggy or unfinished, and I am confident that it is
inadequately documented, or we wouldn't be having these conversations.

I think my interpretation is consistent with Werner's express motivation in
_groff_'s corresponding "NEWS" file entry at the time of initial commit in
2010.


The new `class' request assigns a short name to a set of characters
which can be referred to in the `cflags' request.  This is especially
useful to control line-breaking and hyphenation rules in CJK languages.


To me, "assigns a short name to a set of characters" strongly implies the
alias/thin container model I'm defending.  I interpret the second sentence
as a reference to the `cflags` and `hcode` requests.  The way these
(theoretically) work on a character class is by internally simulating the
application of the request, recursively, to every character and character
class in the class's "ranges" and "nested classes", sparing the macro
programmer the tedium of doing so.
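
The recursive application described above might be sketched as follows.  This
is an illustration of the alias/thin container model only, not the code in
GNU _troff_.  The names `flag_table`, `char_class`, and `apply_flags` are
hypothetical, per-character flags are modeled as a simple map keyed by code
point, and the sketch assumes that applying flags replaces whatever a
character had before.  Whether flags should replace or combine is part of
what is being debated here.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: per-character flags keyed by code point.
using flag_table = std::map<int, unsigned int>;

struct char_class {
  std::vector<std::pair<int, int>> ranges;         // inclusive ranges
  std::vector<const char_class *> nested_classes;  // non-owning
};

// Simulate applying flags to a class by applying them to every member
// character, descending into nested classes.  Assumption: the new flags
// replace a character's existing flags.  Assumes no cyclic nesting.
void apply_flags(const char_class &cls, unsigned int flags,
                 flag_table &table)
{
  for (const auto &r : cls.ranges)
    for (int c = r.first; c <= r.second; ++c)
      table[c] = flags;
  for (const char_class *nested : cls.nested_classes)
    if (nested != nullptr)
      apply_flags(*nested, flags, table);
}
```

Under this model the class itself stores no flags, so a later query of a
member character's properties would report the flags that were pushed down
to it.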

Which model do you think we should go with?  Or is there another I haven't
thought of?


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67711>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
