Follow-up Comment #4, bug #67347 (group groff):

Hi Ingo,

Thanks for the follow-up and the research using your big mdoc(7) corpus!

At 2025-07-28T08:55:50-0400, Ingo Schwarze wrote:
> Follow-up Comment #3, bug #67347 (group groff):
>
> gbranden@ "thought bubbled":
>
>> The existing `\)` escape sequence has only one user: groff _mdoc_.
>
> Since it has been documented at the end of
> https://www.gnu.org/software/groff/manual/groff.html.node/Dummy-Characters.html
> , some user documents that are not manual pages might use it, though.
> Not sure how likely that is, maybe not very.

Right.  By "one user", I meant "in the groff source tree".

The escape sequence may indeed have users in the wild, but like you, I
wonder if it's being used effectively or for a well-understood purpose.

>> I don't think the mdoc language ever encourages its users to employ
>> this escape sequence, in other words to put `\)` into their
>> documents.
>
> Certainly not.  I did not recall seeing a manual page using \).

Nor do I.

> The roff(7) manual page from the mandoc package strongly discourages
> using this escape sequence not only once, but in two different ways:
>
> 1. ESCAPE SEQUENCE REFERENCE
> The mandoc(1) roff parser recognises the following escape
> sequences.  In mdoc(7) and man(7) documents, using escape
> sequences is discouraged except for those described in the
> LANGUAGE SYNTAX section above.
>
> The LANGUAGE SYNTAX section does not mention \).
>
> 2. \) Zero-width space transparent to end-of-sentence detection;
> ignored by mandoc(1).

Acknowledged.

> I checked the following manual page collections with grep(1) for use
> of \):
>
> 1. My private collection of manual pages that have caused trouble in
> the past.
> The only one in that collection that used \) is the old, now
> deleted groffer(1) page.
> It contained this code inside a (horrendous) macro definition:
> .  ds @pre \)\\$1\)\"                   prefix
> .  ds @sep \)\\$2\)\"                   separator
> .  ds @post \)\\$3\)\"                  postfix

Hard to say what the author had in mind.  This looks analogous to an
AT&T troff technique (still useful sometimes, and employed in several
places in the groff tree) documented in our Texinfo manual.


     The dummy character escape sequence sees use in macro definitions
     as a means of ensuring that arguments are treated as text even if
     they begin with spaces or control characters.

          .de HD \" typeset a simple bold heading
          .  sp
          .  ft B
          \&\\$1 \" exercise: remove the \&
          .  ft
          .  sp
          ..
          .HD .\|.\|.\|surprised?


My *guess* is that someone thought "if \& is good, \) must be better!".

But I'd say, "not if it delivers no marginal advantage where employed".

> 2. OpenBSD base system and Xenocara manuals: no match
> 3. The ports manual pages i currently have installed: no match
> (That's only a tiny fraction of the ports tree, though.)
> 4. Linux man pages project: no match
> 5. FreeBSD 14.2:
> krb5_fileformats(3) contains this text line:
> Quoted principal (quote character is \) [string]
> https://man.bsd.lv/FreeBSD-14.2/krb5_fileformats.3
> https://man.freebsd.org/cgi/man.cgi?query=krb5_fileformats
> Looks like a simple escaping mistake to me.  No other match.
> 6. NetBSD 10.1 has several matches:
> man1/groff.1:.  nop \)\$*
> man5/groff_out.5:.  nop \)\$*
> man5/groff_tmac.5:.  nop \)\\$*\)
> man5/groff_tmac.5:.  Text .\~nop\~\[rs])\[rs]\[rs]$*[rs]\)
> man5/tmac.5:.  nop \)\\$*\)
> man5/tmac.5:.  Text .\~nop\~\[rs])\[rs]\[rs]$*[rs]\)
> man7/groff.7:.  nop \)\$*
> man7/groff_trace.7:.  nop \)\\$*\)
> man7/roff.7:.  nop \)\$*
> Looks like \) was (slightly) more widely used in some old version
> of groff.

Really old!  Looks like Werner took some of that stuff out a while ago.

commit d4f2bf4035901d32bd2d281da79cb91e4d375937
Author: Werner LEMBERG <[email protected]>
Date:   Sat Jan 19 20:50:34 2008 +0000

[...]
    * man/groff_font.man, man/groff_tmac.man, man/roff.man: Revised.

> IIUC, you intend to change the semantics of \) from "zero-width space
> transparent to end of sentence detection" to "freeze the current end
> of sentence detection status until the end of the word".

Right.

> I cannot say for sure how likely it is that that might cause problems
> for some existing general-purpose typesetting documents; maybe not
> very likely, for the following reason: if somebody put a
> non-EOS-transparent character right after \), the \) had no effect
> that i can see, so why did they use \) at all?

Agreed.  One can inter\)\)\)\)sperse one's in\)\)\)\)\)put with
arbitrary runs of such escape seq\)\)\)\)uences, but to what benefit?

> I consider it very unlikely that the change will cause serious trouble
> in manual pages, both because manual pages should not contain this
> particular escape sequence, and even if they do, not only does the
> above general-purpose typesetting argument apply, but the worst
> possible consequence i can imagine is end of sentence detection
> changing value after a word containing \).

And there are already problems, in man(7) documents at least, with the
absence of a means of freezing end-of-sentence status causing incorrect
detection.  I'll have to scare up the example I have in mind, but it has
to do with using .UR and .UE.

[minutes later]

I'm not spotting the problem in groff's man page corpus, at least not in
the "See also" section where I seem to recall seeing it.  Maybe I recast
my way around the problem.

But I will take this opportunity to note a related one.  End-of-sentence
detection is a property of the environment, but is not introspectable;
in other words, there's no register exposing its value.  (The debugging
`pev` request also doesn't dump it to stderr.)

So if an environment ends without breaking the line, end-of-sentence
status can be lost.

> That won't make the manual page unintelligible - at worst it might
> look ugly in a very minor way.

Agreed.

> If i understand correctly, the two main semantic changes are:
>
> 1. Right now, "x.\)x" does not end a sentence.  After the change, it
> will.  (But what was the point of writing that code?)
> 2. Right now, "x\)x." ends a sentence.  After the change, it no
> longer will.  (But what was the point of writing that code?)
>
> I believe item 1 is the whole point why you are considering this, so
> that incompatibility can't be avoided.

Yes.
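
To make items 1 and 2 concrete, here is a toy model of end-of-sentence
status within a single word.  It is only a sketch in Python, not groff's
implementation: the names `ends_sentence` and `freeze` are made up for
illustration, the character sets are a simplified subset of the defaults
described in the groff manual, and no escape sequence other than `\)` is
handled.

```python
EOS_CHARS = set(".?!")        # characters that can end a sentence
TRANSPARENT = set("\"')]*")   # characters that leave the status untouched
DUMMY = r"\)"                 # the escape sequence under discussion


def tokens(word):
    """Split a word into single characters, keeping `\\)` as one token."""
    i = 0
    while i < len(word):
        if word.startswith(DUMMY, i):
            yield DUMMY
            i += 2
        else:
            yield word[i]
            i += 1


def ends_sentence(word, freeze=False):
    """Return the modeled end-of-sentence status at the end of `word`.

    freeze=False models the current behavior as described in this
    thread: `\\)` is transparent.  freeze=True models the proposed
    behavior: `\\)` fixes the status until the end of the word.
    """
    status = False
    for tok in tokens(word):
        if tok == DUMMY:
            if freeze:
                return status   # nothing later in the word can change it
            continue            # transparent: status carries over
        if tok in EOS_CHARS:
            status = True
        elif tok not in TRANSPARENT:
            status = False      # an ordinary character cancels the status
    return status


if __name__ == "__main__":
    for word in (r"x.\)x", r"x\)x."):
        print(word, ends_sentence(word), ends_sentence(word, freeze=True))
```

In this model, with `freeze=False` the word `x.\)x` does not end a
sentence and `x\)x.` does; with `freeze=True` the two results swap,
which matches items 1 and 2 above.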

> If you worry about item 2, you can instead consider this semantics:
>
> 1. If the current end of sentence status is "yes", \) propagates that
> status to the end of the word.
> 2. If the current end of sentence status is "no", \) has no effect.
>
> Maybe before committing to the semantic change, you should research
> who introduced \) when, and whether they provided a rationale.

It appears to go "all the way back", or practically so.

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/troff/input.c?h=1.02#n782

Unfortunately James Clark doesn't respond to emails about _groff_.

I reckon I will have to try to discover its purpose from _groff_'s
mdoc(7) implementation--that is the only place I know of where the
feature is used with apparent intelligence.  (My first stab at the
problem would probably be to stash a rendered copy of groff_mdoc(7), rip
all the `\)`s out of the package, re-render that man page, and see
what--if anything--changed.)
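
That experiment needs a way to strip the escape sequence and a way to
compare two renderings.  A minimal sketch in Python follows.  The helper
names `strip_dummy` and `rendering_diff` are hypothetical, the rendering
step itself is left out, and leaving alone any `\)` that is preceded by
another backslash is an assumption made to stay conservative; such spots
would want manual review.

```python
import difflib
import re

# A `\)` that is not itself preceded by a backslash.  Sequences such as
# `\\)` are deliberately left alone (assumption: review those by hand).
DUMMY_RE = re.compile(r"(?<!\\)\\\)")


def strip_dummy(source):
    """Return macro source text with each plain `\\)` removed."""
    return DUMMY_RE.sub("", source)


def rendering_diff(before, after):
    """Return unified-diff lines between two rendered pages (strings)."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="with-dummy",
            tofile="without-dummy",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print(strip_dummy(r".  nop \)\\$*\)"))
    # Two identical renderings would produce an empty diff.
    print(rendering_diff("same text\n", "same text\n"))
```

An empty list from `rendering_diff` would suggest that removing `\)`
made no visible difference to that page, at least for the output device
used to render it.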

If the changed semantics can buy the same advantage, or if `\)` is
already delivering no value to _groff mdoc_, then my proposed change
could be a win.

Incidentally, I'm not considering implementing any change in this area
for _groff_ 1.24.0.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67347>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
