Follow-up Comment #4, bug #67347 (group groff):

Hi Ingo,

Thanks for the follow-up and the research using your big mdoc(7) corpus!

At 2025-07-28T08:55:50-0400, Ingo Schwarze wrote:
> Follow-up Comment #3, bug #67347 (group groff):
>
> gbranden@ "thought bubbled":
>
>> The existing `\)` escape sequence has only one user: groff _mdoc_.
>
> Since it has been documented at the end of
> https://www.gnu.org/software/groff/manual/groff.html.node/Dummy-Characters.html
> , some user documents that are not manual pages might use it, though.
> Not sure how likely that is, maybe not very.

Right.  By "one user", I meant "in the groff source tree".  The escape
sequence may indeed have users in the wild, but like you, I wonder if
it's being used effectively or for a well-understood purpose.

>> I don't think the mdoc language ever encourages its users to employ
>> this escape sequence, in other words to put `\)` into their
>> documents.
>
> Certainly not.  I did not recall seeing a manual page using \).

Nor do I.

> The roff(7) manual page from the mandoc package strongly discourages
> using this escape sequence not only once, but in two different ways:
>
> 1. ESCAPE SEQUENCE REFERENCE
>    The mandoc(1) roff parser recognises the following escape
>    sequences.  In mdoc(7) and man(7) documents, using escape
>    sequences is discouraged except for those described in the
>    LANGUAGE SYNTAX section above.
>
>    The LANGUAGE SYNTAX section does not mention \).
>
> 2. \)  Zero-width space transparent to end-of-sentence detection;
>        ignored by mandoc(1).

Acknowledged.

> I checked the following manual page collections with grep(1) for use
> of \):
>
> 1. My private collection of manual pages that have caused trouble in
>    the past.
>    The only one in that collection that used \) is the old, now
>    deleted groffer(1) page.
>    It contained this code inside a (horrendous) macro definition:
>    . ds @pre \)\\$1\)\" prefix
>    . ds @sep \)\\$2\)\" separator
>    . ds @post \)\\$3\)\" postfix

Hard to say what the author had in mind.

This looks analogous to an AT&T troff technique (still useful
sometimes, and employed several places in the groff tree) documented in
our Texinfo manual.

  The dummy character escape sequence sees use in macro definitions as
  a means of ensuring that arguments are treated as text even if they
  begin with spaces or control characters.

    .de HD \" typeset a simple bold heading
    . sp
    . ft B
    \&\\$1 \" exercise: remove the \&
    . ft
    . sp
    ..
    .HD .\|.\|.\|surprised?

My *guess* is that someone thought "if \& is good, \) must be better!".
But I'd say, "not if it delivers no marginal advantage where employed".

> 2. OpenBSD base system and Xenocara manuals: no match
> 3. The ports manual pages i currently have installed: no match
>    (That's only a tiny fraction of the ports tree, though.)
> 4. Linux man pages project: no match
> 5. FreeBSD 14.2:
>    krb5_fileformats(3) contains this text line:
>    Quoted principal (quote character is \) [string]
>    https://man.bsd.lv/FreeBSD-14.2/krb5_fileformats.3
>    https://man.freebsd.org/cgi/man.cgi?query=krb5_fileformats
>    Looks like a simple escaping mistake to me.  No other match.
> 6. NetBSD 10.1 has several matches:
>    man1/groff.1:. nop \)\$*
>    man5/groff_out.5:. nop \)\$*
>    man5/groff_tmac.5:. nop \)\\$*\)
>    man5/groff_tmac.5:. Text .\~nop\~\[rs])\[rs]\[rs]$*[rs]\)
>    man5/tmac.5:. nop \)\\$*\)
>    man5/tmac.5:. Text .\~nop\~\[rs])\[rs]\[rs]$*[rs]\)
>    man7/groff.7:. nop \)\$*
>    man7/groff_trace.7:. nop \)\\$*\)
>    man7/roff.7:. nop \)\$*
>    Looks like \) was (slightly) more widely used in some old version
>    of groff.

Really old!  Looks like Werner took some of that stuff out a while ago.

commit d4f2bf4035901d32bd2d281da79cb91e4d375937
Author: Werner LEMBERG <[email protected]>
Date:   Sat Jan 19 20:50:34 2008 +0000

    [...]
    * man/groff_font.man, man/groff_tmac.man, man/roff.man: Revised.

> IIUC, you intend to change the semantics of \) from "zero-width space
> transparent to end of sentence detection" to "freeze the current end
> of sentence detection status until the end of the word".

Right.

> I cannot say for sure how likely it is that that might cause problems
> for some existing general-purpose typesetting documents; maybe not
> very likely, for the following reason: if somebody put a
> non-EOS-transparent character right after \), the \) had no effect
> that i can see, so why did they use \) at all?

Agreed.  One can inter\)\)\)\)sperse one's in\)\)\)\)\)put with
arbitrary runs of such escape seq\)\)\)\)uences, but to what benefit?

> I consider it very unlikely that the change will cause serious trouble
> in manual pages, both because manual pages should not contain this
> particular escape sequence, and even if they do, not only does the
> above general-purpose typesetting argument apply, but the worst
> possible consequence i can imagine is end of sentence detection
> changing value after a word containing \).

And there are already problems, in man(7) documents at least, with the
absence of a means of freezing end-of-sentence status causing incorrect
detection.  I'll have to scare up the example I have in mind, but it
has to do with using .UR and .UE.

[minutes later]

I'm not spotting the problem in groff's man page corpus, at least not
in the "See also" section where I seem to recall seeing it.  Maybe I
recast my way around the problem.

But I will take this opportunity to note a related one.
End-of-sentence detection is a property of the environment, but is not
introspectable; in other words, there's no register exposing its value.
(The debugging `pev` request also doesn't dump it to stderr.)  So if an
environment ends without breaking the line, end-of-sentence status can
be lost.

> That won't make the manual page unintelligible - at worst it might
> look ugly in a very minor way.

Agreed.

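As a rough illustration of the two behaviours contrasted above
("transparent to end-of-sentence detection" versus "freeze the current
status until the end of the word"), here is a toy Python model.  It is
only a sketch and is not groff's implementation: the function name
`ends_sentence`, the `semantics` labels, and the simplification that
only ".", "?", and "!" set the status are all assumptions made for
illustration.  Real troff also skips certain trailing characters, such
as closing quotes, which this model ignores.

```python
# Toy model only: NOT groff's implementation.  All names are hypothetical.
DUMMY = r"\)"        # the escape sequence under discussion
EOS_CHARS = ".?!"    # simplification: characters that can end a sentence


def ends_sentence(word: str, semantics: str) -> bool:
    """Return the end-of-sentence status at the end of `word`.

    semantics:
      "transparent" - \\) is skipped and leaves the status untouched
      "freeze"      - \\) locks the status for the rest of the word
    """
    if semantics not in ("transparent", "freeze"):
        raise ValueError(f"unknown semantics: {semantics!r}")
    status = False   # no sentence-ending character seen yet
    frozen = False   # set once \) is met under "freeze" semantics
    i = 0
    while i < len(word):
        if word.startswith(DUMMY, i):
            if semantics == "freeze":
                frozen = True
            i += len(DUMMY)
            continue
        if not frozen:
            # An ordinary character sets or clears the status.
            status = word[i] in EOS_CHARS
        i += 1
    return status


for sample in (r"x.\)x", r"x\)x."):
    print(sample,
          ends_sentence(sample, "transparent"),
          ends_sentence(sample, "freeze"))
```

Under this model the two semantics give opposite answers for each of
the two sample words, which is the kind of difference any compatibility
check would need to look for.
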
> If i understand correctly, the two main semantic changes are:
>
> 1. Right now, "x.\)x" does not end a sentence.  After the change, it
>    will.  (But what was the point of writing that code?)
> 2. Right now, "x\)x." ends a sentence.  After the change, it no
>    longer will.  (But what was the point of writing that code?)
>
> I believe item 1 is the whole point why you are considering this, so
> that incompatibility can't be avoided.

Yes.

> If you worry about item 2, you can instead consider this semantics:
>
> 1. If the current end of sentence status is "yes", \) propagates that
>    status to the end of the word.
> 2. If the current end of sentence status is "no", \) has no effect.
>
> Maybe before committing to the semantic change, you should research
> who introduced \) when, and whether they provided a rationale.

It appears to go "all the way back", or practically so.

https://cgit.git.savannah.gnu.org/cgit/groff.git/tree/troff/input.c?h=1.02#n782

Unfortunately James Clark doesn't respond to emails about _groff_.  I
reckon I will have to try to discover its purpose from _groff_'s
mdoc(7) implementation--that is the only place I know of where the
feature is used with apparent intelligence.

(My first stab at the problem would probably be to stash a rendered
copy of groff_mdoc(7), rip all the `\)`s out of the package, re-render
that man page, and see what--if anything--changed.)

If the changed semantics can buy the same advantage, or if `\)` is
already delivering no value to _groff mdoc_, then my proposed change
could be a win.

Incidentally, I'm not considering implementing any change in this area
for _groff_ 1.24.0.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?67347>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
