Branden,

> Incidentally there is a bit of a muddle here as your original point in the
> bug report seems to be solely about ~ and ^, whereas Ingo's secondment
> sweeps up the other ASCII characters without identity mappings as well.

I'm specifically referring to ~ and ^. Though I agree with Ingo's sentiments concerning hyphens and directional single-quotes, I consider those to be in the *"too late to fix"* basket.

Admittedly, I don't understand why ^ and ~ are deserving of special typesetting treatment. Unlike quotes and dashes, they aren't fundamental elements of English orthography. I find the wrangling of ^ and ~ to be equally jarring in PDF output as well; if I were to solicit a change to Groff's behaviour, it would be to suppress the mangling of ^ and ~, forcing users to request a modifier character specifically if they desire one.

> And probably 95% or more of groff users are doing so via a package of some
> sort prepared by a distribution vendor like Debian GNU/Linux, OpenBSD,
> Fedora, or some other intermediary between "upstream" (us) and themselves.
>
> That is why I said "If every *nix vendor in the world seizes upon the
> above and adds it, I can view it with equanimity."

Who's to say every intermediary will share the same opinion about tampering with man.local? Homebrew <https://brew.sh/>, for example, has a strict policy about patching <https://docs.brew.sh/Formula-Cookbook#patches> software, meaning there's zero chance of your suggested amendment reaching macOS users.

> I don't think man pages should have to be written one way for terminals and
> another for PDF

I wholeheartedly agree, which is why I believe we should abolish the hell out of Groff's “special” treatment of ^ and ~.
They don't appear frequently enough in Latin-based writing systems to justify an exception to Groff's character handling rules (whereas dashes and directional quotes do).

> "grout" is my shorthand for "device-independent output produced by GNU
> troff"

I've given in and taken to calling it "ditroff" informally, even though I know damn well that it's a misappropriation.

On Sat, 28 May 2022 at 08:51, G. Branden Robinson <g.branden.robin...@gmail.com> wrote:

> Hi John,
>
> At 2022-05-27T11:04:52+1000, John Gardner wrote:
> > > I have no problem adding an item to the PROBLEMS file with a chunk
> > > of groff source that people can put in their site "man.local" or
> > > "troffrc" files to achieve the ASCII-degradation of the five glyphs
> > > that novice man page writers abuse so copiously.
> >
> > Can we *please* be practical about this?
>
> I'm trying to be.
>
> Incidentally there is a bit of a muddle here as your original point in
> the bug report seems to be solely about ~ and ^, whereas Ingo's
> secondment sweeps up the other ASCII characters without identity
> mappings as well.
>
> > 90% of Groff users, if not more, are only doing so via man(1) to read
> > man pages.
>
> Yes. And probably 95% or more of groff users are doing so via a
> package of some sort prepared by a distribution vendor like Debian
> GNU/Linux, OpenBSD, Fedora, or some other intermediary between
> "upstream" (us) and themselves.
>
> That is why I said "If every *nix vendor in the world seizes upon the
> above and adds it, I can view it with equanimity."[1]
>
> > Many of whom are probably oblivious to the existence of a typesetting
> > system underneath that's powering it all. They won't care about local
> > configuration, they'll just be annoyed that there's another bunch of
> > annoying characters they need to replace in anything copy+pasted from
> > a terminal.
> > Think Stack Overflow posts containing ˆ and ˜ by hapless
> > users unaware that a regex or path they just copied contains what're
> > essentially diacritics without a character.
>
> True; people will attempt copy and paste from PDF files as well. That's
> why I want to prevail upon man page authors to choose correct glyphs in
> their documents--so we can get a consistent experience on all output
> devices. I discussed this with Michael Kerrisk, the co-maintainer of
> the Linux man-pages project (Alejandro's counterpart) almost a year and
> a half ago[2]. He's been doing that job a long time and was not
> alarmed.
>
> > Which reminds me: *these characters were designed to be overstruck*. A
> > + ˆ = Â, A + ˜ = Ã.
>
> In ASCII? Yes, except for the hyphen, originally they were--if they
> weren't replaced by some national character set's alternative glyphs.
> This incidentally includes the neutral double quote ("), which is why it
> looks so funny on Teletype Model 37 output (attached).
>
> When the C/A/T showed up at the Murray Hill Unix Room, some of these
> input characters were given (potentially) overstrikable semantics. The
> text ("standard") fonts had both a hyphen glyph and a minus glyph, so as
> I say in groff_char(7), a decision had to be taken which one got mapped
> to plain '-' and which one was going to need an escape sequence.
> Similarly, ` and ' became entrenched as directional single quotes, and
> their backslash-prefixed forms became accent marks. The C/A/T's
> standard fonts didn't have distinct high-flown ^ and ~ glyphs. They
> appeared only in the AT&T-specified "special font", where, as far as my
> eyes can tell, they are drawn entirely above the cap-height of the
> standard fonts.
>
> See the image (from the 1976 edition of CSTR #54) attached to comment #3
> of <https://savannah.gnu.org/bugs/?42473>.
>
> ECMA-6 (ISO 646) muddied the waters a little bit.
> But since both ^ and ~ were replaceable code points, I suppose people
> didn't kick up too much of a fuss.
>
> Unicode 1.0 (October 1991) further stirred the mud; "ASCII" ^ was
> recognized as a high, small glyph that certainly _looks_ overstrikable,
> and ASCII ~ was permitted to be overstrikable or not! See attachment.
>
> Unicode 2.0 (July 1996) finally got off the pot and decided upon "big",
> spacing semantics for (what was now termed) Basic Latin ^ and ~. See
> attachment. It would be another four years before Unicode really
> started to penetrate to *nix terminal environments, with support
> arriving thanks in no small measure to the efforts of Markus Kuhn.[3]
>
> With conflicting and unstable traditions, it is no wonder that there is
> confusion around this issue. groff has _mostly_ been consistent
> throughout its history as to the semantics of these characters. An
> exception is that in January 2009, groff's man(7) and mdoc(7) were
> patched to map all of -, \-, ', and ` to Basic Latin code points.
>
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=98acc924f4e32cfc2209df5db0c21921df8cc7ac
>
> If I had been around at the time to utter ominous warnings much as you
> are, I'd have beseeched Werner to put the above code into troffrc (with
> some kind of guard like '.if d TH') or man.local and mdoc.local and put
> a comment above it saying that it should be removed by people who wanted
> to undertake fixing the many wrong extant man pages, who didn't mind
> those pages' misrendering, or whose systems' man pages had been
> corrected in some tolerable proportion.
>
> In my view it was a stopgap measure that should have been advertised as
> such. (With the exception of \- going to \N'45', because we simply
> _don't have_ in *roff an input character--ordinary or special--that
> means "the hyphen-minus, yes, THAT one, the root of all misery".)
>
> > In a PDF or PostScript document, or with a hardware teletype, this
> > sort of composition is easy.
> > In a modern terminal environment, not so
> > much. They're not making typesetting any better, they're only making
> > user experience worse.
>
> I don't think this is squarely on point. It's not particularly hard to
> type "\[a aa]" or "\['a]", let alone the more portable "\('a". These
> are some of the *roff-esque ways to achieve character composition
> (others are discussed in groff_char(7)).
>
> a^H' was a good way to get an a-with-acute-accent on a Model 37 but
> people generally don't compose characters that way anymore. Dead keys
> (common on European keyboards), 3- and 4-level keyboard layouts, and
> "input methods" are all more common.
>
> > Now, we can deplore the state of man page authorship as much as we
> > like, but the truth is that most software authors won't see this as a
> > problem on their end,
>
> To the extent that's true, man pages will continue to suck. As long as
> man page authorship is conducted by people who refuse to read or learn,
> their documentary output will tend to be of poor quality, because such a
> mindset is a severe hindrance to excellent technical writing. However,
> my hope is that such people are a minority, even if a noisy one.
>
> Even so, we can acknowledge that the *roff language's syntax is, in
> Kernighan's term, "rebarbative" (CSTR #97, I think). That is why I feel
> it is fair to document transition mechanisms like the one I've pushed
> today, why I have striven to document these matters as thoroughly and
> conscientiously as I can, and why I am willing to undertake, as I said
> in the message to which you replied, the preparation of patches for
> automated generators of man(7) output that may be unmaintained and/or
> whose maintainers are unreceptive to changes. Some such people may
> indeed view this as the last straw, flip man(7) the bird, and decamp for
> Markdown, which always just Does What You Mean (right?[5]).
>
> > or with end user configuration.
> > They'll see this as a regression
> > in the latest version of Groff and will file bug reports accordingly.
>
> I'm prepared for that, but so too should our distributors be, so I've
> added a 'NEWS' item and updated the existing 'PROBLEMS' item (which
> dates back to July 2003).
>
> https://git.savannah.gnu.org/cgit/groff.git/commit/?id=915a878038236769eb072f728389352c1da88719
>
> > If you still decide to go ahead: Don't say I didn't warn you.
>
> I'm warned.
>
> Regards,
> Branden
>
> [1] https://lists.gnu.org/archive/html/groff/2022-05/msg00052.html
> [2] https://lore.kernel.org/all/a1af3f5c-f3e9-4bf3-cad5-389571c45...@gmail.com/T/#m8282cb95b86db994508ece3165340e0075c3871d
> [3] https://www.cl.cam.ac.uk/~mgk25/unicode.html
> [4] https://cygwin.com/pipermail/cygwin/2002-October/085349.html is an
>     example that will live in infamy.
> [5] https://docs.racket-lang.org/pollen/second-tutorial.html#%28part._the-case-against-markdown%29