On 07/03/2024, Dave Kemper wrote:
> Hi Ian, thanks for your attention to the groff manual!
Thank you very much, Dave, for your helpful and informative replies. :-)

> On 3/7/24, ropers <rop...@gmail.com> wrote:
>> "latin1" sounds awfully ISO-8859-1ish, and (I fear) not very much like
>> the Latin-1 Supplement Unicode block
>
> Correct.  Since there are two different things that include "Latin-1"
> in their name, perhaps this wording could be more explicit.  On the
> other hand, the context is input encodings, and a Unicode block is not
> itself an input encoding.

It might be preferable to demine rather than rely on contextual hints
as to the presence of UXO:

$ diff -u groff.texi.orig groff.texi
--- groff.texi.orig 2024-03-05 18:20:59.940460376 +0000
+++ groff.texi 2024-03-08 00:21:12.782360544 +0000
@@ -5509,9 +5509,10 @@
 @cindex ISO @w{8859-1} (@w{Latin-1}), input encoding
 @cindex input encoding, @w{Latin-1} (ISO @w{8859-1})
 @pindex latin1.tmac
-ISO @w{Latin-1}, an encoding for Western European languages, is the
-default input encoding on non-@acronym{EBCDIC} platforms; the file
-@file{latin1.tmac} is loaded at startup.
+ISO 8859-1, aka @w{Latin-1}, an extended ASCII encoding chiefly for
+Western European languages, is still @code{groff}'s default input encoding on
+non-@acronym{EBCDIC} platforms; the file @file{latin1.tmac} is loaded
+at startup.
 @end table
 
 @noindent
@@ -5533,9 +5534,9 @@
 @cindex ISO @w{8859-2} (@w{Latin-2}), input encoding
 @cindex input encoding, @w{Latin-2} (ISO @w{8859-2})
 @pindex latin2.tmac
-To use ISO @w{Latin-2}, an encoding for Central and Eastern European
-languages, invoke @w{@samp{.mso latin2.tmac}} at the beginning of your
-document or supply @samp{-mlatin2} as a command-line argument to
+To use ISO 8859-2, aka @w{Latin-2}, an encoding for Central and Eastern
+European languages, invoke @w{@samp{.mso latin2.tmac}} at the beginning of
+your document or supply @samp{-mlatin2} as a command-line argument to
 @code{groff}.
 
 @item latin5
@@ -5544,8 +5545,8 @@
 @cindex ISO @w{8859-9} (@w{Latin-5}), input encoding
 @cindex input encoding, @w{Latin-5} (ISO @w{8859-9})
 @pindex latin5.tmac
-To use ISO @w{Latin-5}, an encoding for the Turkish language, invoke
-@w{@samp{.mso latin5.tmac}} at the beginning of your document or
+To use ISO 8859-9, aka @w{Latin-5}, an encoding for the Turkish language,
+invoke @w{@samp{.mso latin5.tmac}} at the beginning of your document or
 supply @samp{-mlatin5} as a command-line argument to @code{groff}.
 
 @item latin9
@@ -5554,9 +5555,9 @@
 @cindex ISO @w{8859-15} (@w{Latin-9}), input encoding
 @cindex input encoding, @w{Latin-9} (ISO @w{8859-15})
 @pindex latin9.tmac
-ISO @w{Latin-9} succeeds @w{Latin-1}; it includes a Euro sign and better
-glyph coverage for French.  To use this encoding, invoke @w{@samp{.mso
-latin9.tmac}} at the beginning of your document or supply
+ISO 8859-15, aka @w{Latin-9}, succeeds @w{Latin-1}; it includes a Euro sign
+and better glyph coverage for French.  To use this encoding, invoke
+@w{@samp{.mso latin9.tmac}} at the beginning of your document or supply
 @samp{-mlatin9} as a command-line argument to @code{groff}.
 @end table

Warning! I have not actually previewed this! Truth be told, info(1) is
Greek to me. I tried

$ info groff.texi

which made it say "Cannot find node 'Top'." at the bottom (pun
intended?), and then I couldn't figure out how to actually view the
groff info manual. Not that I tried much, but still.

IMNSHO it is incredibly ironic, and--if one could hurt a program's
feelings--almost insulting for groff's manual to be maintained in info
format. Not exactly dogfooding, no? At the peril of slighting the
local champion, my opinions on info(1) reduce to <xkcd.com/912>, and I
suspect

$ info mcas

is a synonym for

$ kill -9 346

and in light of his prescience, I remain unconvinced *Primer* wasn't
based on the exploits of one Randall Munroe + colleague.
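
Coming back to the patch for a moment: the Latin-N nicknames and the
ISO 8859 part numbers only coincide for Latin-1 and Latin-2, which
makes them easy to mix up. Below is a minimal cross-check sketched in
Python rather than groff. It assumes Python's codec alias table follows
the usual Latin-N conventions, and it says nothing about what the
latin*.tmac files do internally. EXPECTED and canonical_name are names
made up for this illustration.

```python
import codecs

# Nicknames as used in the groff manual section above, paired with the
# ISO 8859 part each one is expected to name (per the @cindex entries).
EXPECTED = {
    "latin1": "iso8859-1",
    "latin2": "iso8859-2",
    "latin5": "iso8859-9",
    "latin9": "iso8859-15",
}


def canonical_name(nickname):
    """Return Python's canonical codec name for an encoding alias."""
    return codecs.lookup(nickname).name


if __name__ == "__main__":
    for nickname, expected in EXPECTED.items():
        actual = canonical_name(nickname)
        status = "ok" if actual == expected else "MISMATCH"
        print(f"{nickname:7} -> {actual:11} (expected {expected}) {status}")
```

Any MISMATCH line would mean either the table above or the alias table
disagrees with the standard pairing.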
>> which makes me wonder if Current Year's
>> groff/troff itself (absent pre-piped converters) can at all handle
>> multi-byte character sets in general, or UTF-8 in particular.
>
> It cannot.  This is a longstanding wishlist item: "improving Unicode
> support" was put into the Groff Mission Statement when it was drafted
> 10 years ago.  Ten years before that, groff's then-maintainer posted
> to this list: "Volunteers are highly welcome to extend groff from 8bit
> to 32bit input characters"

Based on my admittedly not quite unlimited insight into Unicode issues,
a goal "to extend groff from 8bit to 32bit input characters", if taken
literally, strikes me as an already outmoded if not stillborn strategy.
It might be much better to go all-in on variable-width encoding, read:
UTF-8, just like everybody else. Whatever limited *strictly internal*
use there may still be for UTF-32 in some buffers, structs or
variables, anything not UTF-8 is probably best kept to a minimum.

But perhaps I'm barking at shadows here. Nothing in
<https://lists.gnu.org/r/groff/2004-05/msg00074.html> is smoking-gun
evidence that would compel a jury of me, myself and I to conclude
Werner et al. WEREN'T aware of that already, or if not then, then
certainly now.

> (http://lists.gnu.org/r/groff/2004-05/msg00026.html).
>
> But this is a monumental task, and one groff developer has written of
> some of its difficulties
> (http://savannah.gnu.org/bugs/?40720#comment4).

I was a few paragraphs into that before I realised the author of the
above comment is Ingo Schwarze, an OpenBSD dev I've previously talked
to, and whose judgement on this I trust A LOT.

> In short, it's not for lack of desire that groff lacks this feature.
>
> With any luck, you'll follow the Branden Track, where you start off by
> poking a little at groff's documentation and are soon hacking away at
> the code base.
> You might be the volunteer Werner asked for 20 years
> ago ;-)

Not to be a negative Nancy, but just to be straight with you and set
expectations: Probably not. Even if I, at long last, might yet prove
competent enough to make a significant code contribution to the open
source community, I am less likely to make it to a GNU GPL project --
I'm more of a BSD (ISC/OpenBSD) fan. Of course, to my understanding
the BSD licenses are not the ones that are incompatible with the GPL,
so any contribution could still reach you regardless of philosophical
differences if not legalistic bikeshedding.

I really only dove into the groff manual thanks to an observed
(kernel.org) ascii(7) man page bug I only have a partial fix for, which
is why I'm still reading, all of which I'll possibly talk about at a
later date.

>> Also, this sounds a lot like Current Year's groff(1) even WITH
>> pipe-connected UTF-8 converters/drivers (which may be what's referred
>> to at the bottom of that section) couldn't actually support anything
>> like, say, Cyrillic or katakana or whatever,
>
> Groff added Cyrillic support last year
> (http://savannah.gnu.org/bugs/?63076).  It includes some CJK support
> but expanding this is an ongoing project
> (http://savannah.gnu.org/bugs/?62830).  If you have expertise in this
> realm and can address some of the outstanding questions in that
> ticket, please chime in.

I'm not totally ignorant of UTF-8 in particular, but depending on your
expectations, I'm possibly also not so hugely competent for the former
to be a massively modest understatement.

I will say that if anyone following along at home is struggling to get
their head around UTF-8, this post by Graham Douglas might be an
excellent starting point:
<http://www.readytext.co.uk/?p=1284>

Thanks and regards,
Ian

(Ian Ropers)