Hi Steffen,

At 2022-09-12T15:43:00+0200, Steffen Nurpmeso wrote:
> I have problems with the UTF-8 device, it shows
>
> on‐main‐loop‐tick
> instead of
> on-main-loop-tock
>
> ie U+2010 instead of hyphen-minus U+002D.
>
> The above does not feel right, and searching is impossible!
> I would expect U+2010 HYPHEN in hyphenation, but not as a regular
> combiner aka delimiter joined words as are used very often in
> German, for example.

There are a few points to raise about this.  The first is a question.

1.  You don't expect a hyphenated word to use a hyphen?

2.  This is not a "1.23"-specific issue as your subject line suggests.

    $ groff --version | head -n 1
    GNU groff version 1.22.4
    $ echo 'long-term' | groff -Tutf8 | od -c
    0000000   l   o   n   g 342 200 220   t   e   r   m  \n  \n  \n  \n  \n
    0000020  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
    *
    0000100  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n  \n
    0000115

3.  If you're secretly in a man page context but didn't disclose that,
    then, yes, this is a change from groff 1.22.4.  The hyphen-minus,
    neutral apostrophe, and grave accent no longer map differently for
    man(7) and mdoc(7) than for any other macro package.  (\- still
    does and there is no prospect of that changing, since there is no
    *roff special character defined for the "ASCII hyphen-minus", and
    it is essential to express this precise character in man pages.
    These issues have been discussed at some length on this mailing
    list over the past three years.)

4.  "on-main-loop-tick" doesn't look like a natural language word to
    me--it looks like an identifier in a programming language (maybe
    some dialect of Lisp).  If that is the case, those hyphens need to
    be spelled "\-" in the source code.  This has always been true in
    man pages, going back to 1979.  Take

    $ grep '\\-[A-Za-z]' ~/src/unix/v7/usr/man/man1/bc.1
    .B \-c
    .B \-l
    .B \-l
    .B \-l
    .B \-c

    for example.

5.  Searching is not impossible.

5a. Searching for a word that is broken and hyphenated across lines is
    no more impossible than it always was.  On occasions when I have
    to do this, I break out sed(1) or perl(1).

5b. Literals that might be of interest in man pages should be entered
    with hyphenation suppressed in the input.  The groff man pages in
    1.23 do this much more conscientiously than in past releases.
    This is to avoid confusing users who might wonder if a hyphen is
    to be interpreted literally or not.

5c. You can disable automatic hyphenation altogether when rendering
    man pages.  See the '-rHY' option in groff_man(7).  This feature
    has been around for many years.

5d. groff's mdoc(7) implementation did not recognize the `HY` register
    in groff 1.22.4 and earlier.  It does now, though.

5e. For me, anyway, searching within less(1) using the pattern with a
    dot where the hyphen goes works fine, even though there are 3
    bytes in the input stream instead of one.  Evidently less(1) is
    smart enough.  For instance, I can match "line-ending" in the
    roff(7) page while paging it with "groff -Tutf8 -man | less -R" by
    entering "/line.ending" within less(1).

I hope this clears some things up.

Regards,
Branden
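
P.S.  Here is a rough sketch of the sed(1) approach, for cases where you want to grep rendered output for the ASCII spelling.  It only maps U+2010 back to hyphen-minus.  It does not rejoin words broken across lines, and it cannot tell a hyphen that was in the source from one inserted by automatic hyphenation.  The function name "to_ascii_hyphens" is made up for illustration and is not part of groff.  The input is simulated with printf(1) so the snippet runs without groff installed.

```shell
#!/bin/sh
# U+2010 HYPHEN in UTF-8 is the byte sequence 342 200 220 (octal),
# the same three bytes that appear in the od -c dump in this message.
hyphen=$(printf '\342\200\220')

# to_ascii_hyphens: hypothetical helper, not a groff feature.
# Replaces every U+2010 on standard input with U+002D.
to_ascii_hyphens() {
    sed "s/$hyphen/-/g"
}

# Simulated UTF-8 device output for the input word "long-term".
printf 'long%sterm\n' "$hyphen" | to_ascii_hyphens
# prints: long-term
```

With groff at hand, one could presumably pipe the output of "groff -Tutf8 -man" on some page through the same function before handing it to grep(1).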