Guillem Jover <guil...@debian.org> writes:
> On Mon, 2023-08-14 at 14:18:51 +0200, Samuel Thibault wrote:

>> Yes, we'd ideally want to fix all manpages to have everything set
>> alright. But we have to do that before the release. And if that's not
>> complete, release with the
>>
>> . char - \-
>>
>> workaround.

> Whenever I've maintained man pages in roff I tend to be precise in
> the usage of - and \-, but TBH this has seemed like a lost battle,
> more so since at least lintian stopped emitting tags for it. And
> another problem which I think it's going to be very hard to fix is
> with man page generators from other formats, such as pod2man, where
> it currently has heuristics to determine when to use - or \-, but it
> does not currently has a way to accurately do this always.

Yes, I understand why upstream really wants to find a way to make the
distinction between a language hyphen and an ASCII hyphen work. They
are different characters in the *roff language, and in a proper
typesetting system such as troff is intended to be, it is important to
distinguish between them for the best output.

That said, I was surprised to see the attempt to go down this path again
given how many problems we had the last time, and I am quite dubious
that we will be successful. Not only is this a fiddly point of *roff
that a lot of people writing man pages simply don't pay attention to,
man pages are also generated from a host of other formats that simply do
not have this distinction in their language and therefore *cannot* make
this distinction in generated *roff except by guessing.

Just to give you an idea of the sort of thing that I'm trying to
maintain in order to be "correct" about this distinction, here is the
current code from podlators:

    s{
        ( (?:\G|^|\s|$NBSP) [\(\"]* [a-zA-Z] ) ( \\- )?
        ( (?: [a-zA-Z\']+ \\-)+ )
        ( [a-zA-Z\']+ ) (?= [\)\".?!,;:]* (?:\s|$NBSP|\Z|\\\ ) )
        \b
    } {
        my ($prefix, $hyphen, $main, $suffix) = ($1, $2, $3, $4);
        $hyphen ||= '';
        $main =~ s/\\-/-/g;
        $prefix . $hyphen . $main . $suffix;
    }egx;

This is still obviously buggy, though.
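
To make the behavior of that substitution concrete, here is a minimal
sketch that wraps it in a standalone function and runs it on a few
strings. The function name, the $NBSP definition, and the sample inputs
are assumptions made for illustration; they are not taken from
podlators.

```perl
use strict;
use warnings;

# Assumption: a regex escape for the non-breaking space; the real
# definition in podlators may differ.
my $NBSP = '\xA0';

# Hypothetical wrapper: turns \- back into - inside things that look
# like hyphenated words, and leaves every other \- alone.
sub unescape_word_hyphens {
    my ($text) = @_;
    $text =~ s{
        ( (?:\G|^|\s|$NBSP) [\(\"]* [a-zA-Z] ) ( \\- )?
        ( (?: [a-zA-Z\']+ \\-)+ )
        ( [a-zA-Z\']+ ) (?= [\)\".?!,;:]* (?:\s|$NBSP|\Z|\\\ ) )
        \b
    } {
        my ($prefix, $hyphen, $main, $suffix) = ($1, $2, $3, $4);
        $hyphen ||= '';
        $main =~ s/\\-/-/g;
        $prefix . $hyphen . $main . $suffix;
    }egx;
    return $text;
}

print unescape_word_hyphens('a well\-known option'), "\n";   # a well-known option
print unescape_word_hyphens('run apt\-get update'), "\n";    # run apt-get update
print unescape_word_hyphens('use \-\-dry\-run here'), "\n";  # unchanged
```

The second call is the interesting one: the heuristic cannot know that
the string is a command rather than a compound word, so it gets the
same treatment as "well-known".
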
For example, command names mentioned in the text look like words with
hyphens and I don't think there's any real way to tell the difference.

I have to admit that I am somewhat tempted to at least make this
transformation optional and instead let people configure pod2man to
simply escape every single - character as \- in the output. This is not
"correct", but I think it's more correct than what is happening now, and
it's at least consistent. However, I have a note that I have to do this
translation or *roff will produce unacceptable output, and I don't
remember what problem there was that made me write that comment in the
first place. Maybe the problem with breaking long lines with
lots-of-words-that-are-all-connected-by-hyphens, although that's
somewhat rare.

My opinion is that the world of documents that are handled by man does
not encode meaningful distinctions between - and \-, and man should
therefore unify those characters.

-- 
Russ Allbery (r...@debian.org) <https://www.eyrie.org/~eagle/>