Guillem Jover <guil...@debian.org> writes:
> On Mon, 2023-08-14 at 14:18:51 +0200, Samuel Thibault wrote:

>> Yes, we'd ideally want to fix all manpages to have everything set
>> alright. But we have to do that before the release. And if that's not
>> complete, release with the
>> 
>> .    char - \-
>> 
>> workaround.

> Whenever I've maintained man pages in roff I tend to be precise in
> the usage of - and \-, but TBH this has seemed like a lost battle,
> more so since at least lintian stopped emitting tags for it. And
> another problem which I think it's going to be very hard to fix is
> with man page generators from other formats, such as pod2man, where
> it currently has heuristics to determine when to use - or \-, but it
> does not currently has a way to accurately do this always.

Yes, I understand why upstream really wants to find a way to make the
distinction between a language hyphen and an ASCII hyphen to work.  They
are different characters in the *roff language, and in a proper
typesetting system such as troff is intended to be, it is important to
distinguish between them for the best output.

That said, I was surprised to see the attempt to go down this path again
given how many problems we had the last time, and I am quite dubious that
we will be successful.  Not only is this a fiddly point of *roff that a
lot of people writing man pages simply don't pay attention to, man pages
are also generated from a host of other formats that simply do not have
this distinction in their language and therefore *cannot* make this
distinction in generated *roff except by guessing.

Just to give you an idea of the sort of thing that I'm trying to maintain
in order to be "correct" about this distinction, here is the current code
from podlators:

    s{
        ( (?:\G|^|\s|$NBSP) [\(\"]* [a-zA-Z] ) ( \\- )?
        ( (?: [a-zA-Z\']+ \\-)+ )
        ( [a-zA-Z\']+ ) (?= [\)\".?!,;:]* (?:\s|$NBSP|\Z|\\\ ) )
        \b
    } {
        my ($prefix, $hyphen, $main, $suffix) = ($1, $2, $3, $4);
        $hyphen ||= '';
        $main =~ s/\\-/-/g;
        $prefix . $hyphen . $main . $suffix;
    }egx;

This is still obviously buggy, though.  For example, command names
mentioned in the text look like words with hyphens and I don't think
there's any real way to tell the difference.

I have to admit that I am somewhat tempted to at least make this
transformation optional and instead let people configure pod2man to simply
escape every single - character as \- in the output.  This is not
"correct", but I think it's more correct than what is happening now, and
it's at least consistent.  However, I have a note that I have to do this
translation or *roff will produce unacceptable output, and I don't
remember what problem there was that made me write that comment in the
first place.  Maybe the problem with breaking long lines with
lots-of-words-that-are-all-conncted-by-hyphens, although that's somewhat
rare.

My opinion is that the world of documents that are handled by man do not
encode meaningful distinctions between - and \-, and man should therefore
unify those characters.

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>

Reply via email to