Bug#1041731: groff-base: "-" mapped as HYPHEN

2023-09-27 Thread Vincent Lefevre
On 2023-09-11 20:33:27 -0700, Russ Allbery wrote:
> Yes, I understand why upstream really wants to find a way to make the
> distinction between a language hyphen and an ASCII hyphen work.  They
> are different characters in the *roff language, and in a proper
> typesetting system such as troff is intended to be, it is important to
> distinguish between them for the best output.

However, the apostrophe and the right single quotation mark are also
(semantically) different characters, but now the apostrophe is output
as the same character as the right single quotation mark!

More importantly, this change also makes searching for words that
contain the affected characters (apostrophe and hyphen) more difficult
(or unnatural) in "less".
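
To make the search problem concrete, here is a minimal Python sketch. It
assumes the formatted page contains U+2010 HYPHEN and U+2019 RIGHT SINGLE
QUOTATION MARK where the source had "-" and "'". The sample text and the
name to_ascii_punct are made up for illustration; this is not something
groff, man, or less provides.

```python
# Hypothetical illustration: map the typographic characters back to ASCII
# so that a plain ASCII search matches again.
TYPOGRAPHIC_TO_ASCII = str.maketrans({
    "\u2010": "-",   # HYPHEN -> HYPHEN-MINUS
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK -> APOSTROPHE
})


def to_ascii_punct(text: str) -> str:
    """Return text with U+2010 and U+2019 replaced by ASCII lookalikes."""
    return text.translate(TYPOGRAPHIC_TO_ASCII)


# Made-up sample of what a formatted page might contain.
rendered = "don\u2019t use the \u2010\u2010no\u2010act option"

print("--no-act" in rendered)                  # False
print("--no-act" in to_ascii_punct(rendered))  # True
```

The first search fails even though the two strings look identical on
screen, which is the situation a reader typing "--no-act" into a pager
search would be in.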

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#1041731: groff-base: "-" mapped as HYPHEN

2023-09-11 Thread Russ Allbery
Guillem Jover writes:
> On Mon, 2023-08-14 at 14:18:51 +0200, Samuel Thibault wrote:

>> Yes, we'd ideally want to fix all manpages to have everything set
>> alright. But we have to do that before the release. And if that's not
>> complete, release with the
>> 
>> .char - \-
>> 
>> workaround.

> Whenever I've maintained man pages in roff I tend to be precise in
> the usage of - and \-, but TBH this has seemed like a lost battle,
> more so since at least lintian stopped emitting tags for it. And
> another problem which I think it's going to be very hard to fix is
> with man page generators from other formats, such as pod2man, where
> it currently has heuristics to determine when to use - or \-, but it
> does not currently have a way to accurately do this always.

Yes, I understand why upstream really wants to find a way to make the
distinction between a language hyphen and an ASCII hyphen work.  They
are different characters in the *roff language, and in a proper
typesetting system such as troff is intended to be, it is important to
distinguish between them for the best output.

That said, I was surprised to see the attempt to go down this path again
given how many problems we had the last time, and I am quite dubious that
we will be successful.  Not only is this a fiddly point of *roff that a
lot of people writing man pages simply don't pay attention to, man pages
are also generated from a host of other formats that simply do not have
this distinction in their language and therefore *cannot* make this
distinction in generated *roff except by guessing.

Just to give you an idea of the sort of thing that I'm trying to maintain
in order to be "correct" about this distinction, here is the current code
from podlators:

    s{
        ( (?:\G|^|\s|$NBSP) [\(\"]* [a-zA-Z] ) ( \\- )?
        ( (?: [a-zA-Z\']+ \\-)+ )
        ( [a-zA-Z\']+ ) (?= [\)\".?!,;:]* (?:\s|$NBSP|\Z|\\\ ) )
        \b
    } {
        my ($prefix, $hyphen, $main, $suffix) = ($1, $2, $3, $4);
        $hyphen ||= '';
        $main =~ s/\\-/-/g;
        $prefix . $hyphen . $main . $suffix;
    }egx;

This is still obviously buggy, though.  For example, command names
mentioned in the text look like words with hyphens and I don't think
there's any real way to tell the difference.
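
The command-name problem can be reproduced with a much smaller stand-in
for this kind of heuristic. The Python sketch below is not the podlators
code and does not cover all of its cases. It assumes a simplified rule:
escape every "-" as "\-", then turn "\-" back into "-" inside runs of
letters that are delimited by whitespace. The names HYPHENATED_WORD and
guess_hyphens are made up for illustration.

```python
import re

# Simplified stand-in for the heuristic (an assumption, not podlators):
# letters/apostrophes joined by "\-" and delimited by whitespace or the
# string boundaries are treated as an ordinary hyphenated word.
HYPHENATED_WORD = re.compile(
    r"(?<!\S)"                                  # string start or whitespace before
    r"([A-Za-z][A-Za-z']*(?:\\-[A-Za-z']+)+)"   # word\-word(\-word)*
    r"(?=[.?!,;:]*(?:\s|\Z))"                   # punctuation, then space or end
)


def guess_hyphens(text: str) -> str:
    """Escape every hyphen, then unescape those inside word-like runs."""
    escaped = text.replace("-", r"\-")
    return HYPHENATED_WORD.sub(
        lambda m: m.group(1).replace(r"\-", "-"), escaped
    )


print(guess_hyphens("a well-known option is --force"))
# a well-known option is \-\-force
print(guess_hyphens("run apt-get update"))
# run apt-get update
```

The second call shows the failure: "apt-get" is a command name and would
want "\-", but it has the same shape as "well-known", so the rule gives
it a language hyphen.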

I have to admit that I am somewhat tempted to at least make this
transformation optional and instead let people configure pod2man to simply
escape every single - character as \- in the output.  This is not
"correct", but I think it's more correct than what is happening now, and
it's at least consistent.  However, I have a note that I have to do this
translation or *roff will produce unacceptable output, and I don't
remember what problem there was that made me write that comment in the
first place.  Maybe the problem with breaking long lines with
lots-of-words-that-are-all-connected-by-hyphens, although that's somewhat
rare.
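
For comparison, the "escape everything" alternative is tiny. The Python
sketch below is only an illustration of the idea, not a proposed
pod2man change; the function name escape_all_hyphens is made up, and the
lookbehind that skips hyphens already preceded by a backslash is an
added assumption so the function can be applied twice without harm.

```python
import re


def escape_all_hyphens(text: str) -> str:
    """Escape every "-" as "\\-" unless a backslash already precedes it."""
    return re.sub(r"(?<!\\)-", r"\\-", text)


print(escape_all_hyphens("a well-known flag of apt-get is --yes"))
# a well\-known flag of apt\-get is \-\-yes
```

Words, command names, and options all come out the same way, which is
the consistency argument. One known gap in this sketch: a hyphen that
follows an escaped backslash ("\\-") is left alone.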

My opinion is that the world of documents that are handled by man do not
encode meaningful distinctions between - and \-, and man should therefore
unify those characters.

-- 
Russ Allbery (r...@debian.org)  



Bug#1041731: groff-base: "-" mapped as HYPHEN

2023-09-11 Thread Guillem Jover
Hi!

[ CCed Russ for the pod2man side of this. ]

On Mon, 2023-08-14 at 14:18:51 +0200, Samuel Thibault wrote:
> I'm marking this important, and am tempted to raise it to serious...
> 
> The problem at stake is that we have already a hard time making
> newcomers read manpages. If they can't even trust copying/pasting lines
> from them, they will just definitely turn away, and we'll aggravate the
> schism between us olders and newcomers. Trust me from 20-year teaching
> experience...

This is not just about copying and pasting: searching formatted man
pages from within a pager or with grep, for example, does not work any
more (well, you can always use «.», but that's rather unintuitive).
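
A small Python sketch of why «.» works but is a poor substitute. It
assumes the formatted output contains U+2010 where the source had "-".
The sample text is made up, and Python's re module stands in for the
pager or grep.

```python
import re

# Made-up sample of formatted output, assuming "-" became U+2010.
rendered = "Use \u2010\u2010force to overwrite existing files."

literal = re.compile(re.escape("--force"))   # what a user would type
wildcard = re.compile(r"..force")            # the "." workaround
either = re.compile("[-\u2010]{2}force")     # accept both hyphen forms

print(bool(literal.search(rendered)))    # False
print(bool(wildcard.search(rendered)))   # True
print(bool(either.search(rendered)))     # True

# The wildcard is also too permissive: it matches unrelated words.
print(bool(wildcard.search("enforce")))  # True
```

The character-class form is precise, but it requires the user to know
which non-ASCII character was substituted and to be able to type it.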

> Yes, we'd ideally want to fix all manpages to have everything set
> alright. But we have to do that before the release. And if that's not
> complete, release with the
> 
> .char - \-
> 
> workaround.

Whenever I've maintained man pages in roff I tend to be precise in
the usage of - and \-, but TBH this has seemed like a lost battle,
more so since at least lintian stopped emitting tags for it. And
another problem which I think it's going to be very hard to fix is
with man page generators from other formats, such as pod2man, where
it currently has heuristics to determine when to use - or \-, but it
does not currently have a way to accurately do this always.

> As in: maybe we can leave the symptom open until the freeze period, so
> that developers notice the issue and fix their bugs, and on the freeze
> period, introduce the workaround so that end users of the eventual
> released distribution don't get affected while we are still fixing the
> bugs.

While in an ideal world that might be good, I'm not sure this is worth
the pain, and fixing this (if deemed necessary) based on linting tags
seems like a better plan?

Thanks,
Guillem



Processed: Re: Bug#1041731: groff-base: "-" mapped as HYPHEN

2023-08-14 Thread Debian Bug Tracking System
Processing control commands:

> severity -1 serious
Bug #1041731 [groff-base] groff-base: "-" mapped as HYPHEN
Bug #1043196 [groff-base] tmux: Manpages are broken because of unnecessary utf8 characters
Severity set to 'serious' from 'important'
Severity set to 'serious' from 'important'

-- 
1041731: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1041731
1043196: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1043196
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems