On Sat, Jul 22, 2023 at 06:46:28PM +0200, Sven Joachim wrote:
> This version of groff maps an unescaped "-" to HYPHEN rather than
> HYPHEN-MINUS.  Due to that, copying text from manpages or following
> references in the "SEE ALSO" section is rather unreliable, because many
> manpages contain a plain "-" where they should have used "\-" instead.
> 
> Neither the upstream NEWS file nor the Debian changelog make any mention
> of this change, or how to revert it locally.  Running "man" under
> LC_ALL=C works around it, at the cost of worse typography.

It is in fact mentioned in the upstream NEWS file:

  o The an (man) and doc (mdoc) macro packages no longer remap the -, ',
    and ` input characters to Basic Latin code points on UTF-8 devices,
    but treat them as groff normally does (and AT&T troff before it did)
    for typesetting devices, where they become the hyphen, apostrophe or
    right single quotation mark, and left single quotation mark,
    respectively.  This change is expected to expose glyph usage errors in
    man pages.  See the "PROBLEMS" file for a recipe that will conceal
    these errors.  A better long-term approach is for man pages to adopt
    correct input practices; the man pages groff_man_style(7),
    groff_char(7), and man-pages(7) (subsection "Generating optimal
    glyphs"; from the Linux man-pages project) contain such instructions.
    Doing so also improves man page typography when formatting for PDF.
  
    If you maintain a generator of man(7) or mdoc(7) documents (such as a
    tool that converts other formats to them), and need assistance, please
    contact the gr...@gnu.org mailing list and describe your situation.

And the PROBLEMS file says:

  * When viewing man pages, some characters on my UTF-8 terminal emulator
    look funny or copy-and-paste wrong.  Why?
  
  Some Unicode Basic Latin ("ASCII") input characters are mapped to
  non-Basic Latin code points in output for consistency with other output
  devices, like PDF.  See groff_man_style(7) and groff_char(7) for correct
  input conventions and background.  If you use the correct groff special
  character escape sequences to input them, you will get correct output no
  matter what device the input is formatted for.
  
  However, many man pages are written in ignorance of the correct special
  characters to obtain the desired glyphs.  You can conceal these errors
  by adding the following to your site-local man(7) configuration.  The
  file is called "man.local"; its installation directory depends on how
  groff was configured when it was built.
  
  --- start ---
  .if '\*[.T]'utf8' \{\
  .  char ' \[aq]
  .  char - \-
  .  char ^ \[ha]
  .  char ` \[ga]
  .  char ~ \[ti]
  .\}
  --- end ---
  
  You may also wish to do the same for "mdoc.local".
  
  In man pages (only), groff maps the minus sign special character '\-' to
  the Basic Latin hyphen-minus (U+002D) because man pages require this
  glyph and there is no historically established *roff input character,
  ordinary or special, for obtaining it when a hyphen and minus sign are
  both separately available.  To obtain a true minus sign, use the special
  character escape sequences '\(mi' or '\[mi]'.

I admit I overlooked this; I was aware of the change, but it somehow
fell off my list of things to make a positive decision about when
packaging 1.23.0.  I'm rather inclined to revert this by adding the rest
of the recipe above to debian/mandoc.local.  While I agree with the
idealized typographical point being made, I have approximately negative
appetite for the Sisyphean task of fixing an entire distribution's
manual pages in practice.  I'll let this suggestion sit for a few days,
though, in case anyone wants to make a reasoned argument against it in
the meantime.
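
For anyone who would rather fix an individual page than rely on the
workaround, here is a minimal sketch of the input conventions the
PROBLEMS text describes.  The command "frob", its option, and the file
name are made up for illustration; the escapes are the same ones the
recipe above maps to.

```roff
.\" Sketch only: "frob" and its option are hypothetical names.
.TH FROB 1
.SH NAME
frob \- illustrate glyph-safe man page input
.SH SYNOPSIS
.B frob
.RB [ \-\-dry\-run ]
.I file
.SH DESCRIPTION
.\" A hyphen inside ordinary prose stays a plain "-".
This is a copy-and-paste example.
.\" Text meant to be typed or pasted uses the special characters:
.\" \- for the hyphen-minus, \(aq for the apostrophe, \(ti for the tilde.
Run
.B frob \-\-dry\-run \(aq\(ti/my file\(aq
to preview the changes.
.SH "SEE ALSO"
.\" Command names containing "-" need \- so the reference can be copied.
.BR apt\-get (8)
```

Written this way, the option, the quoted argument, and the cross
reference should come out as Basic Latin characters on a UTF-8 terminal
whether or not the man.local recipe is installed, while the hyphens in
"copy-and-paste" are still typeset as hyphens.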

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]
