Re: Texinfo 7.0.93 pretest available

Bruno Haible Mon, 09 Oct 2023 09:15:36 -0700

Eli Zaretskii wrote:
> unless the locale's codeset is UTF-8, any character that is not
> printable _in_the_current_locale_ will return -1 from wcwidth.  I'm
> guessing that no one has ever tried to run the test suite in a
> non-UTF-8 locale before?


I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,
texinfo 7.0.93 builds fine and all tests pass.

> Yes, quite a few characters return -1 from wcwidth, in particular the
> ȷ character above (which explains the above difference).

This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
recognized as having a width of 1 in all implementations of wcwidth.
There's no reason for it to have a width of -1, since it's not a control
character.
There's no reason for it to have a width of 0, since it's not a combining
mark or a non-spacing character.
There's no reason for it to have a width of 2, since it's not a CJK character
and not in a Unicode range with many CJK characters.
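
A small probe along these lines can show what a given system actually
returns for this character. It is only a sketch: the helper name
width_in_locale and the sentinel LOCALE_UNAVAILABLE are made up for
illustration, and which locale names are installed varies from system to
system, so the helper reports a failed setlocale() instead of guessing.

```c
/* Ask for the X/Open declarations; wcwidth() is an XSI function.  */
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700
#endif

#include <assert.h>
#include <locale.h>
#include <wchar.h>

/* Hypothetical sentinel: setlocale() did not accept the locale name.  */
#define LOCALE_UNAVAILABLE (-2)

/* Return wcwidth (wc) as computed under the named locale, or
   LOCALE_UNAVAILABLE if that locale cannot be selected.
   Note: this changes the process-wide LC_CTYPE setting.  */
static int
width_in_locale (const char *locale_name, wchar_t wc)
{
  assert (locale_name != NULL);
  if (setlocale (LC_CTYPE, locale_name) == NULL)
    return LOCALE_UNAVAILABLE;
  return wcwidth (wc);
}
```

By the reasoning above, width_in_locale ("C.UTF-8", 0x0237) should be 1
wherever that locale exists. What a non-UTF-8 locale returns is the point
under discussion, so the sketch makes no claim about it.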

>       /* Otherwise, fall back to the system's wcwidth function.  */
> #if HAVE_WCWIDTH
>       return wcwidth (wc);
> #else
>       return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
> #endif
>     }
> }
> 
> 
> I don't think the above logic in Gnulib's wcwidth (which basically
> replicates the logic in any reasonable wcwidth implementation, so is
> not specific to Gnulib) fits what Texinfo needs.  Texinfo needs to be
> able to produce output independently of the locale.  What matters to
> Texinfo is the encoding of the output document, not the locale's
> codeset.  So I think we should call uc_width when the output document
> encoding is UTF-8 (which is the default, including in the above test),
> regardless of the locale's codeset.  Or we could use a simpler
> approximation:
> 
>       return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;

This "simpler approximation" would not return a good result when wc
is a control character (such as CR, LF, or TAB): it would report a
width of 0 instead of -1. It is important that the caller of wcwidth()
or wcswidth() be able to recognize that the string as a whole does not
have a definite width.
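
To make the difference concrete, here is a minimal sketch that sums
per-character widths the way wcswidth() does, giving up with -1 as soon as
one character has no definite width. The names approx_width, strict_width
and string_width are invented for this illustration. The two per-character
functions copy the quoted one-line approximation and the quoted
non-HAVE_WCWIDTH fallback; neither is Gnulib's full wcwidth.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* The quoted "simpler approximation": control characters get width 0.  */
static int
approx_width (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl ((wint_t) wc) ? 0 : 1;
}

/* The quoted fallback: non-printable characters get width -1.  */
static int
strict_width (wchar_t wc)
{
  return wc == 0 ? 0 : iswprint ((wint_t) wc) ? 1 : -1;
}

/* Sum the widths of the characters of S using WIDTH.  Return -1 if any
   character has no definite width, like wcswidth() does.  */
static int
string_width (const wchar_t *s, int (*width) (wchar_t))
{
  int total = 0;

  assert (s != NULL && width != NULL);
  for (; *s != L'\0'; s++)
    {
      int w = width (*s);
      if (w < 0)
        return -1;          /* the whole string has no definite width */
      total += w;
    }
  return total;
}
```

For the string L"a\tb", string_width with approx_width reports 2 columns,
which silently ignores the TAB, while string_width with strict_width
reports -1, so the caller knows it must treat the TAB specially.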

Bruno
