Eli Zaretskii wrote:
> unless the locale's codeset is UTF-8, any character that is not
> printable _in_the_current_locale_ will return -1 from wcwidth. I'm
> guessing that no one has ever tried to run the test suite in a
> non-UTF-8 locale before?

I just tried it now: On Linux (Ubuntu 22.04), in a de_DE.UTF-8 locale,
texinfo 7.0.93 builds fine and all tests pass.

> Yes, quite a few characters return -1 from wcwidth, in particular the
> ȷ character above (which explains the above difference).

This character is U+0237 LATIN SMALL LETTER DOTLESS J. It *should* be
recognized as having a width of 1 in all implementations of wcwidth.
There's no reason for it to have a width of -1, since it's not a control
character. There's no reason for it to have a width of 0, since it's not
a combining mark or a non-spacing character. There's no reason for it to
have a width of 2, since it's not a CJK character and not in a Unicode
range with many CJK characters.

>       /* Otherwise, fall back to the system's wcwidth function.  */
> #if HAVE_WCWIDTH
>       return wcwidth (wc);
> #else
>       return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
> #endif
>     }
> }
>
>
> I don't think the above logic in Gnulib's wcwidth (which basically
> replicates the logic in any reasonable wcwidth implementation, so is
> not specific to Gnulib) fits what Texinfo needs. Texinfo needs to be
> able to produce output independently of the locale. What matters to
> Texinfo is the encoding of the output document, not the locale's
> codeset. So I think we should call uc_width when the output document
> encoding is UTF-8 (which is the default, including in the above test),
> regardless of the locale's codeset. Or we could use a simpler
> approximation:
>
>   return wc == 0 ? 0 : iswcntrl (wc) ? 0 : 1;

This "simpler approximation" would not return a good result when wc is
a control character (CR, LF, TAB, or the like). In that case it is
important that the caller of wcwidth() or wcswidth() is able to
recognize that the string as a whole does not have a definite width.

Bruno
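
For illustration, the point about control characters can be sketched as
follows. This is a minimal sketch, not code from Gnulib or Texinfo: the
names approx_width, strict_width and string_width are hypothetical, and
strict_width is only an assumed variant of the quoted approximation that
keeps the -1 result for control characters. Both per-character functions
are crude (every non-control character counts as width 1); the sketch is
only about how the "no definite width" signal reaches the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* The approximation from the quoted message: a control character
   is reported as width 0.  */
int
approx_width (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl ((wint_t) wc) ? 0 : 1;
}

/* Hypothetical variant: a control character is reported as -1,
   the way wcwidth reports a character without a definite width.  */
int
strict_width (wchar_t wc)
{
  return wc == 0 ? 0 : iswcntrl ((wint_t) wc) ? -1 : 1;
}

/* Hypothetical helper following the wcswidth convention: sum the
   per-character widths, but return -1 as soon as one character
   has no definite width.  */
int
string_width (const wchar_t *s, int (*width_fn) (wchar_t))
{
  int total = 0;

  assert (s != NULL);
  for (; *s != L'\0'; s++)
    {
      int w = width_fn (*s);
      if (w < 0)
        return -1;
      total += w;
    }
  return total;
}
```

With approx_width, string_width (L"a\tb", approx_width) yields 2, so the
caller cannot tell that the string contains a TAB whose displayed width
depends on the column. With strict_width the same call yields -1, and the
caller can handle the string specially.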