> From: Gavin Smith <gavinsmith0...@gmail.com>
> Date: Mon, 9 Oct 2023 20:39:59 +0100
> Cc: Bruno Haible <br...@clisp.org>, bug-texinfo@gnu.org
>
> > IOW, unless the locale's codeset is UTF-8, any character that is not
> > printable _in_the_current_locale_ will return -1 from wcwidth. I'm
> > guessing that no one has ever tried to run the test suite in a
> > non-UTF-8 locale before?
>
> It is supposed to attempt to force the locale to a UTF-8 locale. You
> can see the code in xspara_init that attempts to change the locale. There
> is also a comment before xspara_add_text:
>
> "This function relies on there being a UTF-8 locale in LC_CTYPE for
> mbrtowc to work correctly."

You cannot force MS-Windows into using the UTF-8 locale (with the
possible exception of very recent Windows versions, which AFAIK still
don't support UTF-8 in full). You also cannot force an arbitrary
Posix system into using UTF-8, because such a locale might not be
installed.

> For MS-Windows there is the w32_setlocale function that may use something
> different:
>
>   /* Switch to the Windows U.S. English locale with its default
>      codeset. We will handle the non-ASCII text ourselves, so the
>      codeset is unimportant, and Windows doesn't support UTF-8 as the
>      codeset anyway. */
>   return setlocale (category, "ENU");
>
> mbrtowc has its own override which handle UTF-8.
>
> As far as this relates to wcwidth, there used to be an MS-Windows specific
> stub implementation of this, removed in commit 5a66bc49ac032 (Patrice Dumas,
> 2022-08-19) which added a gnulib implementation of wcwidth:
>
> diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
> index 93924a623c..bf4ef91650 100644
> --- a/tp/Texinfo/XS/xspara.c
> +++ b/tp/Texinfo/XS/xspara.c
> @@ -206,13 +206,6 @@ iswspace (wint_t wc)
>    return 0;
>  }
>
> -/* FIXME: Provide a real implementation. */
> -int
> -wcwidth (const wchar_t wc)
> -{
> -  return wc == 0 ? 0 : 1;
> -}
> -
>  int
>  iswupper (wint_t wi)
>  {
>
>
> If this simple stub is preferable to the Gnulib implementation for
> MS-Windows, (e.g. it makes the tests pass) we could re-add it again.

We can do that, but I think we should first explore a better
alternative: use UTF-8 functions everywhere, without relying on the
locale-aware functions of libc, such as wcwidth. For example, instead
of wcwidth, we could use uc_width.

Is it feasible to use UTF-8 in texi2any disregarding the locale, and
use libunistring or something similar for the few functions we need in
the extensions that are required to deal with non-ASCII characters?
If we can do that, it will work on all systems, including Windows.
(This is basically what Emacs does, but it does that on a much greater scale, which is unnecessary in texi2any.)
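
To make the idea concrete, here is a rough sketch of measuring the
column width of a UTF-8 string without calling setlocale, mbrtowc or
wcwidth at all. Everything named sketch_* is hypothetical and not
taken from xspara.c. The width table is deliberately tiny and only a
stand-in: real code would presumably ask libunistring (uc_width, or
u8_width for a whole string) instead. The decoder is also minimal and
does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from S (at most N bytes) into *CP.
   Return the number of bytes consumed, or 0 on malformed input.
   The locale is never consulted.  Hypothetical helper.  */
static size_t
sketch_utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  size_t len, i;
  uint32_t c;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  else if ((s[0] & 0xE0) == 0xC0)
    { len = 2; c = s[0] & 0x1F; }
  else if ((s[0] & 0xF0) == 0xE0)
    { len = 3; c = s[0] & 0x0F; }
  else if ((s[0] & 0xF8) == 0xF0)
    { len = 4; c = s[0] & 0x07; }
  else
    return 0;

  if (n < len)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  *cp = c;
  return len;
}

/* Simplified stand-in for a real width table (ASSUMPTION: only a few
   ranges are covered; this is not the Unicode data uc_width uses).  */
static int
sketch_width (uint32_t cp)
{
  if (cp == 0)
    return 0;
  if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
    return -1;                          /* control characters */
  if (cp >= 0x0300 && cp <= 0x036F)
    return 0;                           /* combining diacritical marks */
  if ((cp >= 0x4E00 && cp <= 0x9FFF)    /* CJK unified ideographs */
      || (cp >= 0xAC00 && cp <= 0xD7A3) /* Hangul syllables */
      || (cp >= 0xFF01 && cp <= 0xFF60))/* fullwidth forms */
    return 2;
  return 1;
}

/* Sum the column widths of the N bytes of UTF-8 text at S.
   Malformed bytes count as one column each; control characters
   contribute nothing.  */
int
sketch_utf8_string_width (const char *s, size_t n)
{
  const unsigned char *p = (const unsigned char *) s;
  int total = 0;

  assert (s != NULL);
  while (n > 0)
    {
      uint32_t cp;
      size_t len = sketch_utf8_decode (p, n, &cp);
      int w;

      if (len == 0)
        {
          p++; n--; total++;
          continue;
        }
      w = sketch_width (cp);
      if (w > 0)
        total += w;
      p += len;
      n -= len;
    }
  return total;
}
```

Since nothing here depends on LC_CTYPE, the result is the same under
"C", "ENU" or any other locale, which is the property we would want
on Windows.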