> From: Gavin Smith <[email protected]>
> Date: Mon, 9 Oct 2023 20:39:59 +0100
> Cc: Bruno Haible <[email protected]>, [email protected]
>
> > IOW, unless the locale's codeset is UTF-8, any character that is not
> > printable _in_the_current_locale_ will return -1 from wcwidth. I'm
> > guessing that no one has ever tried to run the test suite in a
> > non-UTF-8 locale before?
>
> It is supposed to attempt to force the locale to a UTF-8 locale. You
> can see the code in xspara_init that attempts to change the locale. There
> is also a comment before xspara_add_text:
>
> "This function relies on there being a UTF-8 locale in LC_CTYPE for
> mbrtowc to work correctly."

You cannot force MS-Windows into using a UTF-8 locale (with the
possible exception of very recent Windows versions, which AFAIK still
don't support UTF-8 in full).

You also cannot force an arbitrary POSIX system into using UTF-8,
because such a locale might not be installed.

> For MS-Windows there is the w32_setlocale function that may use something
> different:
>
>   /* Switch to the Windows U.S. English locale with its default
>      codeset.  We will handle the non-ASCII text ourselves, so the
>      codeset is unimportant, and Windows doesn't support UTF-8 as the
>      codeset anyway.  */
>   return setlocale (category, "ENU");
>
> mbrtowc has its own override which handles UTF-8.
>
> As far as this relates to wcwidth, there used to be an MS-Windows specific
> stub implementation of this, removed in commit 5a66bc49ac032 (Patrice Dumas,
> 2022-08-19) which added a gnulib implementation of wcwidth:
>
> diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
> index 93924a623c..bf4ef91650 100644
> --- a/tp/Texinfo/XS/xspara.c
> +++ b/tp/Texinfo/XS/xspara.c
> @@ -206,13 +206,6 @@ iswspace (wint_t wc)
>    return 0;
>  }
>
> -/* FIXME: Provide a real implementation.  */
> -int
> -wcwidth (const wchar_t wc)
> -{
> -  return wc == 0 ? 0 : 1;
> -}
> -
>  int
>  iswupper (wint_t wi)
>  {
>
>
> If this simple stub is preferable to the Gnulib implementation for
> MS-Windows, (e.g. it makes the tests pass) we could re-add it again.

We can do that, but I think we should first explore a better
alternative: use UTF-8 functions everywhere, without relying on the
locale-aware functions of libc, such as wcwidth.  For example, instead
of wcwidth, we could use uc_width.

Is it feasible to use UTF-8 in texi2any disregarding the locale, and
use libunistring or something similar for the few functions we need in
the extensions that are required to deal with non-ASCII characters?
If we can do that, it will work on all systems, including Windows.
(This is basically what Emacs does, but it does that on a much greater
scale, which is unnecessary in texi2any.)
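
To illustrate what I mean, here is a rough, self-contained sketch of
computing the column width of a UTF-8 string without consulting the
locale at all.  Everything in it is hypothetical and not taken from
xspara.c: utf8_decode stands in for a decoder such as libunistring's
u8_mbtouc, and approx_width is a deliberately coarse placeholder for
uc_width, covering only a handful of well-known ranges instead of the
real Unicode width tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from S (at most N bytes) into *CP.
   Return the number of bytes consumed, or 0 if the input is invalid
   or truncated.  No locale is consulted.  */
static size_t
utf8_decode (const unsigned char *s, size_t n, uint32_t *cp)
{
  size_t len, i;
  uint32_t c, min;

  if (n == 0)
    return 0;
  if (s[0] < 0x80)
    {
      *cp = s[0];
      return 1;
    }
  if ((s[0] & 0xE0) == 0xC0)
    { len = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0)
    { len = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0)
    { len = 4; c = s[0] & 0x07; min = 0x10000; }
  else
    return 0;
  if (n < len)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((s[i] & 0xC0) != 0x80)
        return 0;
      c = (c << 6) | (s[i] & 0x3F);
    }
  /* Reject overlong forms, surrogates, and values past U+10FFFF.  */
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;
  *cp = c;
  return len;
}

/* ASSUMPTION: coarse placeholder for a real width function such as
   uc_width.  Returns -1 for control characters, 0 for NUL and for
   U+0300..U+036F (combining marks), 2 for a few East Asian ranges,
   and 1 for everything else.  */
static int
approx_width (uint32_t cp)
{
  if (cp == 0)
    return 0;
  if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
    return -1;
  if (cp >= 0x0300 && cp <= 0x036F)
    return 0;
  if ((cp >= 0x1100 && cp <= 0x115F)
      || (cp >= 0x2E80 && cp <= 0xA4CF)
      || (cp >= 0xAC00 && cp <= 0xD7A3)
      || (cp >= 0xF900 && cp <= 0xFAFF)
      || (cp >= 0xFF00 && cp <= 0xFF60))
    return 2;
  return 1;
}

/* Return the width in columns of the N bytes of UTF-8 text at STR.
   An invalid byte is counted as one column; characters for which
   approx_width returns a negative value contribute nothing.  */
static int
utf8_string_width (const char *str, size_t n)
{
  const unsigned char *s = (const unsigned char *) str;
  int total = 0;

  while (n > 0)
    {
      uint32_t cp;
      size_t used = utf8_decode (s, n, &cp);
      int w;

      if (used == 0)
        {
          used = 1;
          w = 1;
        }
      else
        w = approx_width (cp);
      if (w > 0)
        total += w;
      s += used;
      n -= used;
    }
  return total;
}
```

With something like this, utf8_string_width ("abc", 3) gives the same
answer whatever LC_CTYPE happens to be, which is the whole point.  In
real code the two helpers would presumably be replaced by calls into
libunistring (or the corresponding Gnulib modules), so that the width
data comes from proper Unicode tables.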