On Wed, Jan 21, 2026 at 07:56:50PM +0000, Gavin Smith wrote:
> On Sat, Jan 17, 2026 at 09:28:30AM +0200, Eli Zaretskii wrote:
> > If the above doesn't produce any problems (except with occasional wide
> > characters), then it's an easy solution, I think. And we could even
> > do better if in the "UTF-8 initial byte" clause we compute the Unicode
> > codepoint of the character and call wcwidth (which on Windows will
> > call the Gnulib wcwidth and on other systems will DTRT since the above
> > code should only be used when the locale's codeset is UTF-8).
> >
> > > We could add an Info variable to customize this behaviour.
> >
> > That'd be great, thanks. Would it be possible to add that in this
> > release?
>
> Here's a more finished patch. It would be fine to include this in the
> next release if you can confirm that it works acceptably.

It seems to me that the column width of multibyte characters, typically
ideograms, is not taken into account. I do not know whether this is on
purpose, nor whether it would be easy to do with the current code, but
in case it is useful, here is how it is done in texi2any with the help
of libunistring, for a string that is already UTF-8 encoded:

  uint8_t *u8_text = (uint8_t *) text;
  int width = u8_strwidth (u8_text, "UTF-8");

u8_width could also be used once the bytes of a single UTF-8 character
have been collected.

> wcwidth takes a wchar_t argument and we can't guarantee the format of
> this type. Moreover, in info/pcterm.c, we redefine wcwidth as there
> was a performance issue with calling the gnulib definition. Reading
> the UTF-8 sequence, obtaining the codepoint and calling wcwidth seems
> to me to be an unnecessary complication for a marginal use case.
>
>
> diff --git a/info/display.c b/info/display.c
> index 4df6a45063..6c71bd9799 100644
> --- a/info/display.c
> +++ b/info/display.c
> @@ -482,6 +482,8 @@ display_process_line (WINDOW *win,
>
> static struct text_buffer printed_rep = { 0 };
>
> +int raw_utf8_output_p = 0;
> +
> /* Return pointer to string that is the printed representation of character
> (or other logical unit) at ITER if it were printed at screen column
> PL_CHARS. Use ITER_SETBYTES (util.h) on ITER if we need to advance
> @@ -501,7 +503,38 @@ printed_representation (mbi_iterator_t *iter, int *delim, size_t pl_chars,
>
>    text_buffer_reset (&printed_rep);
>
> -  if (mb_isprint (mbi_cur (*iter)))
> +  if (raw_utf8_output_p && (unsigned char) *cur_ptr >= 0x80)
> +    {
> +      /* For systems without a working UTF-8 locale but where UTF-8
> +         actually works on the terminal. This may happen in an MS-Windows
> +         UTF-8 terminal with the MSVCRT run-time.
> +
> +         Pass through UTF-8 bytes to the terminal. Count each character as
> +         a single screen column. This at least allows viewing (mostly
> +         correctly) non-ASCII characters in UTF-8 Info files.
> +
> +         Searching, user entry etc. of non-ASCII characters may still
> +         not work correctly. */
> +
> +      unsigned char c = *cur_ptr;
> +      if ((c & 0xc0) == 0xc0)
> +        {
> +          /* UTF-8 initial byte. */
> +          *pchars = 1;
> +          *pbytes = 1;
> +          ITER_SETBYTES (*iter, 1);
> +          return cur_ptr;
> +        }
> +      if ((c & 0xc0) == 0x80)
> +        {
> +          /* UTF-8 continuation byte. */
> +          *pchars = 0;
> +          *pbytes = 1;
> +          ITER_SETBYTES (*iter, 1);
> +          return cur_ptr;
> +        }
> +    }
> +  else if (mb_isprint (mbi_cur (*iter)))
>      {
>        /* cur.wc gives a wchar_t object. See mbiter.h in the
>           gnulib/lib directory. */
> diff --git a/info/variables.c b/info/variables.c
> index b6d4371de7..e91869ff57 100644
> --- a/info/variables.c
> +++ b/info/variables.c
> @@ -164,6 +164,10 @@ VARIABLE_ALIST info_variables[] = {
> N_("How to print the information line at the start of a node"),
> CHOICES_VAR(nodeline_print, nodeline_choices) },
>
> + { "raw-utf8-output",
> + N_("Always pass through non-ASCII UTF-8 bytes in files to terminal"),
> + ON_OFF_VAR(raw_utf8_output_p) },
> +
> { NULL }
> };
>
> diff --git a/info/variables.h b/info/variables.h
> index 5454ab942e..03d263c6a2 100644
> --- a/info/variables.h
> +++ b/info/variables.h
> @@ -79,6 +79,7 @@ extern int key_time;
> extern int mouse_protocol;
> extern int follow_strategy;
> extern int nodeline_print;
> +extern int raw_utf8_output_p;
>
> typedef struct {
> unsigned long mask;