Re: multibyte characters in the Info reader

Eli Zaretskii Fri, 16 Jan 2026 23:32:55 -0800

> From: Gavin Smith <[email protected]>
> Date: Fri, 16 Jan 2026 20:47:02 +0000
> Cc: [email protected]
> 
> It seems like it would be simple to add code to pass through non-ASCII
> bytes to the terminal:
> 
> diff --git a/info/display.c b/info/display.c
> index 4df6a45063..34deae02ef 100644
> --- a/info/display.c
> +++ b/info/display.c
> @@ -501,7 +501,7 @@ printed_representation (mbi_iterator_t *iter, int *delim, size_t pl_chars,
>  
>    text_buffer_reset (&printed_rep);
>  
> -  if (mb_isprint (mbi_cur (*iter)))
> +  if (0 && mb_isprint (mbi_cur (*iter)))
>      {
>        /* cur.wc gives a wchar_t object.  See mbiter.h in the
>           gnulib/lib directory. */
> @@ -575,6 +575,35 @@ printed_representation (mbi_iterator_t *iter, int *delim, size_t pl_chars,
>      }
>    else
>      {
> +      if (1)
> +        {
> +          unsigned char c = *cur_ptr;
> +          if ((c & 0x80) == 0x00)
> +            {
> +              /* ASCII */
> +              *pchars = 1;
> +              *pbytes = 1;
> +              ITER_SETBYTES (*iter, 1);
> +              return cur_ptr;
> +            }
> +          if ((c & 0xc0) == 0x80)
> +            {
> +              /* UTF-8 continuation byte. */
> +              *pchars = 0;
> +              *pbytes = 1;
> +              ITER_SETBYTES (*iter, 1);
> +              return cur_ptr;
> +            }
> +          if ((c & 0xc0) == 0xc0)
> +            {
> +              /* UTF-8 initial byte. */
> +              *pchars = 1;
> +              *pbytes = 1;
> +              ITER_SETBYTES (*iter, 1);
> +              return cur_ptr;
> +            }
> +        }
> +
>        /* Original byte was not recognized as anything.  Display its octal
>           value.  This could happen in the C locale for bytes above 128,
>           or for bytes 128-159 in an ISO-8859-1 locale.  Don't output the bytes
> 
> 
> This counts the screen width of all Unicode codepoints as 1 column,
> which will nearly always be correct.  It should make UTF-8 files display
> mostly properly in the MS-Windows terminal that you are using.


If the above doesn't produce any problems (except with occasional wide
characters), then it's an easy solution, I think.  And we could even
do better if in the "UTF-8 initial byte" clause we compute the Unicode
codepoint of the character and call wcwidth (which on Windows will
call the Gnulib wcwidth and on other systems will DTRT since the above
code should only be used when the locale's codeset is UTF-8).
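
A minimal sketch of that idea, not tested against the Info sources.
The helper names utf8_decode and codepoint_columns are hypothetical and
do not exist in info/display.c.  It assumes the locale's codeset is
UTF-8 and that wchar_t can hold the decoded codepoint; where wchar_t is
16 bits, as on MS-Windows, codepoints above U+FFFF would need separate
handling.

```c
#define _XOPEN_SOURCE 700   /* for the wcwidth declaration in <wchar.h> */
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper, not part of info/display.c.  Decode the UTF-8
   sequence starting at P (at most AVAIL bytes) into *CP.  Return the
   number of bytes consumed, or 0 if the sequence is invalid or
   truncated. */
static size_t
utf8_decode (const unsigned char *p, size_t avail, uint32_t *cp)
{
  size_t len, i;
  uint32_t c, min;

  if (avail == 0)
    return 0;
  if (p[0] < 0x80)
    { *cp = p[0]; return 1; }                    /* ASCII */
  else if ((p[0] & 0xe0) == 0xc0)
    { len = 2; c = p[0] & 0x1f; min = 0x80; }
  else if ((p[0] & 0xf0) == 0xe0)
    { len = 3; c = p[0] & 0x0f; min = 0x800; }
  else if ((p[0] & 0xf8) == 0xf0)
    { len = 4; c = p[0] & 0x07; min = 0x10000; }
  else
    return 0;          /* continuation byte or invalid initial byte */

  if (avail < len)
    return 0;
  for (i = 1; i < len; i++)
    {
      if ((p[i] & 0xc0) != 0x80)
        return 0;      /* expected a continuation byte */
      c = (c << 6) | (p[i] & 0x3f);
    }
  /* Reject overlong forms, surrogates, and values past U+10FFFF. */
  if (c < min || (c >= 0xd800 && c <= 0xdfff) || c > 0x10ffff)
    return 0;
  *cp = c;
  return len;
}

/* Hypothetical helper: screen columns for CP.  Fall back to 1 column
   when wcwidth cannot classify the character, which matches the
   behaviour of the patch above. */
static int
codepoint_columns (uint32_t cp)
{
  int w = wcwidth ((wchar_t) cp);
  return w < 0 ? 1 : w;
}
```

In the "UTF-8 initial byte" clause, the result of codepoint_columns
could then be stored in *pchars instead of the constant 1.  Whether the
continuation-byte clause should stay as it is, or the whole sequence
should be consumed at once, is a separate design choice this sketch
does not settle.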

> We could add an Info variable to customize this behaviour.

That'd be great, thanks.  Would it be possible to add that in this
release?
