Re: [PATCH] rl_change_case: skip over invalid mbchars

Grisha Levit Thu, 23 May 2024 17:57:07 -0700

On Thu, May 23, 2024 at 4:11 PM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 5/23/24 3:25 PM, Grisha Levit wrote:
> > On Thu, May 23, 2024 at 10:25 AM Chet Ramey <chet.ra...@case.edu> wrote:
> >>
> >> On 5/21/24 2:42 PM, Grisha Levit wrote:
> >>> Avoid using (size_t)-1 as an offset.
> >>
> >> I can't reproduce this on macOS. Where is the code that's using -1 as an
> >> offset?
> >
> > The loop in rl_change_case does the following:
> >
> >      rl_change_case(count=-1, op=2) at text.c:1483:9
> >         1481   while (start < end)
> >         1482     {
> >      -> 1483       c = _rl_char_value (rl_line_buffer, start);
> >
> >      _rl_char_value(buf="\xc0", ind=0) at mbutil.c:493:23
> >         491    l = strlen (buf);
> >         492    if (ind + 1 >= l)
> >      -> 493      return ((WCHAR_T) buf[ind]);
> >
> >      (wchar_t) c = L'À'
>
> Nope, this is where you lose me. Using lldb with an input file created
> from the string you sent, I get c = (wchar_t) L'\U0000fffd', which fails
> the rl_walphabetic test. Even running the command as you posted it just
> prints `?'. What os are you using?


I think this is lldb being too clever and showing _any_ negative wchar_t
as the unicode replacement character.

(lldb) p (wchar_t)-1
(wchar_t) L'\U0000fffd'
(lldb) p (wchar_t)-64
(wchar_t) L'\U0000fffd'

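The issue here is that on arm64 Linux, plain char is unsigned, so the
(wchar_t) conversion of a char in the '\x80'-'\xFF' range yields a
non-negative value, i.e. a valid wide character in U+0080-U+00FF. Where
char is signed, the same byte converts to a negative value instead.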

(lldb) p (wchar_t)(  signed char)'\xC0' == L'\u00C0'
(bool) false
(lldb) p (wchar_t)(unsigned char)'\xC0' == L'\u00C0'
(bool) true
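
For anyone who wants to check this outside the debugger, here is a minimal
sketch of the same comparison in plain C. The helper names are made up for
illustration and are not readline functions; the casts only mimic what
`(WCHAR_T) buf[ind]` does under each char signedness.

```c
#include <limits.h>
#include <wchar.h>

/* Hypothetical helpers for illustration only; not part of readline. */

/* Mimics (wchar_t) buf[ind] on a platform where plain char is signed:
   a byte >= 0x80 becomes a negative char before it is widened. */
static wchar_t widen_as_signed(unsigned char byte)
{
    return (wchar_t)(signed char)byte;
}

/* Mimics (wchar_t) buf[ind] on a platform where plain char is unsigned
   (e.g. arm64 Linux): the byte value is preserved when widened. */
static wchar_t widen_as_unsigned(unsigned char byte)
{
    return (wchar_t)byte;
}

/* Reports which of the two behaviors plain char has on this platform. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

With byte 0xC0, widen_as_unsigned() gives 0xC0 (L'\u00C0'), while
widen_as_signed() gives (wchar_t)-64, which is not a valid character.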
