On Thu, May 23, 2024 at 4:11 PM Chet Ramey <chet.ra...@case.edu> wrote:
>
> On 5/23/24 3:25 PM, Grisha Levit wrote:
> > On Thu, May 23, 2024 at 10:25 AM Chet Ramey <chet.ra...@case.edu> wrote:
> >>
> >> On 5/21/24 2:42 PM, Grisha Levit wrote:
> >>> Avoid using (size_t)-1 as an offset.
> >>
> >> I can't reproduce this on macOS. Where is the code that's using -1 as an
> >> offset?
> >
> > The loop in rl_change_case does the following:
> >
> >     rl_change_case(count=-1, op=2) at text.c:1483:9
> >        1481       while (start < end)
> >        1482         {
> >     -> 1483           c = _rl_char_value (rl_line_buffer, start);
> >
> >     _rl_char_value(buf="\xc0", ind=0) at mbutil.c:493:23
> >        491        l = strlen (buf);
> >        492        if (ind + 1 >= l)
> >     -> 493          return ((WCHAR_T) buf[ind]);
> >
> >     (wchar_t) c = L'À'
>
> Nope, this is where you lose me. Using lldb with an input file created
> from the string you sent, I get c = (wchar_t) L'\U0000fffd', which fails
> the rl_walphabetic test. Even running the command as you posted it just
> prints `?'. What os are you using?

I think this is lldb being too clever and showing _any_ negative wchar_t as
the Unicode replacement character.

    (lldb) p (wchar_t)-1
    (wchar_t) L'\U0000fffd'
    (lldb) p (wchar_t)-64
    (wchar_t) L'\U0000fffd'

The issue here is that on arm64 Linux, char is unsigned, so the (wchar_t)
conversion of a plain char in the '\x80'-'\xFF' range yields a valid wide
character.

    (lldb) p (wchar_t)(  signed char)'\xC0' == L'\u00C0'
    (bool) false
    (lldb) p (wchar_t)(unsigned char)'\xC0' == L'\u00C0'
    (bool) true