On 5/11/17 8:56 AM, Eduardo Bustamante wrote:
> The C with acute accent character: https://en.wikipedia.org/wiki/%C4%86
>
> - Upper case
> dualbus@debian:~$ printf '\U0106\n'
> Ć
>
> - Lower case
> dualbus@debian:~$ printf '\U0107\n'
> ć
>
> Now, in bash, if you type in ć, then run readline `upcase-word' on it,
> instead of ending up with the UTF-8 multibyte string for U+0106 (0xC4
> 0x86), you end up with 0x07 0x87.
>
> The parameter expansion doesn't seem to have that problem so I think
> it's a bug in readline:

Thanks for the report. This is a bug in readline.

> For some reason, rl_change_case thinks `c` is ASCII:
>
> (gdb) call isascii((unsigned char)c)
> $8 = 1

Because casting it to unsigned char discards everything but the least
significant 8 bits. For U+0107 that leaves 0x07, which is a valid ASCII
character, so the test succeeds even though the original wide character
is not ASCII.
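
To illustrate the truncation, here is a minimal sketch. It is not the
readline code: both helper names are hypothetical, and it compares against
0x7f directly rather than calling isascii(), which is not part of ISO C.
It also assumes 8-bit chars.

```c
#include <wchar.h>

/* Hypothetical helper: mimics the gdb expression above. The cast keeps
   only the low 8 bits of the wide character before the range test
   (assuming CHAR_BIT == 8). */
static int truncated_is_ascii(wchar_t c)
{
    return ((unsigned char)c) <= 0x7f;
}

/* Hypothetical helper: tests the full wide-character value instead.
   Converting to unsigned long makes any negative value very large,
   so it is rejected as well. */
static int wide_is_ascii(wchar_t c)
{
    return (unsigned long)c <= 0x7fUL;
}
```

With c set to 0x0107, truncated_is_ascii(c) is nonzero because only 0x07
survives the cast, while wide_is_ascii(c) is zero.
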
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU [email protected] http://cnswww.cns.cwru.edu/~chet/