Re: 'case' UTF-8 bug

Denys Vlasenko Wed, 05 Jul 2017 09:16:22 -0700

On Wed, Jul 5, 2017 at 1:50 AM, Martijn Dekker <mart...@inlv.org> wrote:
> Op 04-07-17 om 20:23 schreef Denys Vlasenko:
>> Ok. I just tested it again, and it works for me.
>> Let's narrow it down more. My libc is:
>>
>> $ /lib64/libc.so.6
>> GNU C Library (GNU libc) development release version 2.25.90, by
>> Roland McGrath et al.
>> Copyright (C) 2017 Free Software Foundation, Inc.
>>
>> $ rpm -qa | grep glibc
>> glibc-headers-2.25.90-1.fc27.x86_64
>> glibc-static-2.25.90-1.fc27.i686
>> glibc-2.25.90-1.fc27.x86_64
>> ...
>>
>> Yours?
>
> Hmm. I can reproduce this on two of my systems: Slackware 14.1 and
> Slackware 13.37. I can *not* reproduce it on Slackware 14.2 (which I've
> only just got around to trying).
>
> Slackware 14.1 (with multilib added):
>
> $ /lib64/libc.so.6
> GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.


So the bug triggers with this one.

And Slackware 14.2 has which libc version?


>> Does the bug happen with the current git? (I ask this because this would mean
>> I don't need to go to 1.25.0 every time I want to try reproducing it again).
>
> Yes, I've confirmed that it does still happen with the current git code.

I reproduced it on another machine, with this libc:

$ /lib/libc-2.22.so
GNU C Library (Gentoo 2.22-r4 p13) stable release version 2.22, by
Roland McGrath et al.

The cause: ash uses bytes 0x81...0x88 for special purposes internally.
"π" is encoded as "cf 80" in UTF-8
"ρ" is encoded as "cf 81" in UTF-8
ash does have some code which handles 0x81 et al. in user strings. Specifically,
these two one-character strings are internally represented differently:

"π" = CTLQUOTEMARK cf 80 CTLQUOTEMARK
"ρ" = CTLQUOTEMARK cf CTLESC 81 CTLQUOTEMARK

CTLESC is meant to prevent 81 from being misinterpreted as one of those special bytes.
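
To make the two representations above concrete, here is a minimal sketch of
that escaping step. It is not ash's actual code: the helper name
encode_quoted() is made up, and the numeric values for CTLESC and
CTLQUOTEMARK are assumptions (the only thing stated above is that the
reserved range is 0x81...0x88).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: only the reserved range 0x81...0x88 is given above. */
enum {
    CTL_FIRST    = 0x81,  /* first reserved byte */
    CTL_LAST     = 0x88,  /* last reserved byte */
    CTLESC       = 0x81,  /* assumed value of the escape marker */
    CTLQUOTEMARK = 0x88   /* assumed value of the quote marker */
};

/*
 * Hypothetical helper: build the internal form of a double-quoted string.
 * Writes at most 2*len + 2 bytes to out and returns the number written.
 */
static size_t encode_quoted(const unsigned char *in, size_t len,
                            unsigned char *out, size_t outsize)
{
    size_t n = 0;

    assert(outsize >= 2 * len + 2);
    out[n++] = CTLQUOTEMARK;
    for (size_t i = 0; i < len; i++) {
        /* A user byte that collides with the reserved range gets CTLESC in front. */
        if (in[i] >= CTL_FIRST && in[i] <= CTL_LAST)
            out[n++] = CTLESC;
        out[n++] = in[i];
    }
    out[n++] = CTLQUOTEMARK;
    return n;
}
```

With these assumed values, "cf 80" comes out as four bytes (no escape is
needed), while "cf 81" comes out as five bytes, with CTLESC inserted between
cf and 81.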

The bug is: when these strings are prepared for fnmatch(),
CTLESC is not removed, but converted to \.
This is because CTLESC is also used for quoting * and ?, and these _do_ need
escaping as \* and \? so that fnmatch() does not interpret them as globbing
patterns.
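
A rough sketch of the conversion just described, for illustration only. The
function name to_fnmatch_pattern() is hypothetical, the value of CTLESC is an
assumption, and the input is assumed to have had its CTLQUOTEMARK bytes
stripped already. Every CTLESC becomes a backslash, regardless of which byte
follows it.

```c
#include <assert.h>
#include <stddef.h>

#define CTLESC 0x81  /* assumed value of ash's escape marker */

/*
 * Hypothetical helper: turn an internal string into an fnmatch() pattern.
 * Each CTLESC is replaced by '\' and the byte it protects is copied verbatim.
 * out must have room for len + 1 bytes; returns the pattern length.
 */
static size_t to_fnmatch_pattern(const unsigned char *in, size_t len,
                                 char *out, size_t outsize)
{
    size_t n = 0;

    assert(outsize >= len + 1);
    for (size_t i = 0; i < len; i++) {
        if (in[i] == CTLESC && i + 1 < len) {
            out[n++] = '\\';  /* CTLESC -> backslash, unconditionally */
            i++;              /* move on to the protected byte */
        }
        out[n++] = (char)in[i];
    }
    out[n] = '\0';
    return n;
}
```

For the input CTLESC '*' this gives \*, which is what fnmatch() needs. For
the input cf CTLESC 81 it gives cf \ 81, with the backslash in the middle of
the two-byte character.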

Thus, ash ends up calling fnmatch('cf \ 81', 'cf 81', 0).
This normally works - superfluous backslash-escapes are simply ignored,
and this returns a match.

I guess what happens is that in a UTF-8 locale, some versions of glibc
do not allow a backslash-escape _inside_ a multibyte character.
It probably freaks out on seeing an invalid UTF-8 sequence.
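
For anyone who wants to check their own libc, here is a small sketch that
makes the same fnmatch() call under a chosen locale. The helper name
rho_match_in_locale() is made up for this example, and so is any UTF-8 locale
name you pass in, since it may not be installed on a given system.

```c
#include <assert.h>
#include <fnmatch.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: match the pattern "cf \ 81" against the string "cf 81"
 * under the named locale. Returns 1 and stores fnmatch()'s return value in
 * *result if the locale could be selected, 0 otherwise.
 * Note that it changes the process-wide locale as a side effect.
 */
static int rho_match_in_locale(const char *locale_name, int *result)
{
    static const char pattern[] = "\xcf\\\x81";  /* cf, backslash, 81 */
    static const char string[]  = "\xcf\x81";    /* rho in UTF-8 */

    assert(result != NULL);
    if (setlocale(LC_ALL, locale_name) == NULL)
        return 0;  /* locale not available */
    *result = fnmatch(pattern, string, 0);
    return 1;
}
```

In the "C" locale the stored result should be 0 (a match), since the extra
backslash is simply ignored. With a UTF-8 locale such as "en_US.UTF-8" the
result is the interesting part: going by the reports in this thread, it
depends on the glibc version.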
