On 12/23/2013 04:12 PM, Jim Meyering wrote:
> Did you miss the "isascii" check in the new trivial_case_convert function?
No. But even with that check in place:
> If you can describe circumstances in which the new patch malfunctions,
> please do,
> but everything you wrote seems to rely on a false assumption.
No, it's a quite real complaint - your patch is broken for tr_TR.
> E.g., your turkish-I example works fine with my patch.
isascii('i') is true, but converting 'i' to '[iI]' is incorrect in the
tr_TR locale. Rather, the conversion must be to '[iİ]'; similarly, 'I'
would be translated to '[Iı]'. Neither of those conversions fits in 4
bytes (since dotted-capital-I and dotless-lower-i are both multi-byte
characters).
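To make the size problem concrete, here is a rough sketch that counts the bytes in each bracket expression. It assumes the text is UTF-8 encoded, and byte_len is just a helper name made up for this illustration, not anything in grep:

```shell
# byte_len: hypothetical helper, prints the number of bytes in its argument.
# wc -c counts bytes, not characters, regardless of the current locale;
# the $(( )) wrapper strips any padding that some wc implementations add.
byte_len() {
  echo $(( $(printf '%s' "$1" | wc -c) ))
}

byte_len '[iI]'   # ASCII only: 4 bytes
byte_len '[iİ]'   # U+0130 takes 2 bytes in UTF-8: 5 bytes
byte_len '[Iı]'   # U+0131 takes 2 bytes in UTF-8: 5 bytes
```

Both Turkish expansions come out one byte longer than the ASCII-only '[iI]' form.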
Need help easily finding those characters on a non-Turkish keyboard? I
used:
$ echo iI | LC_ALL=tr_TR.UTF-8 sed 's/\(.\)\(.\)/\U\1\L\2/'
At any rate, prior to your patch, lower dotless i in the buffer gives an
insensitive match to upper dotless I in the pattern:
$ echo ı | LC_ALL=tr_TR.UTF-8 grep -i I || echo no match
ı
After your patch:
$ echo ı | LC_ALL=tr_TR.UTF-8 src/grep -i I || echo no match
no match
Oops, you failed to match lower dotless i case-insensitively against
upper dotless I: upper dotless I is ASCII, so it passed the isascii
check, and you then converted it into the wrong pattern.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
