On 01/28/2014 11:48 PM, Paul Eggert wrote:

>  
> +/* The following functions exploit the commutativity and associativity of ^,
> +   and the fact that X ^ X is zero.  POSIX requires that C equals
> +   either tolower (C) or toupper (C);

Unfortunately, while this is true, I'm not sure if it accurately covers
all possible case-folded comparisons outside of the C locale.

http://www.unicode.org/faq/casemap_charprop.html

Consider the Greek locale, el_GR.UTF-8, which has two lower-case sigma:
L'\x3c3' and L'\x3c2', but only one upper-case: L'\x3a3'.  As a result,
all three wchar_t values must compare case-insensitively to one another.

Or consider titlecase characters, such as Unicode L'\x1c8' (Lj), which
has both an uppercase mapping L'\x1c7' (LJ) and lowercase mapping
L'\x1c9' (lj) - again, all three wchar_t values must compare
case-insensitively to one another.

Your hack is great at finding characters that have a case mapping, but
not necessarily at finding all such characters that map to the same
result when passed through towlower(towupper(c)).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to