On Thu, Aug 03, 2017 at 02:38:35PM +0000, Joseph Myers wrote:
> On Thu, 3 Aug 2017, Jakub Jelinek wrote:
>
> > In any case, you should probably investigate all the locales say in glibc or
> > some other big locale repository whether tolower/toupper have the expected
> > properties there.
>
> They don't.  In tr_TR.UTF-8, toupper ('i') == 'i', because 'İ', the
> correct uppercase version (as returned in locale tr_TR.ISO-8859-9), is a
> multibyte character and toupper can only return single-byte characters.

Indeed,

#include <ctype.h>
#include <locale.h>

int
main ()
{
  setlocale (LC_ALL, "");
  int i;
  for (i = -1000; i < 1000; i++)
    if (tolower (i) >= 'A' && tolower (i) <= 'Z')
      __builtin_abort ();
    else if (toupper (i) >= 'a' && toupper (i) <= 'z')
      __builtin_abort ();
  return 0;
}

fails for LC_ALL=tr_TR.UTF-8, because tolower ('I') is 'I'.
Not to mention that the behavior is undefined if the functions are called
with a value that is neither representable as unsigned char nor equal to EOF.
Therefore, this optimization is invalid.

	Jakub