On Wednesday 11 February 2015 18:26:40 Guido Seifert wrote:
> > > Yes, and he already said such example, ß becomes SS
> >
> > The other example that was given is 'i' (UTF-8 0x69) becoming 'İ' under a
> > Turkish locale (UTF-8 0xc4 0xb0).
>
> Ah sorry. I was too focused on the visible length. 'i' = 'İ' = 1. But of
> course I have to look at the memory usage in the string. Btw... what would
> happen in Mark's example?
Which example? Using std::transform with ::toupper? Well, that depends on
what toupper does and whether you configured the global C library locale
correctly.

 -- Function: int toupper (int C)
     Preliminary: | MT-Safe | AS-Safe | AC-Safe | *Note POSIX Safety
     Concepts::.

     If C is a lower-case letter, `toupper' returns the corresponding
     upper-case letter.  Otherwise C is returned unchanged.

Depending on the C library and the locale, you get one of three outcomes:

 1) toupper('i') == 'i'
    "istanbul" → "iSTANBUL"
 2) toupper('i') == L'İ' == 0x130
    "istanbul" → "0STANBUL"
    (0x130 does not fit in a char and is truncated to 0x30, which is '0')
 3) toupper('i') == 'I'
    "istanbul" → "ISTANBUL"

All of them are wrong for Turkish, where the correct result is "İSTANBUL".

By the way, QByteArray's toUpper() and toLower() are now documented to
operate *exclusively* on Latin 1, with no locale variants applied, so i
becomes I and ß/ÿ remain ß/ÿ. There used to be a bug in this until 5.4.0
[1].

Also, QString does not support locale-based case conversions, so "istanbul"
always becomes "ISTANBUL". Locale-based conversions should be in QLocale,
but the feature is missing. At least "fußball" becomes "FUSSBALL" and ÿ gets
properly uppercased.

[1] eef74f82db049517aa5a80e7c9456c4cbda953d1:
    [...]
    Also as a consequence, this changes the handling of two characters in
    Latin 1: 'ß' should be uppercased to "SS" but we won't do it, and 'ÿ'
    can't be uppercased in Latin 1 ('Ÿ' is outside the range).
    [...]

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center

_______________________________________________
Development mailing list
Development@qt-project.org
http://lists.qt-project.org/mailman/listinfo/development