Re: toLower() and Unicode are incomplete was: Re: avoid toLower in std.algorithm.sort compare alias

Dmitry Olshansky Sun, 22 Apr 2012 01:14:16 -0700

On 22.04.2012 5:43, Ali Çehreli wrote:

On 04/21/2012 04:24 PM, Jay Norwood wrote:
 > While playing with sorting the unzip archive entries, I tried using the
 > last example in http://dlang.org/phobos/std_algorithm.html#sort
 >
 > std.algorithm.sort!("toLower(a.name) < toLower(b.name)", std.algorithm.SwapStrategy.stable)(entries);
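The quoted string predicate calls toLower on both names for every comparison, which can build new lowercased strings each time. Below is a minimal sketch of one alternative. It assumes a hypothetical `Entry` struct standing in for the archive entries and a hypothetical `sortByName` helper. It uses `std.uni.icmp`, which compares case-insensitively (via Unicode case folding) without producing lowercased copies. icmp is not tailored to any writing system either, so the dotted/dotless I issue raised in this thread still applies.

```d
import std.algorithm.sorting : sort, SwapStrategy;
import std.uni : icmp;

// Hypothetical stand-in for an archive entry; only `name` matters here.
struct Entry
{
    string name;
}

/// Hypothetical helper: stable, case-insensitive sort by name.
/// std.uni.icmp folds case while comparing, so no lowercased copies are built.
Entry[] sortByName(Entry[] entries)
{
    sort!((a, b) => icmp(a.name, b.name) < 0, SwapStrategy.stable)(entries);
    return entries;
}

void main()
{
    auto sorted = sortByName([Entry("b.txt"), Entry("A.txt"), Entry("a.txt"), Entry("C.txt")]);
    // "A.txt" and "a.txt" compare equal; the stable strategy keeps their input order.
    assert(sorted == [Entry("A.txt"), Entry("a.txt"), Entry("b.txt"), Entry("C.txt")]);
}
```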


Stealing this thread to point out that converting a letter to upper or
lower case cannot be done without knowing the writing system. Phobos's
toLower() documentation currently says: "Returns a string which is
identical to s except that all of its characters are lowercase (in
unicode, not just ASCII)."

Oh, come on. This function hasn't been updated in ages. I bet this wording has been unchanged since Unicode 4.0 ;)


Unicode cannot define the conversions of at least the following letters
without knowing the actual alphabet that the text is written in:

- Lowercase of I is ı in some alphabets[*] and i in many others.

- Uppercase of i is İ in some alphabets[*] and I in many others.

Fair point. The list, however, is not that long, and a system may choose to support this or not (changing behavior based on the writing system is called tailoring, I believe).
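To make "tailoring" concrete, here is a minimal D sketch. It handles only the two letters from the list above, using the Turkish/Azeri rules (I lowercases to dotless ı, İ lowercases to i), and delegates every other code point to the untailored `std.uni.toLower(dchar)`. `toLowerTurkic` is a hypothetical helper for illustration, not a Phobos function.

```d
import std.array : appender;
import std.uni : toLower;

/// Hypothetical helper (not part of Phobos): lowercases `s` with the
/// Turkish/Azeri tailoring for the dotted and dotless I, and falls back to
/// the untailored per-code-point std.uni.toLower for everything else.
string toLowerTurkic(string s)
{
    auto result = appender!string();
    foreach (dchar c; s) // iterating with dchar decodes UTF-8 into code points
    {
        switch (c)
        {
            case 'I':
                result.put("\u0131"); // I -> dotless ı (U+0131)
                break;
            case '\u0130':
                result.put("i");      // İ (U+0130) -> i
                break;
            default:
                result.put(toLower(c)); // untailored mapping; Appender encodes the dchar
                break;
        }
    }
    return result.data;
}

void main()
{
    // Untailored: I -> i, as in most alphabets.
    assert(toLower('I') == 'i');
    // Tailored for Turkish/Azeri text.
    assert(toLowerTurkic("ISTANBUL") == "ıstanbul");
    assert(toLowerTurkic("İZMİR") == "izmir");
}
```

The caller has to know the writing system in advance and pick the tailored function; nothing in the string itself says which alphabet it uses, which is the point made above.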

Ali

[*] Turkish, Azeri, Crimean Tatar, Gagauz, Celtic, etc.



--
Dmitry Olshansky
