https://issues.dlang.org/show_bug.cgi?id=15440

ag0ae...@gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ag0ae...@gmail.com

--- Comment #1 from ag0ae...@gmail.com ---
Here are three Unicode documents and what they say about the lowercase of
U+0130 (search for "LATIN CAPITAL LETTER I WITH DOT ABOVE"):

1) <http://www.unicode.org/charts/PDF/U0100.pdf> says: "lowercase is 0069 i".

2) <http://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt> gives U+0069
as the lowercase, too, if I read it right.

3) <http://www.unicode.org/Public/UCD/latest/ucdxml/ucd.nounihan.grouped.zip>
gives 'slc="0069" lc="0069 0307"'. I assume "slc" means "simple lowercase", and
"lc" means "lowercase".
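
To double-check the reading of UnicodeData.txt in (2), here is a minimal Python
sketch that splits the U+0130 record into its semicolon-separated fields. The
record is pasted in by hand as I remember it, so treat it as an assumption and
compare it against the linked file. The field positions (12 = simple uppercase,
13 = simple lowercase, 14 = simple titlecase) follow the UnicodeData.txt layout
described in UAX #44.

```python
# U+0130 record, copied by hand (assumption: verify against the linked
# UnicodeData.txt before relying on it).
record = ("0130;LATIN CAPITAL LETTER I WITH DOT ABOVE;Lu;0;L;0049 0307;;;;N;"
          "LATIN CAPITAL LETTER I DOT;;;0069;")

fields = record.split(";")

code_point = fields[0]         # "0130"
decomposition = fields[5]      # canonical decomposition mapping
simple_uppercase = fields[12]  # empty: no simple uppercase mapping
simple_lowercase = fields[13]  # single code point only, by definition
simple_titlecase = fields[14]  # empty

print(f"U+{code_point}: simple lowercase = U+{simple_lowercase}, "
      f"decomposition = {decomposition}")
```

UnicodeData.txt can only hold single-code-point ("simple") mappings, which is
why the two-code-point lowercase does not appear there.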

So it seems that the "simple lowercase" is 'i', but the full (multi-character)
lowercase is "\u0069\u0307".

That makes sense if lowercasing is supposed to be reversible without assuming a
Turkish context: uppercasing "\u0069\u0307" gives "\u0049\u0307" ('I' +
combining dot above), which is canonically equivalent to "\u0130".
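
The round trip can be checked outside of D as well. This is a small Python
sketch, not a statement about how std.uni is implemented. It relies on
Python 3's str.lower() and str.upper() applying the full, locale-independent
case mappings, and on unicodedata.normalize() for canonical composition.

```python
import unicodedata

capital_dotted_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# Full lowercase mapping: 'i' followed by COMBINING DOT ABOVE.
lowered = capital_dotted_i.lower()

# Uppercasing again gives 'I' followed by COMBINING DOT ABOVE.
reuppered = lowered.upper()

# NFC composes 'I' + U+0307 back into the single code point U+0130.
recomposed = unicodedata.normalize("NFC", reuppered)

# With only the simple mapping, the dot is lost on the way back.
simple_round_trip = "i".upper()

for label, text in [("lowered", lowered), ("reuppered", reuppered),
                    ("recomposed", recomposed),
                    ("simple round trip", simple_round_trip)]:
    print(label, [f"U+{ord(c):04X}" for c in text])
```

With the simple mapping alone, 'i' uppercases to plain 'I', so the original
U+0130 cannot be recovered without knowing the text is Turkish.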

Seems to me that std.uni is playing by the book, and that there's a point in
what the book says. But I don't know enough about Unicode to speak with
certainty.
