On Wednesday, 26 October 2022 at 06:05:14 UTC, Ali Çehreli wrote:
The problem with Unicode is its main aim of allowing characters of multiple writing systems in the same text. When multiple writing systems are in play, conflicts and ambiguities will appear.

I personally don't think that this is a problem with Unicode itself. From what I can see, it looks like the individuals or committees responsible for mapping the Turkish alphabet to Unicode simply made a blunder.

For example, let's compare the Latin uppercase "B" and the Cyrillic uppercase "В". They look exactly the same, right? Would it be a smart idea for them to share the same code point in the Unicode table? But wait. What happens if we convert these letters to lowercase? The Latin "B" becomes "b" and the Cyrillic "В" becomes "в". Oops! So by giving the Latin uppercase "B" and the Cyrillic uppercase "В" different code points, we dodged a whole bunch of nasty problems.
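To make this concrete, here is a small sketch. It is in Python rather than D, only because it is quick to check there; the variable names are mine. It shows that the two look-alike letters are distinct code points and that each lowercases within its own script.

```python
import unicodedata

latin_b = "\u0042"      # "B", LATIN CAPITAL LETTER B
cyrillic_ve = "\u0412"  # "В", CYRILLIC CAPITAL LETTER VE

# The glyphs look identical, but the code points (and names) differ.
for ch in (latin_b, cyrillic_ve):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Each letter lowercases to the lowercase letter of its own script.
latin_lower = latin_b.lower()        # "b", U+0062
cyrillic_lower = cyrillic_ve.lower() # "в", U+0432
```

If both letters shared one code point, a lowercase function would have no way to choose between "b" and "в" without knowing the language of the text.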

Another example. Patrick Schluter mentioned the Greek letter sigma, and the [Wikipedia article](https://en.wikipedia.org/wiki/Sigma) says: "uppercase Σ, lowercase σ, lowercase in word-final position ς", which makes everything rather problematic. Now let's compare this to the Belarusian language and its letter "у". The Belarusian "у" turns into "ў" depending on context; however, this transformation doesn't happen for the first letter of proper nouns or in acronyms (which, in theory, makes the uppercase "Ў" redundant). Just imagine an alternative Greek-inspired reality, where both "у" and "ў" uppercase to "У". And yet the uppercase "Ў" exists in Unicode, so luckily in our reality we don't have to deal with uppercase/lowercase round-trip failures. This is computer-friendly. And as I already mentioned in an earlier comment, the Germans have also had the uppercase "ẞ" in Unicode since 2008 (better late than never).
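The round-trip difference is easy to check. The sketch below uses Python's built-in case mappings as a stand-in for the Unicode defaults (not Phobos), and the `samples` and `results` names are just for illustration.

```python
# Lowercase letters to test: uppercase each one, lowercase it again,
# and see whether we get the original letter back.
samples = {
    "greek sigma": "\u03c3",         # σ
    "greek final sigma": "\u03c2",   # ς
    "cyrillic u": "\u0443",          # у
    "cyrillic short u": "\u045e",    # ў
}

results = {name: ch.upper().lower() == ch for name, ch in samples.items()}

for name, ok in results.items():
    print(f"{name}: round trip {'ok' if ok else 'FAILS'}")
```

Both σ and ς uppercase to Σ, so the final sigma cannot survive the round trip when taken in isolation, while "ў" does survive because it has its own uppercase "Ў".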

I solved my problem by writing an Alphabet hierarchy in the past. I don't like that code but it still works:

[...]

It's confusing but it seems to work. :) It doesn't matter. Life is imperfect and things will somehow work in the end.

What's your opinion/conclusion? Is it fine the way it is? Do you think that some unique property of the Turkish language/alphabet made these difficulties unavoidable? Or do you think that it was a mistake, but now we have to live with it forever for compatibility reasons? Anything else?

And as for the D language and Phobos, should "ß" still uppercase to "SS"? Or can we change it to uppercase "ẞ" and remove German from the list of tricky languages at https://dlang.org/library/std/uni/to_upper.html ? Should Turkish be listed there?
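For reference, this is what the two choices look like in a quick Python sketch. Python's `str.upper()` follows the Unicode default mapping here; I am not claiming anything about what Phobos does internally. The `upper_german` helper is hypothetical and only illustrates the alternative.

```python
SHARP_S = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

# Unicode default full case mapping: ß uppercases to two letters.
default_upper = "stra\u00dfe".upper()   # "STRASSE"

# The capital ẞ already lowercases to ß by default.
capital_lowered = CAPITAL_SHARP_S.lower()

def upper_german(text: str) -> str:
    """Hypothetical alternative: map ß to ẞ first, then uppercase the rest."""
    return text.replace(SHARP_S, CAPITAL_SHARP_S).upper()

alternative_upper = upper_german("stra\u00dfe")  # "STRAẞE"
```

With the alternative, lowercasing the result gives back the original word, which is not true of "STRASSE".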
