Re: Replacing tango.text.Ascii.isearch

Siarhei Siamashka via Digitalmars-d-learn Fri, 28 Oct 2022 15:11:06 -0700

On Wednesday, 26 October 2022 at 06:05:14 UTC, Ali Çehreli wrote:

The problem with Unicode is its main aim of allowing characters of multiple writing systems in the same text. When multiple writing systems are in play, conflicts and ambiguities will appear.

I personally don't think that this is a problem with Unicode itself. Based on what I can see, it looks like the individuals or the committees responsible for mapping the Turkish alphabet to Unicode just made a blunder.

For example, let's compare the Latin uppercase "B" and the Cyrillic uppercase "В". They look exactly the same, right? Would it be a smart idea for them to share the same index in the Unicode table? But wait. What happens if we convert these letters to lowercase? The Latin "B" becomes "b" and the Cyrillic "В" becomes "в". Oops! So by having different indexes for the Latin uppercase "B" and the Cyrillic uppercase "В", we dodged a whole bunch of nasty problems.
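To make this concrete, here is a minimal sketch in Python. Python is used only because its `str.lower()` applies the standard Unicode case mappings; this is not D or Phobos code, and the variable names are purely illustrative.

```python
# Latin capital B and Cyrillic capital Ve look alike but are separate code points.
latin_b = "\u0042"      # LATIN CAPITAL LETTER B
cyrillic_ve = "\u0412"  # CYRILLIC CAPITAL LETTER VE

# Because the code points differ, each letter lowercases within its own script.
latin_lower = latin_b.lower()        # Latin small b
cyrillic_lower = cyrillic_ve.lower()  # Cyrillic small ve

print("same code point:", latin_b == cyrillic_ve)
print("lowercase forms:", latin_lower, cyrillic_lower)
```

If both letters shared one code point, a context-free lowercase function would have no way of choosing between the two results.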

Another example. Patrick Schluter mentioned the Greek letter sigma, and the [wikipedia article](https://en.wikipedia.org/wiki/Sigma) says: "uppercase Σ, lowercase σ, lowercase in word-final position ς", which makes everything rather problematic. Now let's compare this to the Belarusian language and its letter "у". The Belarusian "у" transforms into "ў" depending on context, however this transformation doesn't happen for the first letter of proper nouns or in acronyms (and this theoretically makes the uppercase "Ў" redundant). Just imagine an alternative Greek-inspired reality, where both "у" and "ў" uppercase to "У". And yet the uppercase "Ў" exists in Unicode, so luckily in our reality we don't have to deal with uppercase/lowercase round trip failures. This is computer-friendly. And as I already mentioned in an earlier comment, the Germans also got the uppercase "ẞ" in Unicode since 2008 (better late than never).
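The round trip failures mentioned above can be sketched in a few lines of Python. Again, this only illustrates the standard Unicode case mappings as Python's `str.upper()` and `str.lower()` apply them, not Phobos; the helper `round_trips` is a hypothetical name made up for this example.

```python
def round_trips(ch: str) -> bool:
    """Return True if uppercasing and then lowercasing gives back the input."""
    return ch.upper().lower() == ch

samples = {
    "Greek sigma": "\u03c3",             # σ
    "Greek final sigma": "\u03c2",       # ς
    "Cyrillic u": "\u0443",              # у
    "Cyrillic short u": "\u045e",        # ў
    "German sharp s": "\u00df",          # ß
}

for name, ch in samples.items():
    print(f"{name}: {ch} -> {ch.upper()} -> {ch.upper().lower()}",
          "(round trip ok)" if round_trips(ch) else "(round trip FAILS)")

# The capital sharp s added in 2008 lowercases to the ordinary sharp s.
capital_sharp_s_lower = "\u1e9e".lower()
```

The Belarusian pair survives the round trip because "ў" has its own uppercase letter, while the final sigma and "ß" (which uppercases to "SS" under the default mapping) do not.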

I solved my problem by writing an Alphabet hierarchy in the past. I don't like that code but it still works:
[...]
It's confusing but it seems to work. :) It doesn't matter. Life is imperfect and things will somehow work in the end.

What's your opinion/conclusion? Is it fine the way it is? Do you think that some unique property of the Turkish language/alphabet made these difficulties unavoidable? Or do you think that it was a mistake, but now we have to live with it forever for compatibility reasons? Anything else?

And as for the D language and Phobos, should "ß" still uppercase to "SS"? Or can we change it to uppercase "ẞ" and remove German from the list of tricky languages at https://dlang.org/library/std/uni/to_upper.html ? Should Turkish be listed there?
