On 09/06/2018 12:40 PM, Chris wrote:
To avoid this you have to normalize and recompose any decomposed characters. I remember that Mac OS X used (and still uses?) decomposed characters by default, so when you typed 'á' into your CLI, it would automatically decompose it to 'a' + acute. `string`, however, returns a length of 2 for the composed character too. If you do a lot of string handling, it will come back to bite you sooner or later.
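
A minimal D sketch of what Chris describes, using only the standard library: `std.uni.normalize` does the recomposition, and `.length` counts UTF-8 code units either way.

    import std.stdio;
    import std.uni : normalize, NFC;

    void main()
    {
        string composed = "\u00E1";    // precomposed 'á': 1 code point, 2 code units
        string decomposed = "a\u0301"; // 'a' + combining acute: 2 code points, 3 code units

        writeln(composed == decomposed);                // false
        writeln(normalize!NFC(decomposed) == composed); // true after recomposition

        writeln(composed.length);   // 2 -- .length counts UTF-8 code units
        writeln(decomposed.length); // 3
    }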

You say that D users shouldn't need a "Unicode license" before they do anything with strings. And you say that Python 3 gets it right (or at least gets it less wrong than D).

But here we see that Python requires a similar amount of Unicode knowledge. Without your Unicode license, you couldn't make sense of `len` giving different results for two strings that look the same.
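
To make that concrete: Python 3's `len` counts code points, and counting code points in D shows the same 1-vs-2 split for two visually identical strings. A small sketch:

    import std.stdio;
    import std.range : walkLength;

    void main()
    {
        string composed = "\u00E1";    // renders as 'á'
        string decomposed = "a\u0301"; // also renders as 'á'

        // Counting code points, as Python 3's len does, gives different
        // answers for two strings that look the same on screen.
        writeln(composed.walkLength);   // 1
        writeln(decomposed.walkLength); // 2
    }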

So both D and Python require a Unicode license. But on top of that, D also requires an auto-decoding license. You need to know that `string` is both a range of code points and an array of code units. And you need to know that `.length` belongs to the array side, not the range side. Once you know that (and more), things start making sense in D.
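
Here's a small sketch of that dual nature, again using only the standard library:

    import std.stdio;
    import std.algorithm : count;
    import std.range : front;

    void main()
    {
        string s = "\u00E1"; // precomposed 'á': 1 code point, 2 UTF-8 code units

        // Array side: .length (and indexing) sees immutable(char) code units.
        writeln(s.length); // 2

        // Range side: the range primitives auto-decode to dchar code points.
        writeln(s.front);  // 'á' as a dchar
        writeln(s.count);  // 1

        // And built-in foreach stays on the array side by default:
        foreach (c; s)
            writef("%02x ", c); // c3 a1 -- the two code units
        writeln();
    }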

My point is: D doesn't require more Unicode knowledge than Python does. But auto-decoding gives D's `string` a dual nature, and that can certainly be confusing. It's part of why everybody dislikes auto-decoding.

(Not saying that Python is free from such pitfalls. I simply don't know the language well enough.)
