On 05/12/2016 06:29 PM, Andrei Alexandrescu wrote:
> I'd be curious of a crisp list of grievances about
> autodecoding. -- Andrei

Auto-decoding emits code points (dchar), which are an awkward middle ground between code units (char/wchar) and graphemes.

Without any auto-decoding at all, every array T[] would be a random-access range of Ts as well. `.front` would be the same as `[0]`, `.length` would be the same as `.walkLength`, etc. That would make things less confusing for newbies, and more experienced programmers wouldn't accidentally mix the two abstraction levels.
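To make the contrast concrete, here is a minimal sketch of how arrays and strings differ as ranges today. It assumes a Phobos build with auto-decoding enabled (the default); `std.utf.byCodeUnit` is the existing opt-out that gives roughly the behaviour described above.

```d
import std.range.primitives : ElementType, hasLength, isRandomAccessRange;
import std.utf : byCodeUnit;

void main()
{
    // An ordinary array is a random-access range of its own elements.
    static assert(isRandomAccessRange!(int[]));
    static assert(is(ElementType!(int[]) == int));

    // A string is an immutable(char)[], but with auto-decoding its range
    // interface yields dchar and is neither random-access nor has length.
    static assert(is(ElementType!string == dchar));
    static assert(!isRandomAccessRange!string);
    static assert(!hasLength!string);

    // byCodeUnit skips decoding: the result is a random-access range of
    // code units, so front and length agree with [0] and .length again.
    string s = "\u00E4";
    auto units = s.byCodeUnit;
    static assert(isRandomAccessRange!(typeof(units)));
    assert(units.front == s[0]);
    assert(units.length == s.length);
}
```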

Of course, you'd have to be aware that a (w)char is not a character as perceived by humans, but a code unit. But auto-decoding to code points only shifts that problem: you have to be aware that a dchar is not a character either. Multiple dchars may form one visible character, one grapheme. For example, "\u00E4" and "a\u0308" encode the same grapheme: "ä".
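The three levels can be counted separately for the two spellings of "ä". This is only a sketch; it assumes UTF-8 `string`s with auto-decoding enabled, and uses `std.uni.byGrapheme` for the grapheme level.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string precomposed = "\u00E4"; // one code point: U+00E4
    string combining = "a\u0308";  // 'a' followed by U+0308 COMBINING DIAERESIS

    // Code units: .length counts UTF-8 bytes.
    assert(precomposed.length == 2);
    assert(combining.length == 3);

    // Code points: what auto-decoding iterates over.
    assert(precomposed.walkLength == 1);
    assert(combining.walkLength == 2);

    // Graphemes: both spellings are one visible character.
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(combining.byGrapheme.walkLength == 1);
}
```

Only the grapheme count agrees for both spellings; the code point count that auto-decoding produces does not.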

If char[], wchar[], dchar[] (and qualified variants) were ranges of graphemes, things would make the most sense for people who are not aware of the delicate details of Unicode. You wouldn't accidentally cut code points or graphemes in half, `.walkLength` would make intuitive sense, etc. You could still accidentally use `.length` or `[0]`, though, so some pitfalls would remain.
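A rough illustration of "cutting in half" at each level, again assuming UTF-8 strings and default auto-decoding. Strings are not ranges of graphemes today, so the grapheme case goes through `std.uni.byGrapheme` explicitly.

```d
import std.algorithm.comparison : equal;
import std.exception : assertThrown;
import std.range : take;
import std.uni : byGrapheme;
import std.utf : UTFException, validate;

void main()
{
    string precomposed = "\u00E4";
    string combining = "a\u0308";

    // Slicing by code unit can cut a code point in half:
    // the first byte alone is not valid UTF-8.
    assertThrown!UTFException(validate(precomposed[0 .. 1]));

    // Taking one code point can cut a grapheme in half:
    // the combining diaeresis is silently dropped.
    assert(combining.take(1).equal("a"));

    // Taking one grapheme keeps both code points together.
    auto first = combining.byGrapheme.front;
    assert(first.length == 2);
    assert(first[].equal("a\u0308"));
}
```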
