On Friday, 13 May 2016 at 21:46:28 UTC, Jonathan M Davis wrote:
The history of why UTF-16 was chosen isn't really relevant to
my point (Win32 has the same problem as Java and for similar
reasons).
My point was that if you use UTF-8, then it's obvious _really_
fast when you screwed up Unicode-handling by treating a code
unit as a character, because anything beyond ASCII is going to
fall flat on its face.
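
To make the quoted point concrete, here is a minimal sketch that counts code units per character in both encodings. The helper name `code_units` and the sample characters are made up for illustration; they are not from the original discussion.

```python
def code_units(text, encoding):
    """Count the code units needed to encode text (hypothetical helper)."""
    # UTF-8 code units are 1 byte wide, UTF-16 code units are 2 bytes wide.
    unit_size = {"utf-8": 1, "utf-16-le": 2}[encoding]
    return len(text.encode(encoding)) // unit_size


# One ASCII letter, two BMP characters, one character outside the BMP.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    print(f"U+{ord(ch):04X}",
          "utf-8:", code_units(ch, "utf-8"),
          "utf-16:", code_units(ch, "utf-16-le"))
```

In UTF-8 the "one code unit is one character" assumption breaks on the first non-ASCII character, while in UTF-16 it only breaks outside the BMP, so the mistake can stay hidden much longer.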
On the other hand, if you deal with UTF-16 text, you can't
interpret it as anything other than UTF-16. People either get it
right or give up, even for ASCII, even with casts. It's that
resilient. With UTF-8, problems happened on a massive scale in
LAMP setups: MySQL used latin1 as its default encoding, and almost
everything worked fine.
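
A rough sketch of why that went unnoticed, assuming the application sends UTF-8 bytes over a connection that both sides treat as latin1. This only models the byte-level effect with Python's `latin-1` codec; it does not talk to a database, and the sample string is made up.

```python
original = "na\u00efve caf\u00e9"

# The application produces UTF-8 bytes ...
utf8_bytes = original.encode("utf-8")

# ... which a latin1 column accepts as-is, one "character" per byte.
stored = utf8_bytes.decode("latin-1")

# Reading back through the same latin1 path returns the same bytes,
# so the application sees its original text again.
round_tripped = stored.encode("latin-1").decode("utf-8")
assert round_tripped == original

# The stored value is wrong, though: it has more "characters" than the
# original, which shows up in lengths, sorting and comparisons.
assert len(stored) > len(original)

# UTF-16 has no such quiet failure mode: even plain ASCII text is full of
# NUL bytes when read as a single-byte encoding.
utf16_as_latin1 = "abc".encode("utf-16-le").decode("latin-1")
assert utf16_as_latin1 == "a\x00b\x00c\x00"
```

The round trip succeeds, which is why "almost everything worked fine": the breakage only appears once something else interprets the stored bytes.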