On Tuesday, May 31, 2016 16:29:33 Joakim via Digitalmars-d wrote:
> UTF-8 is an antiquated hack that needs to be eradicated. It
> forces all other languages than English to be twice as long, for
> no good reason, have fun with that when you're downloading text
> on a 2G connection in the developing world. It is unnecessarily
> inefficient, which is precisely why auto-decoding is a problem.
> It is only a matter of time till UTF-8 is ditched.
Considering that *nix land uses UTF-8 almost exclusively, and many C libraries do even on Windows, I very much doubt that UTF-8 is going anywhere anytime soon - if ever. The Win32 API does use UTF-16, as do Java and C#, but the vast sea of code that is C or C++ generally uses UTF-8, as do plenty of other programming languages.

And even aside from English, most European languages are going to be more efficient with UTF-8, because they're still primarily ASCII even if they contain characters that are not, and every ASCII character takes one byte in UTF-8 versus two in UTF-16. Stuff like Chinese is definitely worse in UTF-8 than it would be in UTF-16 (most Chinese characters take three bytes in UTF-8 but two in UTF-16), but there are a lot of languages other than English which are going to encode better with UTF-8 than with UTF-16 - let alone UTF-32.

Regardless, UTF-8 isn't going anywhere anytime soon. _Way_ too much uses it for it to be going anywhere, and most folks have no problem with that. Any attempt to get rid of it would be a huge, uphill battle. But D supports UTF-8, UTF-16, _and_ UTF-32 natively - even without involving the standard library - so anyone who wants to avoid UTF-8 is free to do so.

- Jonathan M Davis
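To put rough numbers on the size comparison in this thread, here is a minimal Python sketch (not part of the original post; the three sample strings are arbitrary choices) that counts how many bytes the same text occupies in UTF-8, UTF-16 and UTF-32.

```python
# Compare how many bytes the same text takes in each Unicode encoding.
# The "-le" codecs are used so Python does not prepend a byte-order mark
# (BOM), which would otherwise add bytes to the UTF-16 and UTF-32 counts.


def encoded_sizes(text):
    """Return a dict mapping encoding name to the byte length of `text`."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Sample strings, written with escapes so the byte counts do not depend on
# how an editor normalizes the source file.
SAMPLES = {
    "English": "hello world",
    "German": "Gr\u00fc\u00dfe aus M\u00fcnchen",  # "Grüße aus München"
    "Chinese": "\u4f60\u597d\u4e16\u754c",  # "你好世界"
}

if __name__ == "__main__":
    for language, text in SAMPLES.items():
        print(language, encoded_sizes(text))
    # English {'utf-8': 11, 'utf-16': 22, 'utf-32': 44}
    # German {'utf-8': 20, 'utf-16': 34, 'utf-32': 68}
    # Chinese {'utf-8': 12, 'utf-16': 8, 'utf-32': 16}
```

For these samples UTF-8 is the smallest encoding for the English and German text, and UTF-16 is the smallest for the Chinese text; the German string needs only three extra bytes over pure ASCII because only its three accented letters take two bytes each.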