On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright
wrote:
On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
+- Unicode support is good. Although I think D's string type
should probably have been UTF-16 by default. Especially
considering the utf module states:
"UTF character support is restricted to '\u0000' <= character
<= '\U0010FFFF'."
It seems like the natural fit to me. Plus, for the vast
majority of use cases I'm pretty much guaranteed that one char
equals one code point. Not the biggest issue in the world, and
maybe I'm just being overly critical here.
Sooner or later your code will exhibit bugs if it assumes that
char == codepoint with UTF-16, because of surrogate pairs.
https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java
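The surrogate-pair pitfall is easy to demonstrate in Java (the language the linked question is about), since Java strings are sequences of UTF-16 code units. A minimal sketch: any character outside the Basic Multilingual Plane, such as U+1F600, takes two char units, so the unit count and the code-point count disagree.

```java
public class SurrogateDemo {
    public static void main(String[] args) {
        // U+1F600 lies outside the BMP, so UTF-16 encodes it
        // as a surrogate pair: two 16-bit char units.
        String s = new String(Character.toChars(0x1F600));

        System.out.println(s.length());                      // 2 (char units)
        System.out.println(s.codePointCount(0, s.length())); // 1 (code point)

        // The two units are a high/low surrogate, not characters on their own.
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
    }
}
```

Any code that indexes such a string by char, or slices it at an arbitrary offset, can split the pair and produce invalid text.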
As far as I can tell, pretty much the only users of UTF-16 are
Windows programs. Everyone else uses UTF-8 or UTF-32.
I recommend using UTF-8.
Java, .NET, Qt, JavaScript, and a handful of others use UTF-16
too, some having started off with the earlier UCS-2:
https://en.m.wikipedia.org/wiki/UTF-16#Usage
Not saying either is better, since each has its flaws; just
pointing out that it's more than just Windows.