On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright wrote:
On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
+- Unicode support is good. Although I think D's string type should probably have been UTF-16 by default. Especially considering the utf module states:

"UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'."

Seems like the natural fit to me. Plus, for the vast majority of use cases I'm pretty much guaranteed that a char == code point. Not the biggest issue in the world, and maybe I'm just being overly critical here.

Sooner or later your code will exhibit bugs if it assumes that char == code point with UTF-16, because of surrogate pairs.

https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF-16 are Windows programs. Everyone else uses UTF-8 or UTF-32.

I recommend using UTF-8.

Java, .NET, Qt, JavaScript, and a handful of others use UTF-16 too, some having started off with the earlier UCS-2:

https://en.m.wikipedia.org/wiki/UTF-16#Usage

Not saying either is better; each has its flaws. I'm just pointing out it's more than just Windows.
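
For anyone curious what the surrogate-pair pitfall looks like in practice, here's a minimal D sketch (the choice of U+1D11E is just an illustration) showing that a wstring's .length counts UTF-16 code units rather than code points:

import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so UTF-16
    // stores it as a surrogate pair: two wchar code units.
    wstring s = "\U0001D11E"w;

    writeln(s.length);     // 2 -- UTF-16 code units
    writeln(s.walkLength); // 1 -- code points after decoding
}

With a dstring (UTF-32) the two numbers would agree, since every code point fits in a single dchar, though grapheme clusters are still a separate question.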
