Re: First Impressions!

Patrick Schluter via Digitalmars-d Thu, 30 Nov 2017 22:11:53 -0800

On Thursday, 30 November 2017 at 19:37:47 UTC, Steven Schveighoffer wrote:

On 11/30/17 1:20 PM, Patrick Schluter wrote:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:
English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both.
To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html

It was only to give an example. With UTF-8, people who implement the low-level code generally think about the multiple code units at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, which also exist in UTF-8 but are less consciously acknowledged: overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more: endianness and size.
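
To make the buffer-boundary case concrete, here is a minimal Python sketch, not anybody's actual implementation. The function names `decode_blocks_naive` and `decode_blocks_carry` are made up for illustration, and little-endian UTF-16 without a BOM is assumed. The naive version decodes each block on its own; the other one holds back a trailing high surrogate (or an odd trailing byte) and prepends it to the next block.

```python
def decode_blocks_naive(blocks):
    """Decode each UTF-16LE block independently (the pitfall)."""
    # A surrogate pair split across two blocks cannot be decoded here;
    # errors="replace" turns the broken halves into replacement characters.
    return "".join(b.decode("utf-16-le", errors="replace") for b in blocks)


def decode_blocks_carry(blocks):
    """Decode UTF-16LE blocks, carrying incomplete data over to the next block."""
    pending = b""
    out = []
    for block in blocks:
        data = pending + block
        # Only decode whole 16-bit code units.
        cut = len(data) - (len(data) % 2)
        if cut >= 2:
            last_unit = int.from_bytes(data[cut - 2:cut], "little")
            # A high surrogate at the end needs its low surrogate,
            # which may only arrive with the next block: hold it back.
            if 0xD800 <= last_unit <= 0xDBFF:
                cut -= 2
        out.append(data[:cut].decode("utf-16-le"))
        pending = data[cut:]
    if pending:
        raise ValueError("truncated UTF-16 input")
    return "".join(out)


text = "a\U0001F4A9"                 # 'a' followed by the poop emoji (SMP)
raw = text.encode("utf-16-le")       # 2 bytes for 'a', 4 bytes for the emoji
blocks = [raw[:4], raw[4:]]          # block limit falls between the surrogates

broken = decode_blocks_naive(blocks)
intact = decode_blocks_carry(blocks)
```

With this split, `intact` equals the original string while `broken` does not. In real Python code the standard library's incremental decoder (`codecs.getincrementaldecoder("utf-16-le")`) keeps this state for you; the hand-written carry-over is only there to show what has to happen at the buffer limit.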
