On Wed, Sep 11, 2013 at 12:40 PM, Anne van Kesteren <[email protected]> wrote:
> On Tue, Sep 10, 2013 at 8:14 AM, Mathias Bynens <[email protected]> wrote:
> > FWIW, here’s a real-world example of a case where this behavior is
> > annoying/unexpected to developers: http://cirw.in/blog/node-unicode
>
> That seems like a serious bug in V8 though. A utf-8 encoder should
> never ever generate CESU-8 byte sequences.

Just to be clear, V8 does not generate CESU-8 if you give it well-formed
UTF-16. If you give it broken UTF-16 with unpaired surrogates, you can
either break the data or emit CESU-8. In the first case, you overwrite
the unpaired surrogates with some sort of error character code. In the
second case, you generate three-byte UTF-8 sequences that are not
strictly legal. The second option preserves the data if you round-trip
it into V8 again (or feed it to other apps that are liberal in what they
accept), so that's what V8 currently does.

--
Erik Corry

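
To make the two options above concrete, here is a minimal sketch of a
UTF-16 to UTF-8 encoder that can handle an unpaired surrogate either
way. It is not V8's code; the function name encodeUtf8 and the mode
names "replace" and "preserve" are made up for illustration, and using
U+FFFD as the error character is an assumption.

```typescript
// Hypothetical helper for illustration only, not V8's implementation.
type LoneSurrogateMode = "replace" | "preserve";

function encodeUtf8(input: string, mode: LoneSurrogateMode): number[] {
  const bytes: number[] = [];
  for (let i = 0; i < input.length; i++) {
    let cp = input.charCodeAt(i);

    // A high surrogate followed by a low surrogate is a well-formed pair:
    // combine the two code units into one supplementary code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }

    // Anything still in the surrogate range here is unpaired.
    if (cp >= 0xd800 && cp <= 0xdfff && mode === "replace") {
      cp = 0xfffd; // option 1: overwrite with an error character (assumed U+FFFD)
    }
    // In "preserve" mode (option 2) the unpaired surrogate falls through
    // to the three-byte branch below, which is not strictly legal UTF-8.

    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return bytes;
}
```

With this sketch, the lone surrogate "\ud800" encodes to EF BF BD in
"replace" mode (the original code unit is lost) and to ED A0 80 in
"preserve" mode (recoverable by a lenient decoder), while a well-formed
pair such as "\ud83d\udca9" yields the same four-byte sequence
F0 9F 92 A9 in both modes.
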
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss

