Phillips, Addison wrote:
> Why would converting the existing UCS-2 support to be UTF-16 not be a good idea? There is nothing
> intrinsically wrong that I can see with that approach and it would be the most compatible with
> existing scripts, with no special "modes", "flags", or interactions.

Allen essentially proposed this last year (some confusion surrounded the discussion, from mixing what is observable in the language with encoding/format/serialization issues, which led to talk of 32-bit characters). As I wrote in the o.p., this met two objections: a big implementation hit, and an incompatible change.

I tackled the second with the BRS (Big Red Switch) and, in detail, mediation across DOM window boundaries. I believe this also takes the sting out of the first objection: the implementation change is smaller in light of the mediation that already exists at those boundaries.

> Yes, the complexity of supplementary characters (i.e. non-BMP characters)
> represented as surrogate pairs must still be dealt with.

I'm not sure what you mean. JS today allows such surrogates (ignoring invalid pairs), but a pair counts as two indexes and adds two to length, not one. That is the first problem to fix (setting aside literal escape-notation expressiveness).
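For instance (a quick sketch; U+1F4A9 is just an arbitrary supplementary character):

```javascript
// U+1F4A9 is a supplementary (non-BMP) character, stored as the
// surrogate pair 0xD83D 0xDCA9 in today's uint16-based strings.
var s = "\uD83D\uDCA9";
s.length;        // 2 -- two uint16 units, not one character
s.charCodeAt(0); // 0xD83D (high surrogate)
s.charCodeAt(1); // 0xDCA9 (low surrogate)
s.charAt(0);     // "\uD83D" -- half a character
```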

> It would also expose the possibility of invalid strings (with unpaired
> surrogates).

That problem exists today.
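Indeed, nothing in the language today stops you from building an invalid (unpaired-surrogate) string:

```javascript
// A lone high surrogate is a perfectly legal ES string value today:
var lone = "\uD800";
lone.length; // 1

// Slicing can also split a valid pair into two invalid strings:
var pair = "\uD83D\uDCA9";
var head = pair.slice(0, 1); // "\uD83D" -- unpaired high surrogate
var tail = pair.slice(1);    // "\uDCA9" -- unpaired low surrogate
```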

> But this would not be unlike other programming languages--or even ES as it
> exists today.

Right! We should do better. As I noted, Node.js heavy hitters (mranney of Voxer) testify that they want full Unicode, not what's specified today with indexing and length-accounting by uint16 storage units.
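To make the difference concrete, here is a sketch of code-point-based accounting -- the kind of length "full Unicode" strings would report (the function name is illustrative, not a proposed API):

```javascript
// Count code points rather than uint16 storage units, treating a
// valid high/low surrogate pair as a single character.
function codePointLength(s) {
  var n = 0;
  for (var i = 0; i < s.length; i++) {
    var c = s.charCodeAt(i);
    if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.length) {
      var d = s.charCodeAt(i + 1);
      if (d >= 0xDC00 && d <= 0xDFFF)
        i++; // skip the low half of the pair
    }
    n++;
  }
  return n;
}

codePointLength("abc");          // 3, same as "abc".length
codePointLength("\uD83D\uDCA9"); // 1, where .length reports 2
```

An unpaired surrogate still counts as one (malformed) element under this accounting, which is one of the design questions a real proposal would have to settle.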

> The purity of a "Unicode string" would be watered down, but perhaps not
> fatally. The Java language went through this (yeah, I know, I know...) and seems to have
> emerged unscathed.

Java's dead on the client. It is used by botnets: bugzilla.mozilla.org recently suffered a DDOS from one, and the bad guys didn't even bother changing the user-agent from the Java runtime's default. See Brian Krebs' blog.

> Norbert has a lovely doc here about the choices that led to this, which
> seems useful to consider: [1]. W3C I18N Core WG has a wiki page shared with
> TC39 a while ago here: [2].

> To me, switching to UTF-16 seems like a relatively small, containable,
> non-destructive change to allow supplementary character support.

I still don't know what you mean. How would what you call "switching to UTF-16" differ from today, where one can inject surrogates into literals by transcoding from an HTML document or .js file CSE?

In particular, what do string indexing and .length count, uint16 units or characters?

/be
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
