Re: New full Unicode for ES6 idea

Brendan Eich Tue, 21 Feb 2012 09:55:23 -0800

Phillips, Addison wrote:

Because it has always been possible, it’s difficult to say how many scripts have transported byte-oriented data by “punning” the data into strings. Actually, I think this is more likely to be truly binary data rather than text in some non-Unicode character encoding, but anything is possible, I suppose. This could include using non-character values like “FFFE”, “FFFF” in addition to the surrogates. A BRS-running implementation would break a script that relied on String being a sequence of 16-bit unsigned integer values with no error checking.
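For concreteness, here is a minimal sketch of the kind of "punning" described above, as it behaves in JavaScript today (no BRS). The helper names packUint16 and unpackUint16 are hypothetical, made up for illustration; only String.fromCharCode and charCodeAt are standard.

```javascript
// Hypothetical helper: store arbitrary 16-bit values in a string, one per element.
function packUint16(values) {
  let s = "";
  for (const v of values) {
    s += String.fromCharCode(v & 0xFFFF); // no validity check is performed
  }
  return s;
}

// Hypothetical helper: read the 16-bit values back, one per string element.
function unpackUint16(s) {
  const out = [];
  for (let i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i));
  }
  return out;
}

// Includes unmatched surrogates (0xD800, 0xDC00) and non-characters (0xFFFE, 0xFFFF).
const data = [0x0041, 0xD800, 0xFFFE, 0xFFFF, 0xDC00];
const packed = packUint16(data);
const roundTripped = unpackUint16(packed);
```

A script like this relies on every value coming back unchanged, which is the property that error checking on strings would break.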

Allen's view of the BRS-enabled semantics would have 16-bit "GIGO" without exceptions -- you'd be storing 16-bit values, whatever their source (including "\uXXXX" literals spelling invalid characters and unmatched surrogates) in at-least-21-bit elements of strings, and reading them back.

My concern and reason for advocating early or late errors on shenanigans was that people today writing surrogate pairs literally and then taking extra pains in JS or C++ (whatever the host language might be) to process them as single code points and characters would be broken by the BRS-enabled behavior of separating the parts into distinct code points.

But that's pessimistic. It could happen, but OTOH anyone coding surrogate pairs might want them to read back piece-wise when indexing. In that case what Allen proposes, storing each formerly 16-bit code unit, however expressed, in the wider 21-or-more-bits unit, and reading back likewise, would "just work".
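To illustrate both cases, here is a rough sketch of a literally written surrogate pair read back piece-wise, then combined by hand into one code point. It shows today's 16-bit behavior only; combineSurrogates is a hypothetical helper, and the choice of U+1F600 is arbitrary.

```javascript
// U+1F600 written literally as a surrogate pair.
const literalPair = "\uD83D\uDE00";

// Reading back piece-wise: each index yields one 16-bit code unit.
const units = [literalPair.charCodeAt(0), literalPair.charCodeAt(1)];

// Hypothetical helper: the "extra pains" of treating the pair as one code point.
function combineSurrogates(hi, lo) {
  return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
}

const codePoint = combineSurrogates(units[0], units[1]);
```

Code that only indexes and reads back the two units keeps working if each unit is stored in a wider element. Code that depends on the pair being recognized as a single character by the engine is the case I was worried about.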

Sorry if this is all obvious. Mainly I want to throw in my lot with Allen's exception-free literal/constructor approach. The encoding APIs should throw on invalid Unicode but literals and strings as immutable 16-bit storage buffers should work as today.
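A small sketch of that split, using encodeURIComponent as a stand-in for "the encoding APIs" (my choice of example, since no specific API is named above): the literal with an unmatched surrogate is accepted without complaint, while encoding it to UTF-8 throws. tryEncode is a hypothetical wrapper.

```javascript
// A literal spelling an unmatched high surrogate: accepted, no exception.
const loneSurrogate = "\uD800";
const literalLength = loneSurrogate.length;

// Hypothetical wrapper that reports whether encoding succeeded.
function tryEncode(s) {
  try {
    return { ok: true, value: encodeURIComponent(s) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

const bad = tryEncode(loneSurrogate);   // invalid Unicode: the encoder throws
const good = tryEncode("\uD83D\uDE00"); // well-formed pair: encodes as UTF-8
```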


/be
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
