Andrew Oakley wrote:
On 02/20/12 16:47, Brendan Eich wrote:
>  Andrew Oakley wrote:
>>  Issues only arise in code that tries to treat a string as an array of
>>  16-bit integers, and I don't think we should be particularly bothered by
>>  performance of code which misuses strings in this fashion (but clearly
>>  this should still work without opt-in to new string handling).
>
>  This is all strings in JS and the DOM, today.
>
>  That is, we do not have any measure of code that treats strings as
>  uint16s, forges strings using "\uXXXX", etc. but the ES and DOM specs
>  have allowed this for > 14 years. Based on bitter experience, it's
>  likely that if we change by fiat to 21-bit code points from 16-bit code
>  units, some code on the Web will break.

Sorry, I don't think I was particularly clear.  The point I was trying
to make is that we can *pretend* that code points are 16-bit but
actually use a 21-bit representation internally.

So far, that's like Allen's proposal from last year (http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings). But you didn't say how iteration (indexing and .length) work.

If content requests
proper Unicode support we simply switch to allowing 21-bit code-points
and stop encoding characters outside the BMP using surrogate pairs
(because the characters now fit in a single code point).

How does content request proper Unicode support? Whatever that gesture is, it's big and red ;-). But we don't have such a switch or button to press like that, yet.

If a .js or .html file as fetched from a server has a UTF-8 encoding, indeed non-BMP characters in string literals will be transcoded in open-source browsers and JS engines that use uint16 vectors internally, but each part of the surrogate pair will take up one element in the uint16 vector. Let's take this now as a "content request" to use full Unicode. But the .js file was developed 8 years ago and assumes two code units, not one. It hardcodes for that assumption, somehow (indexing, .length exact value, indexOf('\ud800'), etc.). It is now broken.
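
To make the breakage concrete, here is a minimal sketch of the kind of hardcoding described above. The sample character (U+1D11E) and the helper name skipNonBMP are made up for illustration, not taken from any real script. Every commented value holds under today's 16-bit code unit semantics and would shift if the same literal became a single 21-bit element.

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF, written as a surrogate pair the way a
// script from the uint16 era would spell it.
var clef = '\ud834\udd1e';
var text = 'a' + clef + 'b';

// Values under today's code unit semantics:
var lengthToday = text.length;           // 4: 'a', lead, trail, 'b'
var leadIndex = text.indexOf('\ud834');  // 1: the lead surrogate is findable
var trailUnit = text.charCodeAt(2);      // 0xDD1E: the trail surrogate

// Hypothetical legacy helper: assumes a non-BMP character always
// occupies exactly two elements of the string.
function skipNonBMP(s, i) {
  var c = s.charCodeAt(i);
  return (c >= 0xD800 && c <= 0xDBFF) ? i + 2 : i + 1;
}

var next = skipNonBMP(text, 1);          // 3: index of 'b' today
```

If the clef were stored as one element, text.length would presumably be 3, the indexOf call would find nothing, and index 3 would be past the end of the string.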

And non-literal non-BMP characters won't be helped by transcoding differently when the .js or .html file is fetched. They'll just change "size" at runtime.
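
A short sketch of the non-literal case, again with made-up names (toSurrogates is a hypothetical helper): the string is forged at runtime from code units, so nothing that happens while the file is fetched and decoded ever sees it.

```javascript
// Forged at runtime from two 16-bit code units; no source literal involved.
var forged = String.fromCharCode(0xD834, 0xDD1E);

// Hypothetical helper of the sort found in existing libraries: splits a
// supplementary code point (>= 0x10000) into a surrogate pair by hand.
function toSurrogates(cp) {
  var offset = cp - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10),
                             0xDC00 + (offset & 0x3FF));
}

var computed = toSurrogates(0x1D11E);

// Under today's semantics both strings have length 2 and compare equal
// to the escaped literal form.
var sameAsLiteral = (forged === '\ud834\udd1e') && (computed === forged);
```

Whether such a string would count as one element or two under a 21-bit representation is exactly the open question, since the answer depends on what String.fromCharCode and concatenation are specified to do in that mode.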

/be

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
