Re: New full Unicode for ES6 idea

Brendan Eich Tue, 21 Feb 2012 07:30:19 -0800

Andrew Oakley wrote:

On 02/20/12 16:47, Brendan Eich wrote:

>  Andrew Oakley wrote:

>>  Issues only arise in code that tries to treat a string as an array of
>>  16-bit integers, and I don't think we should be particularly bothered by
>>  performance of code which misuses strings in this fashion (but clearly
>>  this should still work without opt-in to new string handling).

> This is all strings in JS and the DOM, today. That is, we do not have
> any measure of code that treats strings as uint16s, forges strings
> using "\uXXXX", etc., but the ES and DOM specs have allowed this for
> 14 years. Based on bitter experience, it's likely that if we change by
> fiat to 21-bit code points from 16-bit code units, some code on the Web
> will break.
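For concreteness, here is what "treating strings as uint16s" and "forging strings using \uXXXX" look like. This is a minimal sketch. U+1F600 is an arbitrary non-BMP example character. The string is built from two escapes, one per surrogate half, and then read back as 16-bit integers with charCodeAt:

```javascript
// A non-BMP character (U+1F600) forged from two "\uXXXX" escapes,
// one for each half of its UTF-16 surrogate pair.
const forged = "\uD83D\uDE00";

// Read the string as an array of 16-bit integers, which the ES and DOM
// specs have always allowed.
const units = [];
for (let i = 0; i < forged.length; i++) {
  units.push(forged.charCodeAt(i));
}

console.log(forged.length);                                  // 2
console.log(units.map((u) => u.toString(16)));               // [ 'd83d', 'de00' ]
console.log(forged === String.fromCharCode(0xd83d, 0xde00)); // true
```

Code like this depends on each string element being one 16-bit code unit. A by-fiat switch to 21-bit code points would change every value it relies on.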


Sorry, I don't think I was particularly clear.  The point I was trying
to make is that we can *pretend* that code points are 16-bit but
actually use a 21-bit representation internally.

So far, that's like Allen's proposal from last year (http://wiki.ecmascript.org/doku.php?id=strawman:support_full_unicode_in_strings). But you didn't say how iteration (indexing and .length) works.
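The open question of what indexing and .length mean can be shown on a single string. This sketch uses the string iterator and codePointAt, which ES2015 standardized later. It only contrasts the two possible readings and does not describe how either proposal would behave:

```javascript
// One non-BMP character (U+1F600) between two ASCII letters.
const s = "a\uD83D\uDE00b";

// Reading 1: 16-bit code units, which is what .length and s[i] mean today.
const codeUnitLength = s.length; // 4
const secondElement = s[1];      // "\uD83D", a lone high surrogate

// Reading 2: code points. The ES2015 string iterator treats a
// surrogate pair as a single element.
const codePoints = [...s];
const codePointLength = codePoints.length;            // 3
const secondCodePoint = codePoints[1].codePointAt(0); // 0x1F600

console.log(codeUnitLength, codePointLength); // 4 3
console.log(secondElement === "\uD83D");      // true
console.log(secondCodePoint.toString(16));    // 1f600
```

Under the first reading, index 1 falls in the middle of a character. Under the second, it is the whole character. A full-Unicode proposal has to choose one of these for .length and [i].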

If content requests
proper Unicode support, we simply switch to allowing 21-bit code points
and stop encoding characters outside the BMP using surrogate pairs
(because the characters now fit in a single code point).
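Andrew's idea amounts to storing each element as a 21-bit code point rather than a 16-bit code unit. Below is a minimal sketch of the conversion such a representation would require, assuming UTF-16 input. The function name toCodePoints is hypothetical and is not part of any engine or spec. Lone surrogates are kept as-is, because JS strings are allowed to contain them:

```javascript
// Hypothetical helper: decode a string's UTF-16 code units into code
// points, merging each valid surrogate pair into a single element.
function toCodePoints(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const hi = str.charCodeAt(i);
    if (hi >= 0xd800 && hi <= 0xdbff && i + 1 < str.length) {
      const lo = str.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        // A surrogate pair yields a code point in U+10000..U+10FFFF,
        // which fits in 21 bits.
        out.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    out.push(hi); // BMP unit or lone surrogate, kept unchanged
  }
  return Uint32Array.from(out);
}

const cps = toCodePoints("a\uD83D\uDE00b");
console.log(Array.from(cps, (c) => c.toString(16))); // [ '61', '1f600', '62' ]
```

Converting is the easy part. The hard part, as discussed in this thread, is that existing code can observe which representation is in use.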

How does content request proper Unicode support? Whatever that gesture is, it's big and red ;-). But we don't have such a switch or button to press like that, yet.

If a .js or .html file as fetched from a server has a UTF-8 encoding, indeed non-BMP characters in string literals will be transcoded in open-source browsers and JS engines that use uint16 vectors internally, but each part of the surrogate pair will take up one element in the uint16 vector. Let's take this now as a "content request" to use full Unicode. But the .js file was developed 8 years ago and assumes two code units, not one. It hardcodes for that assumption, somehow (indexing, .length exact value, indexOf('\ud800'), etc.). It is now broken.
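Here is how such hardcoding breaks. This sketch simulates a hypothetical code-point string model by splitting the string with Array.from, which is an ES2015 API. The helper legacyCountAstral is an invented example of old code, not taken from any real page. It searches for U+1F600's high surrogate, '\ud83d', in place of the '\ud800' mentioned above:

```javascript
// Hypothetical legacy helper written against the 16-bit code-unit model.
// It assumes every non-BMP character occupies exactly two string elements.
function legacyCountAstral(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const cu = str.charCodeAt(i);
    if (cu >= 0xd800 && cu <= 0xdbff) count++; // a high surrogate starts a pair
  }
  return count;
}

// Simulated "full Unicode" model: each element is a whole code point.
function simulatedCodePointElements(str) {
  return Array.from(str); // iterates by code point (ES2015+)
}

const s = "a\uD83D\uDE00b";
console.log(s.length);                                        // 4 today
console.log(simulatedCodePointElements(s).length);            // 3 under a code-point model
console.log(legacyCountAstral(s));                            // 1
console.log(s.indexOf("\ud83d"));                             // 1: a half of the pair is findable today
console.log(simulatedCodePointElements(s).indexOf("\ud83d")); // -1: no element is a lone surrogate
```

Every result the old code depends on, including the exact .length, the index positions, and the ability to find a surrogate half, changes once elements become whole code points.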

And non-literal non-BMP characters won't be helped by transcoding differently when the .js or .html file is fetched. They'll just change "size" at runtime.
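A small illustration of the runtime "size" change. This sketch assumes the two halves of a surrogate pair arrive separately, for example from String.fromCharCode or from chunks of external input. The cpLength helper is hypothetical and stands in for a code-point length:

```javascript
// Each half of a surrogate pair is produced separately at runtime.
const hi = String.fromCharCode(0xd83d);
const lo = String.fromCharCode(0xde00);
const joined = hi + lo;

// Hypothetical code-point length: count the string iterator's elements.
const cpLength = (str) => [...str].length;

console.log(hi.length + lo.length, joined.length);          // 2 2
console.log(cpLength(hi) + cpLength(lo), cpLength(joined)); // 2 1
```

Measured in code units, length adds up across concatenation. Measured in code points, two lone surrogates merge into one character when joined, so a string's size depends on what it is concatenated with at runtime, and no source transcoding can account for that.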


/be

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
