Andrew Oakley wrote:
> Issues only arise in code that tries to treat a string as an array of
> 16-bit integers, and I don't think we should be particularly bothered by
> performance of code which misuses strings in this fashion (but clearly
> this should still work without opt-in to new string handling).

This is all strings in JS and the DOM, today.

That is, we do not have any measure of how much code treats strings as arrays of uint16s, forges strings using "\uXXXX" escapes, etc., but the ES and DOM specs have allowed such code for more than 14 years. Based on bitter experience, it's likely that if we switch by fiat from 16-bit code units to 21-bit code points, some code on the Web will break.
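
To be concrete, all of this is legal ES5 today (nothing hypothetical in it), and it's exactly the kind of code we can't measure:

    // Forge a non-BMP "character" out of two 16-bit code units.
    var s = "\uD83D\uDE00";                    // U+1F600 spelled as two \uXXXX escapes
    s.charCodeAt(0) === 0xD83D;                // true -- the high surrogate is directly observable
    s === String.fromCharCode(0xD83D, 0xDE00); // true -- round-trips as 16-bit code units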

And as noted in the o.p. and in the thread on Allen's proposal last year, browser implementations definitely count on strings being represented as arrays of 16-bit integers, with the length property and related methods counting those same 16-bit units.
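
To make the difference visible (countCodePoints here is an illustrative helper only, not anything proposed):

    // What implementations count today vs. what a code-point count would report.
    function countCodePoints(s) {
      var n = 0;
      for (var i = 0; i < s.length; i++) {
        var c = s.charCodeAt(i);
        // If this is the high half of a surrogate pair, skip the low half
        // so the pair counts as one code point.
        if (c >= 0xD800 && c <= 0xDBFF &&
            i + 1 < s.length &&
            s.charCodeAt(i + 1) >= 0xDC00 && s.charCodeAt(i + 1) <= 0xDFFF) {
          i++;
        }
        n++;
      }
      return n;
    }

    var s = String.fromCharCode(0xD83D, 0xDE00); // one non-BMP character
    s.length;           // 2 today -- 16-bit code units
    countCodePoints(s); // 1 -- what a "full Unicode" length would say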

Breaking the Web is off the table. Breaking implementations, less so. I'm not sure why you bring up UTF-8. It's good for encoding and decoding, but for JS, unlike C, we want strings to be a high-level "full Unicode" abstraction, not bytes with bits optionally set to indicate that more bytes follow to spell a code point.

> I think this is a nicer and more flexible model than string
> representations being dependent on which heap they came from - all
> issues related to encoding can be contained in the String object
> implementation.

You're ignoring the compatibility break here. Browser vendors can't afford to ignore it.

> While this is being discussed, for any new string handling I think we
> should make any invalid strings (according to the rules in Unicode)
> cause some kind of exception on creation.

This is future-hostile if done for all code points. If done only for the surrogate code points in [U+D800, U+DFFF], both for literals using "\u{...}" and for constructive methods such as String.fromCharCode, then I agree.
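
Roughly this check, sketched in current JS just to pin down the range (a sketch, not a spec draft):

    // Reject only lone surrogates; accept everything else up to U+10FFFF.
    function checkCodePoint(cp) {
      if (cp >= 0xD800 && cp <= 0xDFFF) {
        throw new RangeError("lone surrogate U+" + cp.toString(16).toUpperCase());
      }
      if (cp > 0x10FFFF) {
        throw new RangeError("code point out of range");
      }
      return cp;
    }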

/be