>>> The crucial win of Allen's proposal comes down the road, when someone in a 
>>> certain locale *can* do s.indexOf(nonBMPChar) and win.
>> s.indexOf("\U+10000"),

> Ok, but "\U+..." does not work today.

Yes, that would be worth adding (IMO) as a convenience, regardless of whether 
the backend were UTF-16 or UTF-32.  Though requiring 6 digits is annoying.  I'd 
prefer something like \U+ffff or \U+10000 or \U+10FFFF being allowed, though 
you'd have to do something interesting if additional 0-9a-f characters followed 
U+ffff/U+10000.  So \U+{ffff} could be the explicit form if necessary.
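
Until an escape like that exists, the same convenience can be approximated 
with a small helper. This is only a sketch: fromCodePoint is a hypothetical 
name used for illustration, not an API anyone in this thread has proposed.

```javascript
// Hypothetical helper (name is an assumption): build the UTF-16 string
// for any Unicode code point in the range 0..0x10FFFF.
function fromCodePoint(cp) {
  if (cp < 0 || cp > 0x10FFFF || Math.floor(cp) !== cp) {
    throw new RangeError("Invalid code point: " + cp);
  }
  if (cp <= 0xFFFF) {
    // BMP code point: a single 16-bit code unit.
    return String.fromCharCode(cp);
  }
  // Supplementary code point: split into a high/low surrogate pair.
  var offset = cp - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
}

var nonBMPChar = fromCodePoint(0x10000); // the two code units d800, dc00
```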

>> who cares that it ends up as UTF-16?  You can already do it, today, with 
>> s.indexOf("𐀀"). It happens that 𐀀 looks like d800 + dc00, but it still 
>> works.  Today.  This is no different than most other languages.
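
To illustrate the quoted point, here is a minimal sketch. The sample string 
and variable names are made up for the example. The search succeeds because 
indexOf simply matches the two code units of the pair, and the index it 
returns counts 16-bit units.

```javascript
// "ab" + U+10000 + "cd"; U+10000 is stored as the pair d800, dc00.
var haystack = "ab\uD800\uDC00cd";
var needle = "\uD800\uDC00";

// indexOf compares code units, so the pair is found like any substring.
var pos = haystack.indexOf(needle);

// length also counts code units: 2 + 2 + 2.
var units = haystack.length;
```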

> My example was unclear. I meant something like a one-char indexOf where the 
> result would be used to slice that char.
> That doesn't work today. That's the point.

I wonder if we could allow "char" to have 21 bits in number context, and be a 
surrogate pair in string contexts.
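
The slicing problem can be sketched like this. sliceCodePoint is a 
hypothetical helper invented for the example; it is one way to work around 
the issue in user code, not a proposed API.

```javascript
// Hypothetical helper (name is an assumption): return the whole character
// starting at index i, i.e. two code units when they form a surrogate pair.
function sliceCodePoint(s, i) {
  var hi = s.charCodeAt(i);
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length) {
    var lo = s.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return s.substring(i, i + 2);
    }
  }
  return s.charAt(i);
}

var text = "x\uD800\uDC00y";
var at = text.indexOf("\uD800\uDC00");

// Slicing "one char" at the found index yields only the high surrogate.
var naive = text.charAt(at);

// The helper returns both halves of the pair.
var whole = sliceCodePoint(text, at);
```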

> But hey, if JS does not need to change then we can avoid trouble and keep on 
> using 16-bit indexing and length. Is this really the best outcome?

IMO we get 99% of what's needed by just changing to UTF-16 from UCS-2, although 
I'd like to see helpers like the U+10000 thing.

I think there are only 2 "tricky" parts with UTF-16 instead of UCS-2:
* Fixing the encode/decode URL stuff so that it's UTF-8 instead of CESU-8.  
(Actually, just encode, since decode would be obvious, I think.)
* Optionally, for convenience, getting a 21-bit number from a string surrogate 
pair (because the existing API wouldn't know if you wanted just the D800 or 
the 10000 represented by the D800, DC00 pair).  That could be useful for 
finding out if the pair is like one of the math bold forms (you could just do 
1D400 <= x <= 1D433 instead of trying to figure out the pairs).
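
Both items can be sketched roughly as follows. codePointAt and 
isMathBoldLetter are hypothetical helper names made up for the example. The 
encodeURIComponent line only shows the UTF-8 output for a well-formed pair; 
whether a given engine's encode functions need fixing is not something this 
sketch checks.

```javascript
// Hypothetical helper (name is an assumption): the 21-bit code point at
// index i, combining a surrogate pair when one starts there.
function codePointAt(s, i) {
  var hi = s.charCodeAt(i);
  if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.length) {
    var lo = s.charCodeAt(i + 1);
    if (lo >= 0xDC00 && lo <= 0xDFFF) {
      return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
  }
  // BMP character or unpaired surrogate: return the code unit itself.
  return hi;
}

// The range test from the text: 1D400 <= x <= 1D433.
function isMathBoldLetter(s, i) {
  var x = codePointAt(s, i);
  return 0x1D400 <= x && x <= 0x1D433;
}

// UTF-8 encodes U+10000 as the four bytes F0 90 80 80.
// CESU-8 would instead encode each surrogate separately:
// ED A0 80 ED B0 80.
var utf8Escaped = encodeURIComponent("\uD800\uDC00");
```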

-Shawn

> /be
Big-endian? ;-)
