Re: Full Unicode strings strawman

Allen Wirfs-Brock Mon, 16 May 2011 15:18:18 -0700

On May 16, 2011, at 2:19 PM, Mark Davis ☕ wrote:

> I'm quite sympathetic to the goal, but the proposal does represent a 
> significant breaking change. The problem, as Shawn points out, is with 
> indexing. Before, the strings were defined as UTF16.


Not by the ECMAScript specification.

> 
> Take a sample string "\ud800\udc00\u0061" = "\u{10000}\u{61}". Right now, the 
> 'a' (the \u{61}) is at offset 2. If the proposal were accepted, the 'a' would 
> be at offset 1.

If the string is written as "\ud800\udc00\u0061", the 'a' will be at offset 2, 
even in the new proposal.  It would only be at offset 1 if it were written as 
"\u+010000\u+000061"  (using the literal notation from the proposal).
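
To make the offsets concrete, here is a minimal sketch. It assumes an engine that accepts the "\u{...}" escape, which stands in for the proposal's "\u+010000" notation only as a way to spell the same character. The helper codePointOffset is hypothetical and only approximates a string whose elements are whole code points. Unlike the proposal, it also merges surrogate pairs that were written out explicitly, so it should be applied only to the code point spelling.

```javascript
// Two spellings of U+10000 followed by 'a'. "\u{...}" is used here only as a
// stand-in for the proposal's "\u+010000" literal notation (an assumption).
const viaSurrogates = "\ud800\udc00\u0061";
const viaCodePoint = "\u{10000}\u{61}";

// With 16-bit string elements, both spellings yield the same three elements,
// so 'a' sits at offset 2 either way.
const sameString = viaSurrogates === viaCodePoint; // true
const offsetOfA = viaSurrogates.indexOf("a");      // 2

// Hypothetical helper: counts whole code points instead of 16-bit elements,
// approximating the indexing the proposal gives to "\u+010000\u+000061".
function codePointOffset(str, ch) {
  return Array.from(str).indexOf(ch);
}

const codePointOffsetOfA = codePointOffset(viaCodePoint, "a"); // 1
```

Existing source text can only use the first spelling, which is why its offsets do not move.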

> This will definitely cause breakage in existing code;

How does this break existing code?  Existing code cannot say 
"\u+010000\u+000061".  As I've pointed out elsewhere on this thread, existing 
libraries that do UTF-16 encoding/decoding must continue to do so even under 
this new proposal. 
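
For illustration, the kind of library code I have in mind looks roughly like the sketch below. The names decodeUtf16 and encodeUtf16 are hypothetical and not taken from any particular library. The functions work on plain arrays of numbers, so nothing in them depends on how string elements are defined.

```javascript
// Hypothetical helper: turn an array of 16-bit code units into code points.
function decodeUtf16(units) {
  const codePoints = [];
  for (let i = 0; i < units.length; i++) {
    const hi = units[i];
    const lo = units[i + 1]; // undefined at the end, so the range test fails
    if (hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff) {
      // A well-formed surrogate pair encodes one supplementary code point.
      codePoints.push(0x10000 + ((hi - 0xd800) << 10) + (lo - 0xdc00));
      i++; // skip the low surrogate that was just consumed
    } else {
      // BMP code unit, or a lone surrogate passed through unchanged.
      codePoints.push(hi);
    }
  }
  return codePoints;
}

// Hypothetical helper: the inverse, from code points to 16-bit code units.
function encodeUtf16(codePoints) {
  const units = [];
  for (const cp of codePoints) {
    if (cp > 0xffff) {
      const v = cp - 0x10000;
      units.push(0xd800 + (v >> 10), 0xdc00 + (v & 0x3ff));
    } else {
      units.push(cp);
    }
  }
  return units;
}

const decoded = decodeUtf16([0xd800, 0xdc00, 0x0061]); // [0x10000, 0x61]
const roundTrip = encodeUtf16(decoded);                // [0xd800, 0xdc00, 0x61]
```

Code of this shape keeps producing and consuming surrogate pairs, so it behaves the same before and after the change.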

> characters are in different positions than they were, even characters that 
> are not supplemental ones. All it takes is one supplemental character before 
> the current position and the offsets will be off for the rest of the string.


Allen

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
