On May 16, 2011, at 2:19 PM, Mark Davis ☕ wrote:

> I'm quite sympathetic to the goal, but the proposal does represent a
> significant breaking change. The problem, as Shawn points out, is with
> indexing. Before, the strings were defined as UTF16.

Not by the ECMAScript specification.

> Take a sample string "\ud800\udc00\u0061" = "\u{10000}\u{61}". Right now, the
> 'a' (the \u{61}) is at offset 2. If the proposal were accepted, the 'a' would
> be at offset 1.

If the string is written as "\ud800\udc00\u0061", the 'a' will be at offset 2, even under the new proposal. It would only be at offset 1 if it were written as "\u+010000\u+000061" (using the literal notation from the proposal).

> This will definitely cause breakage in existing code;

How does this break existing code? Existing code cannot say "\u+010000\u+000061". As I've pointed out elsewhere on this thread, existing libraries that do UTF-16 encoding/decoding must continue to do so even under this new proposal.

> characters are in different positions than they were, even characters that
> are not supplemental ones. All it takes is one supplemental character before
> the current position and the offsets will be off for the rest of the string.

Allen
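
To make the two offsets concrete, here is a minimal sketch of the two ways of counting positions in the sample string. It does not implement the proposal: the "\u+010000" literal notation is not used here, and the code-point view is approximated with Array.from, which is an ES2015 feature assumed purely for illustration. The variable names are hypothetical.

```javascript
// Sample string from the thread, written with surrogate-pair escapes.
const s = "\ud800\udc00\u0061";

// Counting in 16-bit elements, which is what length and indexOf report.
const codeUnitLength = s.length;
const codeUnitOffsetOfA = s.indexOf("a");

// Counting in code points. Array.from walks a string code point by code
// point (ES2015 behaviour, assumed here; it is not part of the proposal).
const codePoints = Array.from(s);
const codePointLength = codePoints.length;
const codePointOffsetOfA = codePoints.indexOf("a");

console.log(codeUnitLength, codeUnitOffsetOfA);   // prints "3 2"
console.log(codePointLength, codePointOffsetOfA); // prints "2 1"
```

The first pair of numbers is what existing code observes for the escaped form, and per the reply above that would stay the same under the proposal. The second pair corresponds to the offset Mark describes, which would only arise for a string built from a single supplementary-character element.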
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss