Re: Full Unicode strings strawman

Allen Wirfs-Brock Mon, 16 May 2011 14:56:49 -0700

On May 16, 2011, at 1:38 PM, Wes Garland wrote:

> Allen;
> 
> Thanks for putting this together.  We use Unicode data extensively in both 
> our web and server-side applications, and being forced to deal with UTF-16 
> surrogate pairs directly -- rather than letting the String implementation deal 
> with them -- is a constant source of mild pain.  At first blush, this 
> proposal looks like it meets all my needs, and my gut tells me the perf 
> impacts will probably be neutral or good. 
> 
> Two great things about strings composed of Unicode code points:
> 1) .length represents the number of code points, rather than the number of 
> 16-bit code units used in UTF-16, even if the underlying representation 
> isn't UTF-16
> 2) S.charCodeAt(S.indexOf(X)) always returns the same kind of information (a 
> Unicode code point), regardless of whether X is in the BMP or not
> 
> Even though this is a breaking change from ES-5, I support it 
> whole-heartedly.... but I expect breakage to be very limited. Provided that 
> the implementation does not restrict the storage of reserved code points 
> (D800-DFFF), it should be possible for users using String as immutable 
> C-arrays to keep doing so. Users doing surrogate pair decomposition will 
> probably find that their code "just works", as those code points will never 
> appear in legitimate strings of Unicode code points.  Users creating Strings 
> with surrogate pairs will need to re-tool, but this is a small burden and 
> these users will be at the upper strata of Unicode-foodom.  I suspect that 
> 99.99% of users will find that this change will fix bugs in their code when 
> dealing with non-BMP characters.


Thanks, this is exactly my thinking on the subject.
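
For readers following along, here is a minimal sketch of the ES5 behavior that
the two points above are contrasted with. It is only an illustration, not part
of the strawman: the sample string and the helper codePointFromPair are
hypothetical names made up for this example, and the helper shows the kind of
manual surrogate pair handling that code has to do today.

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP, so ES5 stores it
// as the UTF-16 surrogate pair D834 DD1E.
var clef = "\uD834\uDD1E";
var s = "a" + clef + "b";

// Hypothetical helper (not part of ES5 or the strawman): combine the
// high/low surrogate pair starting at index i into a single code point.
// If there is no valid pair at i, the code unit at i is returned as-is.
function codePointFromPair(str, i) {
  var hi = str.charCodeAt(i);
  var lo = str.charCodeAt(i + 1);
  var isPair = hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF;
  return isPair ? (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000 : hi;
}

var units = s.length;                                 // 4 code units for 3 code points
var atClef = s.charCodeAt(s.indexOf(clef));           // 0xD834, the high surrogate only
var atB = s.charCodeAt(s.indexOf("b"));               // 0x62, a complete code point
var decoded = codePointFromPair(s, s.indexOf(clef));  // 0x1D11E
```

With strings composed of code points, the expectation described above is that
.length would be 3 for this string and charCodeAt at the clef's index would
yield 0x1D11E directly, so a helper like this would no longer be needed.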



_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
