On May 16, 2011, at 1:38 PM, Wes Garland wrote:

> Allen;
>
> Thanks for putting this together. We use Unicode data extensively in both
> our web and server-side applications, and being forced to deal with UTF-16
> surrogate pairs directly -- rather than letting the String implementation
> deal with them -- is a constant source of mild pain. At first blush, this
> proposal looks like it meets all my needs, and my gut tells me the perf
> impacts will probably be neutral or good.
>
> Two great things about strings composed of Unicode code points:
> 1) .length represents the number of code points, rather than the number of
>    code units used in UTF-16, even if the underlying representation isn't
>    UTF-16
> 2) S.charCodeAt(S.indexOf(X)) always returns the same kind of information
>    (a Unicode code point), regardless of whether X is in the BMP or not
>
> Even though this is a breaking change from ES-5, I support it
> whole-heartedly.... but I expect breakage to be very limited. Provided that
> the implementation does not restrict the storage of reserved code points
> (D800-DFFF), it should be possible for users using String as immutable
> C-arrays to keep doing so. Users doing surrogate pair decomposition will
> probably find that their code "just works", as those code points will never
> appear in legitimate strings of Unicode code points. Users creating Strings
> with surrogate pairs will need to re-tool, but this is a small burden and
> these users will be at the upper strata of Unicode-foodom. I suspect that
> 99.99% of users will find that this change will fix bugs in their code when
> dealing with non-BMP characters.

Thanks, this is exactly my thinking on the subject.
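
For anyone following along, here is a minimal sketch of the two points Wes lists, written against today's ES5 semantics rather than the proposal. The helper names codePointLength and codePointAtIndex are made up for illustration and are not part of ES5 or of the proposal; the sample string is an assumption too.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so an ES5 string
// stores it as the surrogate pair D834 DD1E.
var clef = "\uD834\uDD1E";
var s = "a" + clef + "b";

// Point 1: today .length counts 16-bit code units, not code points.
var unitLength = s.length; // 4 code units for 3 characters

// Point 2: today charCodeAt returns only the lead surrogate of the pair.
var firstUnit = s.charCodeAt(s.indexOf(clef));

// Hypothetical helper: true if hi/lo form a valid surrogate pair.
function isPair(hi, lo) {
  return hi >= 0xD800 && hi <= 0xDBFF && lo >= 0xDC00 && lo <= 0xDFFF;
}

// Hypothetical helper: count code points by treating each valid pair as one.
function codePointLength(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    if (i + 1 < str.length && isPair(str.charCodeAt(i), str.charCodeAt(i + 1))) {
      i++; // skip the trail surrogate
    }
    count++;
  }
  return count;
}

// Hypothetical helper: decode the code point that starts at code unit i.
function codePointAtIndex(str, i) {
  var hi = str.charCodeAt(i);
  if (i + 1 < str.length) {
    var lo = str.charCodeAt(i + 1);
    if (isPair(hi, lo)) {
      return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000;
    }
  }
  return hi; // BMP character or unpaired surrogate
}

var pointLength = codePointLength(s);
var decoded = codePointAtIndex(s, s.indexOf(clef));
```

As I read Wes's two points, s.length and s.charCodeAt(s.indexOf(clef)) would yield the values held in pointLength and decoded directly under the proposal, with no helpers of this kind.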