On May 16, 2011, at 5:42 PM, Shawn Steele wrote:

> It's clear why we want to support the full Unicode range, but it's less clear
> to me why UTF-32 would be desirable internally. (Sure, it'd be nice for
> conversion types).
>
> What UTF-32 has that UTF-16 doesn't is the ability to walk a string without
> accidentally chopping up a surrogate pair. However, in practice, stepping
> over surrogates is pretty much the least of the problems with walking a
> string. Combining characters and the like cause numerous typographical
> shapes/glyphs to be represented by more than one Unicode codepoint, even in
> UTF-32. We don't see that in Latin so much, especially in NFC, but in some
> scripts most characters require multiple code points.
>
> In other words, if I'm trying to find "safe" places to break a string, append
> text, or many other operations, then UTF-16 is no more complicated than
> UTF-32, even when considering surrogates.
>
> UTF-32 would cause a huge amount of ambiguity though about what happens to
> all of those UTF-16 sequences that currently sort-of work even though they
> shouldn't really because ES is nominally UCS-2.
>
> -Shawn

One reason is that none of the built-in string methods understand surrogate pairs. If you want to do any string processing that recognizes such pairs, you have to either handle them as multi-character sequences or do your own character-by-character processing.

Allen
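
To make the point about the built-in methods concrete, here is a minimal sketch of what "doing your own character-by-character processing" looks like today. The helper name `toCodePoints` is hypothetical and only for illustration; it is not a proposed API. The sketch assumes unpaired surrogates should simply be passed through as-is, which is one possible policy among several.

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF, stored as the surrogate pair D834 DD1E.
var clef = "\uD834\uDD1E";

// The built-in methods count and index 16-bit code units, not code points.
var unitLength = clef.length;                     // counts two code units for one character
var firstUnit = clef.charCodeAt(0).toString(16);  // the high surrogate alone
var chopped = clef.substring(0, 1);               // splits the pair, leaving a lone surrogate

// Hypothetical helper: walk the string by hand and combine surrogate pairs.
function toCodePoints(str) {
  var result = [];
  for (var i = 0; i < str.length; i++) {
    var hi = str.charCodeAt(i);
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < str.length) {
      var lo = str.charCodeAt(i + 1);
      if (lo >= 0xDC00 && lo <= 0xDFFF) {
        // Standard UTF-16 decoding of a high/low surrogate pair.
        result.push((hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000);
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    // BMP character, or an unpaired surrogate passed through unchanged
    // (an assumption of this sketch).
    result.push(hi);
  }
  return result;
}
```

With this, `toCodePoints("a" + clef + "b")` yields three entries while the string's `length` reports four. Note that this only addresses surrogate pairs; it does nothing for the combining-sequence problem Shawn describes, which exists in any encoding form.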