On 07/24/2013 04:19 PM, Chris Angelico wrote:
> I'm referring here to objections like jmf's, and also to threads like this:
>
> http://mozilla.6506.n7.nabble.com/Flexible-String-Representation-full-Unicode-for-ES6-td267585.html
>
> According to the ECMAScript people, UTF-16 and exposing surrogates to
> the application is a critical feature to be maintained. I disagree.
> But it's not my language, so I'm stuck with it. (I ended up writing a
> little wrapper function in C that detects unpaired surrogates, but
> that still doesn't deal with the possibility that character indexing
> can create a new character that was never there to start with.)
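For anyone following along, the indexing hazard described above can be sketched in Python by encoding to UTF-16-LE to simulate JavaScript's code-unit view of a string (the sample string and offsets here are illustrative, not from the thread):

```python
# Sketch: simulating JavaScript's UTF-16 code-unit view of a string.
s = "a\U0001F600b"          # 'a', U+1F600 (an astral-plane character), 'b'

# Python's abstract string sees three characters...
assert len(s) == 3

# ...but the UTF-16 view has four code units, because the astral
# character occupies a surrogate pair (D83D DE00).
units = s.encode("utf-16-le")
assert len(units) // 2 == 4

# Slicing at an arbitrary code-unit boundary, as JS indexing allows,
# splits the pair and manufactures an unpaired high surrogate --
# a "character" that was never in the original string.
broken = units[:4].decode("utf-16-le", "surrogatepass")
assert broken == "a\ud83d"
```

The "surrogatepass" error handler is what lets the lone surrogate through; a strict decode of those four bytes would raise instead.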
This is starting to drift off topic now, but after reading your comments on that post, and others' objections, I don't fully understand why making strings simply "Unicode" in JavaScript would break compatibility with older scripts. What operations performed on strings would break if Unicode were an abstract type? Is it just the input and output of text that must be decoded and encoded? Why should a script care about the internal representation of Unicode strings? Or is it that some scripts actually depend on the incorrect behavior of UTF-16 and the exposed surrogates (and the resulting incorrect indexing)?
--
http://mail.python.org/mailman/listinfo/python-list