Okay, that example was poorly chosen. However, when a given string representation uses a particular code unit, you often need programmatic access to it, e.g. for loops and such that iterate over the text.
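A minimal sketch of the kind of loop meant here, assuming strings are exposed as sequences of UTF-16 code units (as `length` and `charCodeAt` do today). The helper names `codeUnits` and `hasLoneSurrogate` are made up for illustration and are not part of any proposal in this thread.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as hex.
function codeUnits(str) {
  const units = [];
  for (let i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i).toString(16).toUpperCase());
  }
  return units;
}

// Hypothetical helper: report whether a string contains an unpaired
// surrogate, i.e. whether it is "ill-formed" in the sense discussed here.
function hasLoneSurrogate(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low half of a well-formed pair
        continue;
      }
      return true;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}
```

Neither check can be written against a view of the string that only exposes whole characters, since an unpaired surrogate is not a character at all.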
It may be an accident of history, but that doesn't mean that scripters don't need access to it.

Addison

On May 17, 2011, at 12:52 PM, "Wes Garland" <w...@page.ca> wrote:

> On 17 May 2011 15:00, Phillips, Addison <addi...@lab126.com> wrote:
>
>> 2. Allowing unpaired surrogates is a *requirement*. Yes, such a string is "ill-formed", but there are too many cases in which one might wish to have such "broken" strings for scripting purposes.
>> 3. We should have escape syntax for supplementary characters (such as \U0010000). Looking up the surrogate pair for a given Unicode character is extremely inconvenient and is not self-documenting.
>> ...
>> As Shawn notes, basically, there are three ways that one might wish to access strings:
>> ...
>> - as code units (encoding units of text)
>
> I don't understand why (except that it is there by an accident of history) it is desirable to expose a particular low-level detail about one possible encoding for Unicode characters to end-user programmers.
>
> Your point about database storage only holds if the database happens to store Unicode strings encoded in UTF-16. It could just as easily use UTF-8, UTF-7, or UTF-32. For that matter, the database input routine could filter all characters not in ISO-Latin-1 and store only the lower half of non-surrogate-pair UTF-16 code units.
>
> Wes
>
> --
> Wesley W. Garland
> Director, Product Development
> PageMail, Inc.
> +1 613 542 2787 x 102