Re: Full Unicode strings strawman

Phillips, Addison Tue, 17 May 2011 13:01:21 -0700

Okay, that example was poorly chosen. However, when a given string 
representation uses a particular code unit, you often need programmatic 
access to it, e.g. in for loops and similar code that iterates over the text.


It may be an accident of history, but that doesn't mean that scripters don't 
need access to it.
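
As a minimal sketch of the kind of loop meant here, the following walks a string one UTF-16 code unit at a time and reports the positions of unpaired surrogates. The helper name findUnpairedSurrogates is hypothetical and chosen only for illustration; the only string operations it relies on are length and charCodeAt.

```typescript
// Hypothetical helper (not from the thread): returns the indices of
// UTF-16 code units that are surrogates without a matching partner.
function findUnpairedSurrogates(text: string): number[] {
  const unpaired: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one 16-bit code unit
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair: skip its low half
      } else {
        unpaired.push(i); // high surrogate with no low surrogate after it
      }
    } else if (isLow) {
      unpaired.push(i); // low surrogate not preceded by a high surrogate
    }
  }
  return unpaired;
}
```

A check like this can only be written if the code units themselves are reachable from script.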

Addison

Sent from my iPhone

On May 17, 2011, at 12:52 PM, "Wes Garland" <w...@page.ca> wrote:

On 17 May 2011 15:00, Phillips, Addison <addi...@lab126.com> wrote:
2. Allowing unpaired surrogates is a *requirement*. Yes, such a string is 
"ill-formed", but there are too many cases in which one might wish to have such 
"broken" strings for scripting purposes.
3. We should have escape syntax for supplementary characters (such as 
\U0010000). Looking up the surrogate pair for a given Unicode character is 
extremely inconvenient and is not self-documenting.
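
To illustrate why looking up a surrogate pair by hand is inconvenient, here is a rough sketch of the arithmetic UTF-16 defines for a supplementary code point. The function name toSurrogatePair is an assumption made for this example and is not part of any proposal in this thread.

```typescript
// Hypothetical helper: computes the UTF-16 surrogate pair for a
// supplementary code point (U+10000 through U+10FFFF).
function toSurrogatePair(codePoint: number): [number, number] {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("not a supplementary code point");
  }
  const offset = codePoint - 0x10000;    // 20 significant bits
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}
```

For U+1F600 this yields 0xD83D and 0xDE00, which is the pair a script author currently has to write out by hand.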
...
As Shawn notes, basically, there are three ways that one might wish to access 
strings:
...
- as code units (encoding units of text)

I don't understand why (except that it is there by an accident of history) it 
is desirable to expose a particular low-level detail about one possible 
encoding for Unicode characters to end-user programmers.

Your point about database storage only holds if the database happens to store 
Unicode strings encoded in UTF-16. It could just as easily use UTF-8, UTF-7, or 
UTF-32. For that matter, the database input routine could filter all characters 
not in ISO-Latin-1 and store only the lower half of non-surrogate-pair UTF-16 
code units.
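
A small sketch of how encoding-dependent the notion of a "code unit" is: the function below counts how many code units the same text occupies in UTF-8, UTF-16, and UTF-32. The name codeUnitCounts is hypothetical, and the treatment of an unpaired surrogate as a three-byte sequence is an assumption made here for simplicity (strict UTF-8 would reject it).

```typescript
// Hypothetical helper: counts the code units needed to store `text`
// in three encodings. Assumes the input is a UTF-16 string.
function codeUnitCounts(text: string): { utf8: number; utf16: number; utf32: number } {
  let utf8 = 0;
  let utf32 = 0;
  for (let i = 0; i < text.length; i++) {
    let cp = text.charCodeAt(i);
    // Combine a well-formed surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < text.length) {
      const low = text.charCodeAt(i + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (low - 0xdc00);
        i++;
      }
    }
    utf32 += 1; // one 32-bit unit per code point
    // Bytes per code point in UTF-8; an unpaired surrogate falls in the
    // 3-byte range here (assumption, see lead-in).
    utf8 += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return { utf8, utf16: text.length, utf32 };
}
```

For a three-character string made of "a", U+00E9, and U+1F600, the counts come out as 7, 4, and 3 respectively, so a storage limit expressed in code units means something different in each encoding.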

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
