Re: Code points vs Unicode scalar values

Allen Wirfs-Brock Wed, 04 Sep 2013 10:22:50 -0700

On Sep 4, 2013, at 9:46 AM, Brendan Eich wrote:

> Mathias Bynens wrote:
>> I think what Anne means to say is that `String.fromCodePoint(0xD800)` 
>> returns '\uD800' as per that algorithm, which is a lone surrogate (and not a 
>> scalar value).
> 
> Gotcha. Yes, the new APIs seem to let you write and read lone surrogates. But 
> the legacy APIs won't go away, and IIRC the reasoning is that we're better 
> off exposing the data than trying to abstract away from it in the new APIs. 
> Allen?


First, a couple of meta points:
  1)  This stuff is mostly Norbert's design, so he may be able to provide better 
rationale for some of the decisions. 
  2)  There are a number of open bugs on the current spec WRT Unicode 
handling.  We'll get around to those soon.

WRT the larger issue, these APIs are for people who need to deal with text at 
the encoding level. They might be writing their own 
encoders/decoders/translators. At that level, surrogates really are valid 
code points even though they are not valid Unicode scalar values. People 
programming at that level in some cases have to deal with malformed encodings. 
For example, they might be intentionally generating invalid UTF-16 encodings as 
part of a test driver. 
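
To make the distinction concrete, here is a minimal sketch of how code working
at this level might tell code points, scalar values, and malformed UTF-16
apart. The helper names (isCodePoint, isSurrogate, isScalarValue,
isWellFormedUTF16) are made up for illustration and are not part of the spec;
the sketch assumes an engine that implements String.fromCodePoint.

```javascript
// Hypothetical helpers, for illustration only (not spec functions).

// Every integer in 0..0x10FFFF is a code point, surrogates included.
function isCodePoint(cp) {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10FFFF;
}

// Surrogate code points occupy 0xD800..0xDFFF.
function isSurrogate(cp) {
  return cp >= 0xD800 && cp <= 0xDFFF;
}

// A Unicode scalar value is any code point that is not a surrogate.
function isScalarValue(cp) {
  return isCodePoint(cp) && !isSurrogate(cp);
}

// True if every surrogate code unit in s belongs to a high+low pair.
function isWellFormedUTF16(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      const next = s.charCodeAt(i + 1); // NaN when past the end
      if (!(next >= 0xDC00 && next <= 0xDFFF)) return false;
      i++; // skip the low half of the pair
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

// A test driver can deliberately build a malformed string:
const lone = String.fromCodePoint(0xD800);
```

Here isCodePoint(0xD800) holds while isScalarValue(0xD800) does not, and
isWellFormedUTF16(lone) reports the string as malformed.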

Note that the behavior of String.fromCodePoint parallels that of string 
literals:

String.fromCodePoint(0x1d11e)
String.fromCodePoint(0xd834, 0xdd1e)
"\u{1d11e}"
"\ud834\udd1e"

all produce the same string value.
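
As a quick check of that equivalence, the following sketch builds the string
each way and compares the results. It assumes an engine that implements the
ES6 draft features involved (String.fromCodePoint, codePointAt, and \u{...}
escapes); toSurrogatePair is a hypothetical helper showing the standard UTF-16
arithmetic that maps U+1D11E to the pair 0xD834 0xDD1E.

```javascript
// Four spellings of U+1D11E (MUSICAL SYMBOL G CLEF).
const a = String.fromCodePoint(0x1D11E);
const b = String.fromCodePoint(0xD834, 0xDD1E);
const c = "\u{1D11E}";
const d = "\uD834\uDD1E";

// All four are the same sequence of two UTF-16 code units.
const allEqual = a === b && b === c && c === d;

// Hypothetical helper: split a supplementary code point into its
// high and low surrogates using the standard UTF-16 arithmetic.
function toSurrogatePair(cp) {
  const offset = cp - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return [high, low];
}

const pair = toSurrogatePair(0x1D11E);
```

The string has length 2 because length counts code units, while
a.codePointAt(0) reads the pair back as the single code point 0x1D11E.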

Allen

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
