Re: Code points vs Unicode scalar values

Anne van Kesteren Wed, 04 Sep 2013 13:10:42 -0700

On Wed, Sep 4, 2013 at 6:22 PM, Allen Wirfs-Brock <[email protected]> wrote:
> WRT the larger issue, these APIs are for people who need to deal with text at 
> the encoding level.


At that level you want to deal with bytes, and we have an API for
that: http://encoding.spec.whatwg.org/#api
I'd hope people would be smart enough not to add more encoding cruft,
but we can't stop them, and I don't think this API should be designed
for them.
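
To make the byte-level point concrete, here is a minimal sketch using
TextEncoder and TextDecoder from the Encoding API linked above. It
assumes an engine that exposes both as globals; the variable names are
just for illustration.

```javascript
// Byte-level work goes through the Encoding API, not through strings.
const encoder = new TextEncoder();            // TextEncoder always emits UTF-8
const bytes = encoder.encode("\u{1d11e}");    // U+1D11E MUSICAL SYMBOL G CLEF

// A lone surrogate is not a Unicode scalar value, so it cannot be
// encoded as UTF-8; the encoder substitutes U+FFFD instead.
const loneBytes = encoder.encode("\ud834");

// Decoding the well-formed bytes gives back the original string.
const decoder = new TextDecoder("utf-8");
const roundTripped = decoder.decode(bytes);

console.log(Array.from(bytes), Array.from(loneBytes), roundTripped);
```

Note that the lone surrogate never reaches the byte level; what comes
out is the encoding of the replacement character.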


> For example, they might be intentionally generating invalid UTF-16 encodings 
> as part of a test driver.

Generate what though? If you want to generate surrogates you can
always go back to using 16-bit code units. There's no need for this to
leak through to the higher level abstraction.
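
As a rough sketch of what "going back to 16-bit code units" looks like
for a test driver: String.fromCharCode takes code units as-is, so it
will happily build ill-formed UTF-16. hasLoneSurrogate is a
hypothetical helper written for this example, not an existing or
proposed API.

```javascript
// Code-unit level: nothing here checks that surrogates are paired.
const loneHigh = String.fromCharCode(0xd834);          // unpaired high surrogate
const reversed = String.fromCharCode(0xdd1e, 0xd834);  // low surrogate before high

// Hypothetical helper: true if the string contains an unpaired surrogate.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = s.charCodeAt(i + 1);   // NaN when i + 1 is past the end
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++;                              // skip the low half of a valid pair
        continue;
      }
      return true;
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true;                        // low surrogate with no preceding high
    }
  }
  return false;
}

console.log(hasLoneSurrogate(loneHigh), hasLoneSurrogate(reversed));
```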


> Note that the behavior of String.fromCodePoint parallels that of string 
> literals:
>
> String.fromCodePoint(0x1d11e)
> String.fromCodePoint(0xd834,0xdd1e)
> "\u{1d11e}"
> "\ud834\udd1e"
>
> all produce the same string value.

If "\u{...}" is new, it'd be great if that banned surrogates too.
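
To illustrate the difference, here is a minimal sketch of a variant
restricted to Unicode scalar values. fromScalarValues is a hypothetical
name made up for this example, not a proposed API; it rejects
surrogates and otherwise defers to String.fromCodePoint. The example
value is U+1D11E, whose surrogate pair is 0xD834 0xDD1E.

```javascript
// Hypothetical strict variant: accepts Unicode scalar values only,
// i.e. code points in 0..0x10FFFF excluding surrogates 0xD800..0xDFFF.
function fromScalarValues(...values) {
  for (const v of values) {
    const inRange = Number.isInteger(v) && v >= 0 && v <= 0x10ffff;
    const isSurrogate = v >= 0xd800 && v <= 0xdfff;
    if (!inRange || isSurrogate) {
      throw new RangeError("Not a Unicode scalar value: " + String(v));
    }
  }
  return String.fromCodePoint(...values);
}

// String.fromCodePoint accepts surrogate code points, and two of them
// in sequence are indistinguishable from the supplementary character.
const viaSurrogates = String.fromCodePoint(0xd834, 0xdd1e);
const viaScalar = fromScalarValues(0x1d11e);
console.log(viaSurrogates === viaScalar);

// The strict variant refuses the same surrogate input.
let rejected = false;
try {
  fromScalarValues(0xd834, 0xdd1e);
} catch (e) {
  rejected = e instanceof RangeError;
}
console.log(rejected);
```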

I learned from Simon today that Rust is doing the same thing for its
char type. (Rust has some other issues where you can assign arbitrary
byte values to a string even in safe mode, but it's still early days
in that language.)


-- 
http://annevankesteren.nl/
