Jussi Kalliokoski wrote:
> I'm not sure what to think about this, being a big fan of the UTF-8 simplicity. :)

UTF-8 is great, but it's a transfer format, perfect for C and other such systems languages (especially those that have used a byte-wide char since the old days). It is not appropriate for JS, which gives users a "One True String" (sorry for caps) primitive type with higher-level "just Unicode" semantics. Alas, JS's "just Unicode" dates from '96.

There are lots of transfer formats and character set encodings. Implementations could use several, depending on what chars a given string uses: ASCII + UTF-16, UTF-8 only as you suggest, other combinations. But this would all be under the hood, at some cost to the engine, in exchange for some potential savings (space, mostly).

> But anyhow, I like the idea of opt-in, actually so much that I started thinking, why not make JS be encoding-agnostic?

That is precisely the idea. Setting the BRS (Big Red Switch) to "full Unicode" gives the appearance of 21 bits per character via indexing and length accounting. You'd have to spell non-BMP literal escapes via "\u{...}", no big deal.

> What I mean here is that maybe we could have multi-charset Strings in JS?

Now you're saying something else. Having one agnostic higher-level "just Unicode" string type is one thing. That's JS's design goal, always has been. It does not imply adding multiple observable character set encodings or UTFs that break the "just Unicode" abstraction.

If you can put a JS string in memory for low-level systems languages such as C to view, of course there are abstraction breaks. Engine APIs may or may not allow such views for optimizations. This is an issue, for sure, when embedding (e.g. V8 in Node). It's not a language design issue, though, and I'm focused on observables in the language because that is where JS currently fails by livin' in the '90s.

/be
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss