On 21 February 2012 00:03, Brendan Eich <bren...@mozilla.com> wrote:

> These are byte-based encodings, no? What is the problem inflating them by
> zero extension to 16 bits now (or 21 bits in the future)? You can't make an
> invalid Unicode character from a byte value.
One of my examples, GB 18030, is a four-byte encoding and a Chinese government standard. It is a mapping onto Unicode, but the mapping is table-driven rather than algorithm-driven like the UTF-* transport formats. To give a single example, Unicode 0x2259 maps onto GB 18030 0x8136D830.

You're right that Big5 is byte-oriented -- maybe it was a bad example -- although it is a double-byte charset. It works by keeping ASCII in the low range and treating bytes above 0x7F as escapes into code pages that are dereferenced by the next byte. Each code point is encoded with one or two bytes, never more.

If I were developing with Big5 in JS, I would store the byte stream 4a 4b d8 00 c1 c2 4c as 004a 004b d800 c1c2 004c. This would allow me to use JS regular expressions and so on.

> Anyway, Big5 punned into JS strings (via a C or C++ API?) is *not* a strong
> use-case for ignoring invalid characters.

Agreed -- I'm stretching to see whether I can find a real problem with the BRS, because I really want it. But the data does not need to arrive via a C API: it could easily be delivered by an XHR request where, say, the remote end dumps database rows into a transport format based around evaluating JS string literals (like JSON).

> Ball one. :-P

If I hit the batter, does he get to first base?

We still haven't talked about equality and normalization; I suppose that can wait.

Wes

--
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
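P.S. To make the Big5 punning concrete, here is a rough sketch of the one-or-two-byte widening I have in mind. `punBig5` is a name I just made up, and it deliberately does no validation of trail bytes -- which is exactly how the lone surrogate 0xD800 ends up in the string:

```javascript
// Sketch: pun a Big5 byte stream into a JS string, one UTF-16 code unit
// per Big5 code point. A byte above 0x7F is a lead byte that consumes the
// following byte; everything else becomes a single code unit.
// (Hypothetical helper, not a real Big5 decoder: no trail-byte checks.)
function punBig5(bytes) {
  var out = "";
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    if (b > 0x7f && i + 1 < bytes.length) {
      out += String.fromCharCode((b << 8) | bytes[++i]);  // two bytes -> one unit
    } else {
      out += String.fromCharCode(b);                       // one byte  -> one unit
    }
  }
  return out;
}

// The byte stream from the message above:
var s = punBig5([0x4a, 0x4b, 0xd8, 0x00, 0xc1, 0xc2, 0x4c]);

// Dump each code unit as four hex digits:
var units = s.split("").map(function (c) {
  return ("0000" + c.charCodeAt(0).toString(16)).slice(-4);
});
// units: ["004a", "004b", "d800", "c1c2", "004c"]
```

Note that the third unit, 0xD800, is an unpaired surrogate -- a perfectly good Big5 pun but not valid Unicode, which is the whole point of the example.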
_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss