Re: XHR LC comment: header encoding

Boris Zbarsky Mon, 04 Jan 2010 11:37:41 -0800

On 1/4/10 11:44 AM, Julian Reschke wrote:

This happens to more or less match "decoding as ISO-8859-1", but not
quite.
...


Not quite?

More precisely, it happens to not quite match what browsers call
ISO-8859-1, which is actually Windows-1252. And in particular,
ISO-8859-1 doesn't define the behavior of the 0x7F-0x9F range, whereas
byte-inflation does (mapping the range to various Unicode control
characters) and Windows-1252 does as well, in a different way (mapping
the range to various printable Unicode characters).
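To make the difference concrete, here is a minimal sketch in Python. The helper name inflate is hypothetical, not any browser's API. It shows that byte 0x80 becomes a control character under byte inflation but the euro sign under Windows-1252, while a byte such as 0xE9 comes out the same either way.

```python
def inflate(header_bytes: bytes) -> str:
    """Byte inflation: each byte 0xNN becomes the code point U+00NN.

    Hypothetical helper for illustration only.
    """
    return "".join(chr(b) for b in header_bytes)


raw = bytes([0x80, 0xE9])

inflated = inflate(raw)            # U+0080 (a control character), U+00E9
as_cp1252 = raw.decode("cp1252")   # U+20AC (euro sign), U+00E9

# The two decodings agree on 0xE9 but disagree on 0x80.
print([hex(ord(ch)) for ch in inflated])
print([hex(ord(ch)) for ch in as_cp1252])
```

Python's own "latin-1" codec maps all 256 byte values straight to U+0000-U+00FF, so in Python it behaves like byte inflation rather than like what browsers call ISO-8859-1.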

It at least preserves all the information that was there and would allow
a caller to re-decode as UTF-8 as a separate step.


Yep.
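A rough sketch of that two-step approach, in Python, assuming the header bytes on the wire were UTF-8. The helpers inflate and deflate are hypothetical names, not part of any XHR implementation.

```python
def inflate(header_bytes: bytes) -> str:
    """Byte inflation: each byte 0xNN becomes the code point U+00NN."""
    return "".join(chr(b) for b in header_bytes)


def deflate(text: str) -> bytes:
    """Inverse of inflate.

    Only valid when every code point is <= U+00FF; bytes() raises
    ValueError otherwise.
    """
    return bytes(ord(ch) for ch in text)


# Assumption for this sketch: the server sent UTF-8 header bytes.
wire = "na\u00efve".encode("utf-8")

as_string = inflate(wire)                        # what the API would hand back
recovered = deflate(as_string).decode("utf-8")   # caller's separate re-decode step

print(recovered == "na\u00efve")
```

Because inflation is a one-to-one mapping from bytes to U+0000-U+00FF, the caller can always get the original octets back and choose a decoding afterwards.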

Right now there is no interoperable encoding, so the best thing to do in
APIs that use character sequences instead of octets is to preserve as
much information as possible.


That seems reasonable...

It would be nice if we could find out whether anybody relies on the
current implementation. Maybe switch it back to byte inflation in
Mozilla trunk?

Mozilla trunk already does byte _inflation_ when converting from header
bytes into a JavaScript string. I assume you meant to convert
JavaScript strings into header bytes via dropping the high byte of each
16-bit code unit. However that fails the "preserve as much information
as possible" test... In particular, as soon as any Unicode character
outside the U+0000-U+00FF range is used, byte-dropping loses information.
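A small Python sketch of the information loss, modelling a JavaScript string as UTF-16 code units. The helper drop_high_bytes is a hypothetical name used only for illustration.

```python
def drop_high_bytes(text: str) -> bytes:
    """Keep only the low byte of each 16-bit code unit.

    Hypothetical helper: encodes to little-endian UTF-16, where each code
    unit is two bytes with the low byte first, then keeps every other byte.
    """
    units = text.encode("utf-16-le")
    return units[0::2]


# Inside U+0000-U+00FF the conversion is lossless.
print(drop_high_bytes("caf\u00e9"))

# Outside that range, distinct characters collapse to the same byte:
# U+20AC (euro sign) and U+00AC (not sign) both become 0xAC.
print(drop_high_bytes("\u20ac") == drop_high_bytes("\u00ac"))
```

Once two different strings produce the same header bytes, the receiver has no way to tell which one was meant.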


-Boris
