On 1/4/10 11:17 AM, Julian Reschke wrote:
For request headers, I would assume that the character encoding is
ISO-8859-1, and if a character can't be encoded using ISO-8859-1,
some kind of error handling occurs (ignore the character/ignore the
header/throw?).

From my limited testing it seems Firefox, Chrome, and Internet
Explorer use UTF-8 octets. E.g. "\xFF" in ECMAScript gets transmitted
as C3 BF (in octets). Opera sends "\xFF" as FF.

That's what Gecko does, correct.
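
(To make the C3 BF observation above concrete, here is a quick sketch using
the standard TextEncoder API; it is not what any engine does internally,
just the same UTF-8 mapping applied to the one-character string "\xFF":)

  // "\xFF" in ECMAScript is the single code unit U+00FF.
  var bytes = new TextEncoder().encode("\xFF");
  // bytes is Uint8Array [0xC3, 0xBF], the UTF-8 encoding of U+00FF,
  // which matches what Firefox/Chrome/IE put on the wire.
  // A byte-oriented sender (Opera, per the above) would emit the single octet FF.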

For response headers, I'd expect that the octet sequence is decoded
using ISO-8859-1; so no specific error handling would be needed
(although the result may be funny when the intended encoding was not ISO-8859-1).

Firefox, Opera, and Internet Explorer indeed do this. Chrome decodes
as UTF-8 as far as I can tell.

More precisely, what Gecko does here is to take the raw byte string and byte-inflate it (by setting the high byte of each 16-bit code unit to 0 and the low byte to the corresponding byte of the given byte string) before returning it to JS.

This happens to more or less match "decoding as ISO-8859-1", but not quite.
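
(In ECMAScript terms, a rough sketch of that inflation would be something
like the following; Gecko's real code is C++, so this is only meant to show
the mapping:)

  // Each response byte becomes one 16-bit code unit: high byte 0,
  // low byte equal to the octet.
  function byteInflate(bytes) {
    var s = "";
    for (var i = 0; i < bytes.length; i++) {
      s += String.fromCharCode(bytes[i]);
    }
    return s;
  }
  // byteInflate([0xC3, 0xBF]) gives "\u00C3\u00BF", i.e. the raw bytes
  // surfaced as code units, not the "\xFF" the sender started with.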

Thanks for doing the testing. The discrepancy between setting and
getting worries me a lot :-).

In Gecko's case it seems to be an accident, at least historically. The getter and setter used to both do byte ops only (so byte inflation in the getter, and dropping the high byte in the setter) until the fix for <https://bugzilla.mozilla.org/show_bug.cgi?id=232493>. The review comments at <https://bugzilla.mozilla.org/show_bug.cgi?id=232493#c4> point out the UTF-8-vs-byte-inflation inconsistency, but it never seemed to get addressed...
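
(The practical effect of that inconsistency, assuming a server that simply
echoes a request header back in the response; the header name X-Echo and the
/echo URL are made up for this example:)

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/echo", false);
  xhr.setRequestHeader("X-Echo", "\xFF");  // setter UTF-8-encodes: octets C3 BF on the wire
  xhr.send();
  xhr.getResponseHeader("X-Echo");         // getter byte-inflates: "\u00C3\u00BF", not "\xFF"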

From HTTP's point of view, the header field value really is opaque. So
you can put anything there, as long as it fits into the header field ABNF.

True; what does that mean for converting header values to 16-bit code units in practice? Seems like byte-inflation might be the only reasonable thing to do...
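
(The setter-side counterpart would presumably be the inverse, with some error
handling when a code unit doesn't fit in one octet; here is one option,
throwing, purely as a sketch:)

  function byteDeflate(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
      var unit = str.charCodeAt(i);
      if (unit > 0xFF) {
        // Could also ignore the character or drop the header, per Julian's list.
        throw new Error("code unit above U+00FF can't be sent as a single octet");
      }
      bytes.push(unit);
    }
    return bytes;
  }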

Of course that only helps if senders and receivers agree on the
encoding.

True, but "encoding" here needs to mean more than just "encoding of Unicode", since one can just stick random byte arrays, within the ABNF restrictions, in the header, right?

-Boris
