On 1/4/10 11:17 AM, Julian Reschke wrote:
For request headers, I would assume that the character encoding is
ISO-8859-1, and if a character can't be encoded using ISO-8859-1,
some kind of error handling occurs (ignore the character/ignore the
header/throw?).

From my limited testing it seems Firefox, Chrome, and Internet
Explorer use UTF-8 octets. E.g. "\xFF" in ECMAScript gets transmitted
as C3 BF (in octets). Opera sends "\xFF" as FF.

That's what Gecko does, correct.
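
(To make the C3 BF observation above concrete, here is a quick sketch using
the standard TextEncoder API; it is not what any engine does internally,
just the same UTF-8 mapping applied to the one-character string "\xFF":)

  // "\xFF" in ECMAScript is the single code unit U+00FF.
  var bytes = new TextEncoder().encode("\xFF");
  // bytes is Uint8Array [0xC3, 0xBF], the UTF-8 encoding of U+00FF,
  // which matches what Firefox/Chrome/IE put on the wire.
  // A byte-oriented sender (Opera, per the above) would emit the single octet FF.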

For response headers, I'd expect that the octet sequence is decoded
using ISO-8859-1; so no specific error handling would be needed
(although the result may be funny when the intended encoding was not ISO-8859-1).

Firefox, Opera, and Internet Explorer indeed do this. Chrome decodes
as UTF-8 as far as I can tell.

More precisely, what Gecko does here is to take the raw byte string and byte-inflate it (by setting the high byte of each 16-bit code unit to 0 and the low byte to the corresponding byte of the given byte string) before returning it to JS.

This happens to more or less match "decoding as ISO-8859-1", but not quite.
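
(In ECMAScript terms, a rough sketch of that inflation would be something
like the following; Gecko's real code is C++, so this is only meant to show
the mapping:)

  // Each response byte becomes one 16-bit code unit: high byte 0,
  // low byte equal to the octet.
  function byteInflate(bytes) {
    var s = "";
    for (var i = 0; i < bytes.length; i++) {
      s += String.fromCharCode(bytes[i]);
    }
    return s;
  }
  // byteInflate([0xC3, 0xBF]) gives "\u00C3\u00BF", i.e. the raw bytes
  // surfaced as code units, not the "\xFF" the sender started with.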

Thanks for doing the testing. The discrepancy between setting and
getting worries me a lot :-).

In Gecko's case it seems to be an accident, at least historically. The getter and setter used to both do byte ops only (so byte inflation in the getter, and dropping the high byte in the setter) until the fix for <https://bugzilla.mozilla.org/show_bug.cgi?id=232493>. The review comments at <https://bugzilla.mozilla.org/show_bug.cgi?id=232493#c4> point out the UTF-8-vs-byte-inflation inconsistency, but it never seemed to get addressed...
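
(The practical effect of that inconsistency, assuming a server that simply
echoes a request header back in the response; the header name X-Echo and the
/echo URL are made up for this example:)

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/echo", false);
  xhr.setRequestHeader("X-Echo", "\xFF");  // setter UTF-8-encodes: octets C3 BF on the wire
  xhr.send();
  xhr.getResponseHeader("X-Echo");         // getter byte-inflates: "\u00C3\u00BF", not "\xFF"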

From HTTP's point of view, the header field value really is opaque. So
you can put anything there, as long as it fits into the header field ABNF.

True; what does that mean for converting header values to 16-bit code units in practice? Seems like byte-inflation might be the only reasonable thing to do...
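
(The setter-side counterpart would presumably be the inverse, with some error
handling when a code unit doesn't fit in one octet; here is one option,
throwing, purely as a sketch:)

  function byteDeflate(str) {
    var bytes = [];
    for (var i = 0; i < str.length; i++) {
      var unit = str.charCodeAt(i);
      if (unit > 0xFF) {
        // Could also ignore the character or drop the header, per Julian's list.
        throw new Error("code unit above U+00FF can't be sent as a single octet");
      }
      bytes.push(unit);
    }
    return bytes;
  }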

Of course that only helps if senders and receivers agree on the
encoding.

True, but "encoding" here needs to mean more than just "encoding of Unicode", since one can just stick random byte arrays, within the ABNF restrictions, in the header, right?

-Boris
