On Jul 28, 2007, at 11:38 PM, Jonas Sicking wrote:
Maciej Stachowiak wrote:
On Jul 27, 2007, at 12:09 PM, Jonas Sicking wrote:
Anne van Kesteren wrote:
I've been looking at overrideMimeType implementations in Gecko
and WebKit and it seems like they differ a bit. In Gecko it has
to be invoked before send(), but in WebKit it would work if you
invoke it just before getting responseXML or responseText.
Neither implementation seems to do any input checks.
If you have any opinion on how it should be specified I suppose
now would be the time to air your thoughts.
Of course I prefer the Mozilla way :)
It does seem fairly complicated to allow it to be set after the
download is finished though. You do have the stream stored
in .responseBody, but at that point all encoding information has
been lost. For HTML parsing (which I hope the spec will support in
the future) there are a pile of rules used to guess the encoding,
all of which would be useful to use, but can't be used if all you
have access to is the unencoded responseBody.
Why would the encoding information be lost? The only sources of
encoding info are the responseText itself and http headers, both of
which the XMLHttpRequest needs to provide anyway.
ResponseText is not the raw byte stream gotten off the wire, it is
already decoded into UTF-16 using whatever algorithm we define for
determining the encoding. HTML decoding is a lot more complicated
since you have to first guess an encoding, then start to parse the
document, but if you find a
<meta http-equiv="Content-Type" content="text/html; charset=?">
where charset is different from what you guessed, you have to
restart from the beginning using the charset defined in the meta tag.
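
To make the restart step concrete, here is a minimal sketch in Python. It is not any browser's actual sniffing algorithm: the function name, the regular expression, and the windows-1252 default guess are all assumptions made for illustration. The point it shows is that the restart needs the raw bytes, not the already decoded text.

```python
import codecs
import re
from typing import Tuple

# Hypothetical, simplified pattern for a charset declared in a <meta> tag.
META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([\w-]+)", re.IGNORECASE)


def decode_html(raw: bytes, guessed: str = "windows-1252") -> Tuple[str, str]:
    """Decode with a guessed encoding; restart if a <meta> tag disagrees."""
    text = raw.decode(guessed, errors="replace")
    match = META_CHARSET.search(raw)
    if match is None:
        return text, guessed
    declared = match.group(1).decode("ascii")
    try:
        same = codecs.lookup(declared).name == codecs.lookup(guessed).name
    except LookupError:
        # Unknown charset label: keep the original guess.
        return text, guessed
    if same:
        return text, guessed
    # Restart from the beginning of the raw bytes with the declared charset.
    return raw.decode(declared, errors="replace"), declared
```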
Yes, it would definitely be possible for the implementation to keep
around the raw byte stream and either lazily decode responseText, or
keep both the UTF-16 responseText and the raw byte stream around.
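
A rough sketch of the lazy-decoding option, in Python. The class and its method names are hypothetical and do not correspond to any real XMLHttpRequest implementation; a Python str stands in for the decoded UTF-16 text. It keeps the raw bytes and decodes only when the text is read, so a charset change after the download still takes effect.

```python
from typing import Optional


class LazyResponse:
    """Hypothetical response object: keeps raw bytes, decodes on demand."""

    def __init__(self, raw: bytes, header_charset: str = "utf-8") -> None:
        self.raw = raw                    # bytes exactly as received
        self._charset = header_charset    # charset assumed from the headers
        self._text: Optional[str] = None  # cached decoded text

    def override_charset(self, charset: str) -> None:
        """Stand-in for an override applied after the download finished."""
        self._charset = charset
        self._text = None                 # drop the cache; next read re-decodes

    @property
    def response_text(self) -> str:
        if self._text is None:
            self._text = self.raw.decode(self._charset, errors="replace")
        return self._text
```

With this shape, reading response_text before and after override_charset() can give different strings for the same bytes.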
A third possibility is to remember what encoding you used when
decoding and turn the UTF-16 back into the original bytes, though I
suppose that wouldn't work if you hit encoding errors originally.
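
For illustration, a small Python check of when that round trip holds. The function name is made up, and a Python str again stands in for the decoded UTF-16 text. Once a decoding error has been replaced, the original bytes can no longer be recovered by re-encoding.

```python
def round_trips(raw: bytes, charset: str) -> bool:
    """Return True if decoding then re-encoding reproduces the original bytes."""
    # Invalid byte sequences become U+FFFD, which loses the original bytes.
    text = raw.decode(charset, errors="replace")
    try:
        return text.encode(charset) == raw
    except UnicodeEncodeError:
        # The replacement character may not exist in the target charset.
        return False
```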
It is somewhat quirky behavior, though, since setting overrideMimeType
could then change the encoding and therefore both responseXML and
responseText.
If XHR2 offers responseBody with a raw byte array of some kind, it
will be required for implementations to keep the raw bytes around
anyway.
Regards,
Maciej