On Jul 28, 2007, at 11:38 PM, Jonas Sicking wrote:

Maciej Stachowiak wrote:
On Jul 27, 2007, at 12:09 PM, Jonas Sicking wrote:

Anne van Kesteren wrote:
I've been looking at overrideMimeType implementations in Gecko and WebKit and it seems like they differ a bit. In Gecko it has to be invoked before send(), but in WebKit it would work if you invoke it just before getting responseXML or responseText. Neither implementation seems to do any input checks. If you have any opinion on how it should be specified I suppose now would be the time to air your thoughts.

Of course I prefer the Mozilla way :)

It does seem fairly complicated to allow it to be set after the download is finished though. You do have the stream stored in .responseBody, but at that point all encoding information has been lost. For HTML parsing (which I hope the spec will support in the future) there are a pile of rules used to guess the encoding, all of which would be useful to use, but can't be used if all you have access to is the unencoded responseBody.
Why would the encoding information be lost? The only sources of encoding info are the responseText itself and the HTTP headers, both of which the XMLHttpRequest needs to provide anyway.

responseText is not the raw byte stream gotten off the wire; it is already decoded into UTF-16 using whatever algorithm we define for determining the encoding. HTML decoding is a lot more complicated, since you first have to guess an encoding and then start to parse the document, but if you find a

<meta http-equiv="Content-Type" content="text/html; charset=?">

where charset is different from what you guessed, you have to restart from the beginning using the charset defined in the meta tag.
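
To make the restart concrete, here is a rough Python sketch of the guess-then-restart idea. It is only an illustration: the name decode_html, the default guess of windows-1252, and the simple regular expression are all assumptions made for this example, not what Gecko or WebKit actually do, and a real parser applies many more rules.

```python
import re

# Hypothetical, simplified pattern for a charset declared in a <meta> tag.
META_CHARSET = re.compile(r'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)

def decode_html(raw, guessed="windows-1252"):
    """Decode raw bytes with a guessed charset, restarting if <meta> disagrees.

    Returns a (text, charset_used) pair. An unknown declared charset would
    raise LookupError; this sketch does not handle that case.
    """
    # First pass: decode with the guess so the markup can be scanned.
    text = raw.decode(guessed, errors="replace")
    match = META_CHARSET.search(text)
    if match and match.group(1).lower() != guessed.lower():
        declared = match.group(1).lower()
        # Restart from the original bytes, which is why they must be kept.
        return raw.decode(declared, errors="replace"), declared
    return text, guessed
```

The point of the sketch is that the second pass needs the original bytes, not the text produced by the first pass.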

Yes, it would definitely be possible for the implementation to keep around the raw byte stream and either lazily decode responseText, or keep both the UTF-16 responseText and the raw byte stream around.
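
As a rough illustration of the lazy-decoding option, the Python sketch below keeps the raw bytes and decodes them only when the text is first read. The class Response and its members are hypothetical names invented for this example; they do not correspond to any browser's internals or to the XMLHttpRequest API.

```python
class Response:
    """Hypothetical stand-in for a response object that retains its raw bytes."""

    def __init__(self, raw, charset="utf-8"):
        self._raw = raw          # bytes exactly as received
        self._charset = charset  # charset currently in effect
        self._text = None        # decoded text, filled in lazily

    def override_charset(self, charset):
        """Change the charset after the download; drops any cached text."""
        self._charset = charset
        self._text = None

    @property
    def response_text(self):
        # Decode on first access, or again after an override.
        if self._text is None:
            self._text = self._raw.decode(self._charset, errors="replace")
        return self._text
```

With this shape, overriding the charset after the download simply invalidates the cached text, so the next read of response_text reflects the new encoding.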

A third possibility is to remember what encoding you used when decoding and turn the UTF-16 back into the original bytes, though I suppose that wouldn't work if you hit encoding errors originally.
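
A small Python check shows why that third option breaks down once decoding errors occur. The helper round_trips is a hypothetical name used only for this example, and substituting U+FFFD for undecodable bytes is an assumption about how errors would be handled.

```python
def round_trips(raw, charset):
    """Return True if decoding and then re-encoding reproduces the original bytes."""
    # Undecodable bytes become U+FFFD, which loses the original byte values.
    text = raw.decode(charset, errors="replace")
    return text.encode(charset, errors="replace") == raw
```

For the ISO-8859-1 bytes of "café", round_trips succeeds with the correct charset but fails if the bytes were first decoded as UTF-8, because the invalid byte has already been replaced.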

It is somewhat quirky behavior, though, since setting overrideMimeType could then change the encoding and therefore both responseXML and responseText.

If XHR2 offers responseBody with a raw byte array of some kind, implementations will be required to keep the raw bytes around anyway.

Regards,
Maciej

