The problem as I see it is that the current spec text for charset detection effectively *requires* a browser that does not "support" UTF-32 to explicitly ignore content metadata that may be correct (if it specifies UTF-32 as the charset parameter), and further, to mis-label such content as UTF-16LE when the first four bytes are FF FE 00 00. Indeed, the current algorithm requires mis-labelling such content as UTF-16LE with a confidence of "certain".
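To make the failure mode concrete, here is a minimal sketch of how I read the step-(4) BOM table (Python, purely illustrative; the function names, confidence strings, and test document are mine, not spec text), followed by a variant with the extra look-ahead along the lines of my suggestion in the message quoted below:

```python
import codecs

def sniff_bom(data: bytes):
    """BOM table from step (4) as currently written: first match wins,
    and the result is reported with confidence "certain"."""
    if data[:2] == b"\xfe\xff":
        return ("UTF-16BE", "certain")
    if data[:2] == b"\xff\xfe":
        # No look-ahead at the next two bytes, so a UTF-32LE BOM
        # (FF FE 00 00) is also routed here.
        return ("UTF-16LE", "certain")
    if data[:3] == b"\xef\xbb\xbf":
        return ("UTF-8", "certain")
    return None

def sniff_bom_guarded(data: bytes):
    """Variant: treat FF FE as a UTF-16LE BOM only if the following
    two bytes are not both 00."""
    if data[:2] == b"\xff\xfe" and data[2:4] == b"\x00\x00":
        return None  # fall through to the later steps / transport metadata
    return sniff_bom(data)

# A UTF-32LE document, correctly labelled by its own BOM:
doc = codecs.BOM_UTF32_LE + "<!DOCTYPE html>".encode("utf-32-le")
print(sniff_bom(doc))          # ('UTF-16LE', 'certain') -- mis-labelled
print(sniff_bom_guarded(doc))  # None -- left to the remaining steps
```

With the guard in place, FF FE 00 00 is no longer locked in as UTF-16LE with confidence "certain"; whether it should then be decoded, transcoded, or rejected depends on what "support" means, which is the next question.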
The current text is also ambiguous with respect to what "support" means in step (2) of Section 8.2.2.1 of [1]. Which of the following are meant by "support"?

- recognize with sniffer
- be capable of using directly as internal coding
- be capable of transcoding to internal coding

[1] http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding

On Mon, Dec 5, 2011 at 3:10 PM, Ian Hickson <i...@hixie.ch> wrote:
> On Mon, 5 Dec 2011, Glenn Adams wrote:
> >
> > I see the problem now. It seems that the table in step (4) should be
> > changed to interpret an initial FF FE as UTF-16BE only if the following
> > two bytes are not 00.
>
> The current text is intentional. UTF-32 is explicitly not supported by the
> HTML standard.
>
> --
> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
>