On Mon, Dec 5, 2011 at 3:15 PM, Glenn Adams <gl...@skynav.com> wrote:
> But, if the browser does not support UTF-32, then the table in step (4) of
> [1] is supposed to apply, which would interpret the initial two bytes FF FE
> as UTF-16LE according to the current language of [1], and further, return a
> confidence level of "certain".
>
> I see the problem now. It seems that the table in step (4) should be
> changed to interpret an initial FF FE as UTF-16LE only if the following two
> bytes are not 00.

That wouldn't bring browsers and the spec closer together; it would actually bring them further apart.

At first glance, it looks like it makes the spec allow WebKit and IE's behavior, which (unfortunately) includes UTF-32 detection, by allowing them to fall through to step 7, where they're allowed to detect things however they want.

However, that's ignoring step 5. If step 4 passes through, then step 5 would happen next. That means this carefully constructed file would be detected as UTF-8 by step 5:

http://zewt.org/~glenn/test-utf32-with-ascii-meta.html-no-encoding

That's not what happens in any browser: FF detects it as UTF-16, and WebKit and IE detect it as UTF-32. This change would require it to be detected as UTF-8, which would have security implications if implemented; e.g. a page outputting escaped user-inputted text in UTF-32 might contain a string like this, followed by a hostile <script>, when interpreted as UTF-8.

This really isn't worth spending time on; you're free to press this if you like, but I'm moving on.

--
Glenn Maynard
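P.S. For anyone following along, here is a minimal sketch of the ambiguity under discussion. The `sniff_bom` helper and its UTF-32 branch are my own illustration, not the spec's algorithm (the spec's step-4 table only covers the UTF-8 and UTF-16 BOMs); it shows why FF FE followed by 00 00 is ambiguous, and how the proposed "only if the following two bytes are not 00" rule would make sniffing fall through to later steps instead of returning UTF-16LE:

```python
def sniff_bom(data: bytes):
    """Guess an encoding from a leading byte-order mark.

    Illustrative only; the UTF-32 fall-through models the proposed
    change, not the current spec table.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        # Ambiguity: the UTF-32LE BOM (FF FE 00 00) begins with the
        # UTF-16LE BOM (FF FE).  Under the proposed rule, when the
        # next two bytes are 00 00 this branch is skipped and sniffing
        # falls through to later steps (meta prescan, heuristics).
        if data[2:4] == b"\x00\x00":
            return None  # fall through per the proposed change
        return "utf-16-le"
    return None

# A UTF-32LE document: BOM, then "<h" as NUL-padded code units.
utf32_doc = b"\xff\xfe\x00\x00" + b"<\x00\x00\x00h\x00\x00\x00"
print(sniff_bom(utf32_doc))            # None -- falls through
print(sniff_bom(b"\xff\xfeh\x00"))     # utf-16-le
```

Note that `None` here is exactly the problem: a browser without UTF-32 support that falls through ends up running the step-5 meta prescan over NUL-riddled bytes, which is where the UTF-8 misdetection above comes from.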