On Fri, Aug 30, 2013 at 6:17 PM, Adam Roach <a...@mozilla.com> wrote:
> It seems to me that there's an important balance here between (a) letting
> developers discover their configuration error and (b) allowing users to
> render misconfigured content without specialized knowledge.
It's worth noting that for other classes of authoring errors (except for errors in https deployment) we don't give the user tools to remedy them.

> Both of these are valid concerns, and I'm afraid that we're not assigning
> enough weight to the user perspective.

Assigning weight to the *short-term* user perspective seems to be what got us into this mess in the first place. If Netscape had never had a manual override for the character encoding or locale-specific differences, user-exposed brokenness would have quickly taught authors to get their encoding act together--especially in the context of languages like Japanese, where a wrong encoding guess makes the page completely unreadable.

(The obvious counter-argument is that in the case of languages that use a non-Latin script, getting the encoding wrong is near the YSoD level of disaster, and it's generally agreed that XML's error handling was a mistake compared to HTML's. However, HTML's error handling surfaces no UI choices to the user, works without having to reload the page, and is now well specified. Furthermore, even in the case of HTML, hindsight says we'd be better off if no browser had tried to be too helpful about fixing <i><b></i><b> in the first place.)

> I think we can find some middle ground here, where we help developers
> discover their misconfiguration, while also handing users the tool they need
> to fix it. Maybe an unobtrusive bar (similar to the password save bar) that
> says something like: "This page's character encoding appears to be
> mislabeled, which might cause certain characters to display incorrectly.
> Would you like to reload this page as Unicode? [Yes] [No] [More Information]
> [x]".

Why should we surface this class of authoring error to the UI in a way that asks the user to make a decision, considering how rare this class of authoring error is? Are there other classes of authoring errors that you think should have UI for the user to second-guess the author? If yes, why?
If not, why not? That is, why is the case where text/html is in fact valid UTF-8 and contains non-ASCII characters but has not been declared as UTF-8 so special compared to other possible authoring errors that it should have special treatment?

On Fri, Aug 30, 2013 at 8:24 PM, Mike Hoye <mh...@mozilla.com> wrote:
> For what it's worth Internet Explorer handled this (before UTF-8 and caring
> about JS performance were a thing) by guessing what encoding to use,
> comparing a letter-frequency analysis of a page's content to a table of what
> bytes are most common in what encodings of whatever languages.

Is there evidence of IE doing this in locales other than Japanese, Russian and Ukrainian? Or even locales other than Japanese? Firefox does this only for the Japanese, Russian and Ukrainian locales.

(FWIW, studying whether this is still needed for the Russian and Ukrainian locales is https://bugzilla.mozilla.org/show_bug.cgi?id=845791 . As for Japanese, some sort of detection magic is probably staying for the foreseeable future. It appears that Microsoft fairly recently tried to take ISO-2022-JP out of their detector for security reasons but had to put it back for compatibility:
http://support.microsoft.com/kb/2416400
http://support.microsoft.com/kb/2482017 )

> It's
> probably not a suitable approach in modernity, because of performance
> problems and horrible-though-rare edge cases.

See point #3 in https://bugzilla.mozilla.org/show_bug.cgi?id=910211#c2

On Fri, Aug 30, 2013 at 9:33 PM, Joshua Cranmer 🐧 <pidgeo...@gmail.com> wrote:
> The problem I have with this approach is that it assumes that the page is
> authored by someone who definitively knows the charset, which is not a
> scenario which universally holds. Suppose you have a page that serves up the
> contents of a plain text file, so your source data has no indication of its
> charset. What charset should the page report?

Your scenario assumes that the page template is ASCII-only.
If it isn't, browser-side guessing doesn't solve the problem. Even when the template is ASCII-only, whoever wrote the inclusion on the server probably has better contextual knowledge about what the encoding of the input text could be than the browser has.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
http://hsivonen.iki.fi/
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
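[Editor's aside, not part of the original message.] The authoring error the thread keeps returning to — content whose bytes are valid UTF-8 and include non-ASCII characters, but whose declared encoding is something else — is mechanically checkable, which is part of why it stands out among authoring errors. A minimal sketch of that check; the function name and heuristic are illustrative, not any browser's actual implementation:

```python
# Sketch: does this byte stream look like UTF-8 that was mislabeled
# as another encoding? (Illustrative heuristic only.)

def looks_like_undeclared_utf8(data: bytes, declared: str) -> bool:
    if declared.lower() in ("utf-8", "utf8"):
        return False  # already declared correctly; nothing to detect
    if all(b < 0x80 for b in data):
        return False  # pure ASCII decodes identically in most legacy encodings
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False  # not valid UTF-8 after all
    return True

# "é" encoded as UTF-8 (0xC3 0xA9) but labeled windows-1252 would
# render as "Ã©" -- the mojibake case under discussion.
print(looks_like_undeclared_utf8("caf\u00e9".encode("utf-8"), "windows-1252"))
```

Because non-ASCII byte sequences from legacy single-byte encodings are statistically unlikely to also form valid UTF-8, this check has a low false-positive rate, which is what makes the "why not just detect it?" question arise for this error class in particular.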