# Summary

The template says this section should state the benefit to Web developers. There is intentionally no benefit to Web developers. This pair of features is meant to benefit users who encounter badly-authored legacy pages, so that Firefox can retain those users instead of them trying Chrome or the new Edge. That is, this is about the user experience of browsing the legacy long tail of the Web. This is not about cool new stuff. In that sense, this feature is out of the scope of "Intent to prototype" emails, but I'm sending one, because this is a Web-visible feature in the sense that Web content could detect its presence.
For newly-authored HTML pages, Web developers should use UTF-8 and declare it (via the UTF-8 BOM, <meta charset=utf-8>, or HTTP Content-Type: text/html; charset=utf-8). The first and last options apply to text/plain, too.

With that out of the way, there are two features contemplated here:

1. On _unlabeled_ text/html and text/plain pages, autodetect the _legacy_ encoding, excluding UTF-8, for non-file: URLs, and autodetect the encoding, including UTF-8, for file: URLs.

   Elevator pitch: Chrome already did this unilaterally. The motivation is to avoid a situation where a user switches to a Chromium-based browser as a result of browsing the legacy Web or local files.

   As in Chrome, UTF-8 is deliberately excluded from the possible detection outcomes on non-file: URLs in order to avoid creating a situation where the feature would have an unwanted effect on future Web development by causing Web developers to rely on UTF-8 detection, which would make the platform more brittle. That is, one type of user-facing problem is deliberately left unfixed in order to avoid a feedback loop into authoring that would generate more of the problem. However, feature #2 below continues to allow users to address this problem at the cost of taking an explicit menu action.

   (A full discussion of the implications of detecting UTF-8 for HTML on non-file: URLs needs a blog post, which I intend to write but which is out of scope for this summary. Detecting UTF-8 for text/plain would be less problematic, since there are no scripts or stylesheets that the encoding would get inherited into, and a reload wouldn't re-run any script side effects, so I'm willing to entertain the idea of detecting UTF-8 on non-file: text/plain, but it seems like a slippery slope.)

   (Why now? Edge switching from the "like Safari" camp to the "like Chrome" camp made it seem substantially less likely that everyone would agree to get rid of guessing, so it no longer makes sense to push for that outcome for the Web Platform. Also, now that UTF-8 has clearly won for new Web development, this feature is likely to be less harmful than it could have been in the past.)

2. Replace the Text Encoding submenu with a single menu item, Override Text Encoding, which forces the detector to run in a mode that ignores the TLD hint and allows UTF-8 as an outcome. (The item is disabled in the situations where the menu is presently disabled, and it does not take effect in the situations where the menu presently does not take effect. The menu is presently disabled if the top-level page is in UTF-8 and valid, the top-level page started with a BOM, the top-level page is UTF-16[BE|LE], or the top-level page is neither text/html nor text/plain. The menu presently doesn't take effect if the type of the page is neither text/html nor text/plain, the HTTP layer declared UTF-16[BE|LE], or the stream starts with a BOM. [As you can see, the latter list is a subset of the former, so the latter list should be able to matter only for framed documents.])

   Elevator pitch: Telemetry shows that a) a substantial proportion of menu use is for overriding _labeled_ pages, and b) a substantial proportion of menu use is to override an already-overridden encoding, suggesting that users are bad at making a choice from the menu. Retaining a user-invocable override continues to address the issue of mislabeled content (which is presently addressed by Firefox and by desktop Safari by providing the menu) while eliminating the need for the user to figure out what to choose. (Basically, feature #2 is easy to provide once feature #1 exists.)

# Bug

https://bugzilla.mozilla.org/show_bug.cgi?id=1551276

# Standard

The HTML Standard authorizes the existence of this kind of component without specifying exactly how it should work.
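To make the scheme-dependent policy of feature #1 concrete, here is a minimal sketch in Python. The function name and its shape are hypothetical illustrations, not the actual Gecko interfaces; it only encodes the decision described in the Summary: detection applies to unlabeled text/html and text/plain, UTF-8 is an allowed outcome only for file: URLs.

```python
# Hypothetical sketch (not the actual Gecko code) of feature #1's policy:
# given an *unlabeled* document, should the detector run, and may it
# report UTF-8 as the outcome?

def allowed_detection_outcomes(scheme, content_type):
    """Return (run_detector, allow_utf8) for an unlabeled document."""
    # Detection applies only to text/html and text/plain.
    if content_type not in ("text/html", "text/plain"):
        return (False, False)
    if scheme == "file":
        # Local files: detect freely, with UTF-8 as a possible outcome.
        return (True, True)
    # Web content: detect legacy encodings only; UTF-8 is deliberately
    # excluded so new development cannot start relying on detection.
    return (True, False)
```

For example, an unlabeled https: text/html page would get legacy-only detection, while the same bytes opened from a file: URL could also be detected as UTF-8.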
Beyond that, there is no standard, but the implementation developed here has deliberately been created in such a way that contributing the data tables to the WHATWG and reversing the code into spec English would be _possible_ if there's cross-vendor interest. In contrast, the code in Chromium is a non-Chromium-originating, over-the-wall dump of mystery C++ (lacking public design documentation as well as tooling for regenerating the generated parts) that even the Chrome developers can't/won't change beyond making it compile with newer compilers. (Furthermore, my implementation relies on the browser already containing an implementation of the Encoding Standard. This cuts the binary size impact to less than one fourth compared to adopting the detector from Chrome, which doesn't benefit from any data tables that a browser already has to have anyway.)

I've gone with demonstrating feasibility before further cross-vendor discussion, because this is a user-retention measure in response to a unilateral move on Chrome's part, and Safari on iOS doesn't face pressure from users switching to browsers with a different Web engine.

# Platform coverage

All platforms.

# Preference

There will probably be one for an initial testing period, but I haven't picked a name yet.

# DevTools bug

There is no new DevTools surface for this. The HTML parser already complains in a DevTools-visible way about unlabeled pages, and this change will not remove those messages.

# Other browsers

Chromium-based browsers: Already shipping feature #1 (not shipping feature #2).

IE: Off by default (not precisely feature #1 or #2 but a kind of combination of the two).

Safari: Not shipping either feature but, like Firefox and unlike Chrome, provides a menu for addressing the use cases that feature #2 is meant to address.

# web-platform-tests

Since there isn't a spec and Safari doesn't implement the feature, there are no cross-vendor tests.
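As a worked check of the claim under feature #2 in the Summary that the "does not take effect" conditions are a subset of the "disabled" conditions, here is a small Python sketch. The field names are hypothetical, and for brevity a single flag stands in for the UTF-16 condition (the real "disabled" condition covers any UTF-16 top-level page, while the "no effect" condition is specifically the HTTP-layer label), so this is an illustration of the subset relation rather than a faithful model.

```python
from itertools import product

# Hypothetical sketch (not the actual Gecko code) of the two condition
# lists for feature #2's Override Text Encoding item.

def menu_disabled(page):
    """Is the menu item grayed out for this top-level page?"""
    return (page["type"] not in ("text/html", "text/plain")
            or page["utf16"]        # UTF-16[BE|LE] (simplified to one flag)
            or page["bom"]          # stream started with a BOM
            or page["valid_utf8"])  # page is in UTF-8 and valid

def override_ineffective(page):
    """Would invoking the override have no effect?"""
    return (page["type"] not in ("text/html", "text/plain")
            or page["utf16"]
            or page["bom"])

# Brute-force check over all combinations: "no effect" implies "disabled",
# so the "no effect" list can matter only for framed documents.
for typ, utf16, bom, utf8 in product(("text/html", "application/pdf"),
                                     (False, True), (False, True),
                                     (False, True)):
    page = {"type": typ, "utf16": utf16, "bom": bom, "valid_utf8": utf8}
    assert not override_ineffective(page) or menu_disabled(page)
```

The only page states where the item is enabled yet the override does nothing would require a condition in `override_ineffective` that is absent from `menu_disabled`; in this model there is none, matching the observation in the Summary.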
# Secure contexts

Since this pair of features is about compatibility with legacy content, both features apply to insecure contexts.

# Sandboxed iframes

The feature applies to sandboxed iframes. For feature #1, the feature applies only to different-origin frames, and the situation is the same as for the pre-existing Japanese detection: The framer cannot turn off the feature for the framee. Either the framer or the framee can turn off the feature for itself by adhering to the HTML authoring conformance requirements, i.e. by declaring its own encoding. For feature #2, the situation is the same as for the pre-existing menu: The top-level page can turn off the feature for the whole hierarchy by using UTF-8, not having any UTF-8 errors, and declaring UTF-8, or, alternatively, by using the UTF-8 BOM (even if there are subsequent errors). The framee can turn off the feature for itself by using the UTF-8 BOM.

-- 
Henri Sivonen
hsivo...@mozilla.com

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform