The current rules for recognizing HTML files in Emacs are too strict:

1. The valid string delimiter for HTML attribute values is the
   quotation mark.  However, some HTML files on the Web use
   apostrophes, e.g.

   <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'>

   The program that generates such non-standard meta headers is
   identified as 'Microsoft DHTML Editing Control' (no surprise).

   `sgml-html-meta-auto-coding-function' can't determine the encoding
   from such invalid meta headers.  I propose replacing \" with [\"']
   in the regexps in `sgml-html-meta-auto-coding-function' to accept
   such invalid HTML.  (The regexps in the other function,
   `sgml-xml-auto-coding-function', already match [\"'] for XML files.)

2. `sgml-html-meta-auto-coding-function' can't determine the encoding
   when an HTML file has no opening `<html>' tag.  An example of such
   an HTML file is the Mozilla Firefox bookmark file.  Sometimes one
   needs to open this file in Emacs and use isearch on it, but Emacs
   can't detect its encoding.  Perhaps the test
   `(search-forward "<html" size t)' should be removed from
   `sgml-html-meta-auto-coding-function'.

3. When visiting a Mozilla Firefox bookmark file, Emacs also fails to
   detect the type of the file: it opens the file in SGML mode even
   though it is actually an HTML file.  This problem is caused by the
   default value of `magic-mode-alist'.  Maybe the `.html' extension
   in `auto-mode-alist' should take precedence over `magic-mode-alist'?

-- 
Juri Linkov
http://www.jurta.org/emacs/

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel