In section 2.7 of HTML 5, it says:

> When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent
> names.
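To show how lenient this rule is, here is a minimal Python sketch of Charset Alias Matching. It assumes the normalization steps from UTS #22: delete every character except ASCII letters and digits, fold to lowercase, and, from left to right, delete each "0" that is not preceded by a digit. The function names (`uts22_normalize`, `charsets_match`, `strict_match`) are hypothetical. `strict_match` is only a plain case-insensitive comparison included for contrast, not any browser's actual algorithm.

```python
def uts22_normalize(name: str) -> str:
    """Normalize a charset name following the UTS #22 alias-matching steps (as assumed above)."""
    out = []
    prev_is_digit = False  # is the last *kept* character a digit?
    for ch in name:
        # Keep only ASCII letters and digits; everything else is dropped.
        if not (ch.isascii() and ch.isalnum()):
            continue
        # Left to right, drop a "0" unless the previous kept character is a digit.
        if ch == "0" and not prev_is_digit:
            continue
        # Fold ASCII uppercase to lowercase.
        out.append(ch.lower())
        prev_is_digit = ch.isdigit()
    return "".join(out)


def charsets_match(a: str, b: str) -> bool:
    """Lenient comparison: equal after UTS #22-style normalization."""
    return uts22_normalize(a) == uts22_normalize(b)


def strict_match(a: str, b: str) -> bool:
    """Hypothetical stricter baseline: case-insensitive exact comparison only."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(uts22_normalize("GB_2312-80"))                 # gb231280
    print(charsets_match("GB_2312-80", "g.b.2312(80)"))  # True
    print(charsets_match("UTF-8", "u.t.f-008"))          # True
    print(strict_match("GB_2312-80", "g.b.2312(80)"))    # False
```

Under this rule, punctuation, separators and leading zeros are all ignored, so many spellings that no one registered as aliases still resolve to a known encoding.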
I think the Charset Alias Matching requirement should be removed, since none of the major browsers do this, and it is too lenient. The general approach should be: as lenient as the major browsers, but not more lenient. Lenience leads to a proliferation of garbage.

Of course, the question is what to replace the above text with. There is a discussion on the [email protected] list about gathering the current lists of charsets and aliases from the browsers. Hopefully, that discussion will result in something that can be published in HTML 5. How about putting a placeholder in the current HTML 5 draft?

I consider UTS22 to be harmful, so it should be removed from HTML 5 ASAP.

Erik
