Section 2.7 of HTML 5 says:

> When comparing a string specifying a character encoding with the name
> or alias of a character encoding to determine if they are equal, user
> agents must use the Charset Alias Matching rules defined in Unicode
> Technical Standard #22. [UTS22]
>
> For instance, "GB_2312-80" and "g.b.2312(80)" are considered equivalent
> names.

I think this should be removed, since none of the major browsers do
this kind of matching, and it is too lenient.
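
To illustrate the lenience, here is a rough sketch of the UTS22 matching
rules as I read them: delete everything except ASCII letters and digits,
lowercase the rest, then delete each 0 that is not preceded by a digit.
The function names are made up for this example, and the three steps are
my understanding of UTS22, not a quotation from it.

```python
import re


def uts22_normalize(name: str) -> str:
    """Reduce a charset name to its UTS22-style matching key (sketch)."""
    # Step 1: drop every character that is not an ASCII letter or digit.
    stripped = re.sub(r"[^A-Za-z0-9]", "", name)
    # Step 2: map uppercase to lowercase.
    lowered = stripped.lower()
    # Step 3: scanning left to right, delete each "0" that is not
    # preceded by a digit (so leading zeros of a number disappear).
    out = []
    prev_is_digit = False
    for ch in lowered:
        if ch == "0" and not prev_is_digit:
            continue  # deleted; the preceding kept character is unchanged
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)


def uts22_match(a: str, b: str) -> bool:
    """True if two charset names are equal under the sketch above."""
    return uts22_normalize(a) == uts22_normalize(b)
```

Under this sketch, "GB_2312-80" and "g.b.2312(80)" both reduce to the
same key, as do "UTF-8" and "u.t.f-008", which is the kind of input no
browser should have to accept.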

The general approach should be: As lenient as the major browsers, but
not more lenient. Lenience leads to a proliferation of garbage.

Of course, the question is what to replace the above text with. There
is a discussion on the [email protected] list about gathering the
current lists of charsets and aliases from the browsers. Hopefully,
that discussion will result in something that can be published in HTML
5.

How about putting a placeholder in the current HTML 5 draft? I
consider UTS22 to be harmful, so it should be removed from HTML 5
ASAP.

Erik
