Henri Sivonen wrote:
> If there is a meta element whose http-equiv attribute has the value "Content-Type" (compared case-insensitively) and whose content attribute has a value that begins with "text/html; charset=", the string in the content attribute following the prefix "text/html; charset=" is taken, white space is trimmed from both sides, and the result is considered the tentative encoding name.

This will need to handle common mistakes such as the following:

<meta ... content="application/xhtml+xml;charset=X">
<meta ... content="foo/bar;charset=X">
<meta ... content="foo/bar;charset='X'">
<meta ... content="charset=X">
<meta ... charset="X">

I'm not sure which browsers support each one; they'll all need to be tested.
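A tolerant extraction along these lines could be sketched as follows. This is purely illustrative (it is not any particular browser's algorithm, and the function name is my own); it handles the mistake patterns listed above, except the last one, where the charset appears in its own attribute rather than inside content.

```python
import re

def extract_charset(content):
    """Tolerantly extract a charset name from a meta content attribute value.

    Handles common author mistakes: arbitrary (or missing) MIME types
    before the charset parameter, missing spaces after the semicolon,
    and single- or double-quoted charset values.

    Illustrative sketch only, not a browser's actual parsing algorithm.
    """
    match = re.search(
        r"charset\s*=\s*['\"]?\s*([^'\";\s]+)",  # charset=, optional quotes/space
        content,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

# Examples corresponding to the mistakes above:
#   extract_charset("text/html; charset=UTF-8")      -> "UTF-8"
#   extract_charset("application/xhtml+xml;charset=X") -> "X"
#   extract_charset("foo/bar;charset='X'")           -> "X"
#   extract_charset("charset=X")                     -> "X"
```

The `<meta ... charset="X">` case would have to be handled at the attribute level by the parser, before content sniffing of this kind is attempted.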

> Authors are advised not to use the UTF-32 encoding or legacy encodings. (Note: I think UTF-32 on the Web is harmful and utterly pointless,

I agree about it being pointless, but why is it considered harmful?

> I'd like to have some text in the spec that justifies whining about legacy encodings.

What are your reasons for whining about legacy encodings and what would you like the spec to say?

> Also, the spec should probably give guidance on what encodings need to be supported. That set should include at least UTF-8, US-ASCII, ISO-8859-1 and Windows-1252.

And probably UTF-16 as well.

--
Lachlan Hunt
http://lachy.id.au/
