Wayne Pollock wrote:
> If document authors go to the trouble of stating the charset in the
> HEAD of their document, then that should override any default set by
> the web server.
I have much sympathy for the idea, for the reasons you gave, especially
the reason that web server admins often disallow the effects of
.htaccess files, effectively enforcing their settings on every author.
However, I'm afraid it's too late; the change would break a long
tradition and would break existing pages.
> It is a huge burden to webmasters everywhere to have to manually set
> the charset for every update to their website.
I can't see what you mean by that. The settings need to be checked when you
start creating a site, not after every update.
> To override the default charset returned by Apache, a per-file
> directive must be used to specify each file's charset.
Pardon? Apache settings operate per filename extension, and mostly it
suffices to set the encoding for just one extension, ".html".
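For example, on Apache 2.x the charset can be set once, for the whole
site or per extension (a minimal sketch; which directives are available
depends on the server's modules and configuration):

    # In httpd.conf, a virtual host, or an allowed .htaccess file:
    AddDefaultCharset utf-8      # default for text/html and text/plain
    # or, per filename extension (mod_mime):
    AddCharset utf-8 .html       # serve *.html with charset=utf-8

Either way it is a one-time setting, not something to redo on every
update.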
> Such overriding is possible, but to allow web authors the ability to
> do so, per-directory settings must be enabled (the ".htaccess"
> files). Doing so severely impacts server performance
I don't think it has any significant impact on performance.
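Besides, allowing just this much does not require opening up .htaccess
across the board; something like the following in the main
configuration is enough (a sketch, assuming Apache 2.x; the directory
path is made up):

    <Directory "/var/www/site">
        # FileInfo covers AddCharset and AddDefaultCharset, among others
        AllowOverride FileInfo
    </Directory>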
> and many sites simply can't do so, so web pages WILL be sent with the
> wrong charset.
Well, I would put it this way: if the server admin disallows the
effects of your .htaccess file, then that's just something you need to
live with. If they force your HTML documents to be served with headers
saying that the encoding is iso-8859-1, or utf-8, or whatever, then
just make it so.
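That is, save the files in the encoding that the server announces and,
if you keep a <meta> declaration at all, make it agree with the header
(illustrative only):

    Header sent by the server:
        Content-Type: text/html; charset=utf-8
    Matching declaration inside the document:
        <meta http-equiv="Content-Type"
              content="text/html; charset=utf-8">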
> This should be a simple fix. The issue was raised on the WHATWG list
> and elsewhere, and nobody could think of an objection to this
> proposal.
I think a more specific citation of previous discussions would be needed.
> (The only web pages that could "break" with this change were already
> broken.)
99% of web pages are broken, in the sense of not complying with HTML, CSS,
WCAG 1.0, or other relevant recommendations. When we worry about what
happens to existing pages, we need to worry about more or less broken pages,
mostly.
Consider a page on a server that forces Content-Type: text/html;
charset=utf-8 on all HTML files. Such servers are increasingly common.
Authors have had to adapt to that, for example by saving documents in
utf-8 encoding when needed. The pages may well have <meta> tags announcing
iso-8859-1 or something else, maybe because some web page editing software
emitted it, or it belonged to a sample file used as a starting point, or the
author copied it from somewhere, with little or no understanding of its
effect.
Your proposal, if accepted and implemented, would mean that all such
pages stop working if they (literally) contain any character outside
the ASCII range. This might mean a mess that everyone can see, or just
one wrong character, or anything between these extremes.
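Concretely (an illustrative case, not any real page): the stored bytes
are utf-8, because the server forced that, but the document still
carries a stale declaration:

    Header sent by the server (authoritative today):
        Content-Type: text/html; charset=utf-8
    Leftover in the document:
        <meta http-equiv="Content-Type"
              content="text/html; charset=iso-8859-1">

Under your proposal the <meta> tag would win, and each non-ASCII
character, stored as a multi-byte utf-8 sequence, would be displayed
as two or three iso-8859-1 characters.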
> On a related note, the new structural tags that denote articles and
> such should allow an optional CHARSET attribute. A web page with
> ARTICLEs etc. may be (and may likely be) composed of content from
> many sources, e.g., a "mash-up". While CMS and blogging software
> could force a single charset so there is only one charset per web
> page, that seems an unnecessary restriction (and I don't know that
> most blogging software works that way).
No, it's an inherent restriction. The idea of allowing different
character encodings within a single document has often been suggested,
but it's based on a misunderstanding. The encoding is a property of the
transmitted document as a whole, settled at the protocol level, and
switching it part-way through conflicts with the basic modern model of
handling character data. Recognizing the encoding from <meta> tags is
admittedly in conflict with that model, too, but it was a more or less
unavoidable exception, which has been separately defined (and is still
known to cause problems, especially when people don't understand how it
works and place the tag too late in the document). A "mash-up" simply
needs to recode content when needed.
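Recoding is routine; for example, a component stored in iso-8859-1 can
be converted before inclusion using the standard iconv tool (a sketch;
the file names are made up):

    iconv -f iso-8859-1 -t utf-8 fragment.html > fragment-utf8.html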
--
Yucca, http://www.cs.tut.fi/~jkorpela/