Kaixo!

On Mon, Oct 11, 2004 at 02:00:05PM -0400, ïïïïïïïïï wrote:

> I think that HTTP's "Content-Type: text/html" should automatically imply
> "Content-Type: text/html; charset=UTF-8", like xml does.

I would *love* it to be the case. But the W3C hasn't standardized on it yet.

> But since thats not the
> case I configure my apache server to be explicit. (the HTTP setting seems to
> override any META tags in individual pages, but I'm fine with that.)

Well, if you control the whole site and check that all pages are in the
right encoding, that's fine (but then you should also put the right
declaration in each file). But if you don't control all the content
(e.g., you have *users*!), or if there is at least one file in a different
encoding, then the result is bad.

> For the most part, I think that even having a choice of encoding is a source
> of problems, and it'd probably be better all around if it simply wasnt
> configurable,
> outside of conversion programs such as iconv. I think its high time to treat
> encoding as a solved problem and move on to the trickier aspects of i18n.

You can't simply dismiss the problem and hope it will disappear.
It will take a lot of time before utf-8 is used everywhere (well, unicode
everywhere; utf-8 everywhere won't be the case, at least as long as some
major players around continue preferring utf-16 :) ).

For new protocols/formats it is simple to mandate utf-8 only, indeed;
but there are also old ones, like plain text files, email, usenet, irc,
html,... For those, a mechanism was defined, long before utf-8 came to
life, to specify the encoding of the individual document.

A first step would be to lobby to make that specification mandatory,
always present, and respected. Nowadays too many html pages, and too many
news and mail messages, are sent without a charset specification, and that
is holding back the wider use of utf-8.

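To make the "mechanism" concrete: in HTTP and mail it is the charset
parameter of the Content-Type header. Below is a minimal sketch in Python
of reading it; the function name declared_charset is only an illustration,
and real clients have more rules (META tags, BOMs, defaults) than this
shows.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset announced in a Content-Type value, or None.

    Hypothetical helper for illustration; it only looks at the header
    value it is given, nothing else.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(declared_charset("text/html; charset=UTF-8"))
    print(declared_charset("text/html"))
```

When the function returns None, the receiver is left to guess, which is
exactly the situation described above.
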
Smart auto-detection between cp1252 and utf-8 helps a lot. The problem
usually lies with cp1252 (or iso-8859-1, which can be treated the same,
since its printable characters are a subset of cp1252): people using
languages not covered by that 8-bit encoding are much more likely to adopt
utf-8, and they are also much more likely to declare the encoding used in
their html/mail/news etc., which helps a lot for a smooth transition.
Such auto-detection lets the user read both old (cp1252) and new (utf-8)
content sent without an announced encoding; it is transparent for the
user and in practice solves the problem, which makes switching to utf-8
a workable option.

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.walon.org/pablo/		PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]
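
P.S. A minimal sketch, in Python, of the cp1252/utf-8 auto-detection
described above. It relies on the fact that cp1252 text containing
accented letters is very rarely valid utf-8, so a strict utf-8 decode is
a good first test. The name decode_unlabelled is made up for this example,
and a real detector would likely want more heuristics than this.

```python
def decode_unlabelled(data):
    """Decode bytes that arrived without a charset declaration.

    Returns (text, encoding_used). Hypothetical helper: it tries strict
    utf-8 first and falls back to cp1252, which also covers the
    printable range of iso-8859-1.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A few byte values are undefined in cp1252; replace them
        # rather than failing on legacy content.
        return data.decode("cp1252", errors="replace"), "cp1252"


if __name__ == "__main__":
    sample = "Ki ça vos våye bén"
    for raw in (sample.encode("utf-8"), sample.encode("cp1252")):
        print(decode_unlabelled(raw))
```

Both byte strings come out as the same text, so the reader does not need
to know which encoding the sender used.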