So you recommend fixing (by adding "smart auto-detection") the non-broken apps, and leaving the broken ones (those that forget to indicate the encoding) broken? Yep, that is a sure way to chaos...
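
For reference, the kind of cp1252/utf-8 auto-detection proposed in the quoted message can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name decode_unlabelled is made up for this example, and the "try strict UTF-8 first, fall back to cp1252" heuristic is one common approach, not something any particular mail or web client is known to implement this way.

```python
from typing import Tuple


def decode_unlabelled(data: bytes) -> Tuple[str, str]:
    """Guess between UTF-8 and cp1252 for bytes that carry no charset label.

    Hypothetical helper for illustration; returns (text, guessed_encoding).
    """
    try:
        # Strict UTF-8 decoding fails on most non-ASCII cp1252 text, because
        # cp1252 high bytes rarely line up as valid UTF-8 multi-byte sequences.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Fall back to cp1252. errors="replace" handles the byte values that
        # Python's cp1252 codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
        return data.decode("cp1252", errors="replace"), "cp1252"


if __name__ == "__main__":
    sample = "Ki ça vos våye bén"
    print(decode_unlabelled(sample.encode("utf-8")))
    print(decode_unlabelled(sample.encode("cp1252")))
```

Note that pure ASCII input is reported as utf-8, and that a guess like this can still be wrong for other 8-bit encodings, which is exactly why an explicit charset declaration remains the real fix.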
-----Original Message-----
From: Pablo Saratxaga <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: Mon, 11 Oct 2004 22:18:29 +0200
Subject: Re: Debian UTF-8, Mandrake UTF-8, and Apache UTF-8

> Kaixo!
>
> On Mon, Oct 11, 2004 at 02:00:05PM -0400, jmaiorana wrote:
>
> > I think that HTTP's "Content-Type: text/html" should automatically imply
> > "Content-Type: text/html; charset=UTF-8", like xml does.
>
> I would *love* it to be the case.
> But the W3 hasn't standardized on it yet.
>
> > But since that's not the case I configure my apache server to be
> > explicit. (the HTTP setting seems to override any META tags in
> > individual pages, but I'm fine with that.)
>
> Well, if you control the whole site and check that all pages are
> in the right encoding, it's fine (but then, you should put the
> right line in each file as well).
>
> But if you don't control all the content (eg, you have *users*!), or if
> there is at least one file in a different encoding, then the result is
> bad.
>
> > For the most part, I think that even having a choice of encoding is a
> > source of problems, and it'd probably be better all around if it simply
> > wasn't configurable, outside of conversion programs such as iconv. I
> > think it's high time to treat encoding as a solved problem and move on
> > to the trickier aspects of i18n.
>
> You can't simply dismiss the problem and hope it will disappear.
> It will take a lot of time before utf-8 is used everywhere (well,
> unicode everywhere, utf-8 everywhere won't be the case, at least as long
> as some major players around will continue preferring utf-16 :) ).
>
> For new protocols/formats it is simple to mandate it is utf-8 only,
> indeed; but there are also old ones, like plain text files, email,
> usenet, irc, html,... for those a mechanism has been defined, long before
> utf-8 came to life, in order to specify the encoding of the individual
> document.
> A first step would be to lobby to make that specification
> mandatory and always present, and respected.
> Nowadays too many html pages, too many news and mail messages, are sent
> without charset specification, and that is stopping the wider use of
> utf-8.
>
> A smart auto-detection between cp1252/utf-8 (often the problem is with
> cp1252 (or iso-8859-1, but it is a subset of cp1252 so it can be treated
> the same), people using languages not covered by that 8bit encoding are
> much more likely to adopt utf-8, they also are much more likely to tell
> the encoding used in their html/mail/news etc, which helps a lot to do
> a smooth transition);
> so a smart auto-detection helps a lot, as it allows the user to
> read both old (cp1252) and new (utf-8) non-announced encodings,
> it will be transparent for him, and in practice solve the problem for
> him, making it a suitable option to switch to utf-8.
>
> --
> Ki ça vos våye bén,
> Pablo Saratxaga
>
> http://chanae.walon.org/pablo/ PGP Key available, key ID: 0xD9B85466
> [you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
> [min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]
> --
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/