So you recommend fixing non-broken apps (by adding "smart auto-detection")
and leaving the broken ones (those that forget to indicate the encoding) broken?
Yep, that is a sure way to chaos ...


-----Original Message-----
From: Pablo Saratxaga <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Date: Mon, 11 Oct 2004 22:18:29 +0200
Subject: Re: Debian UTF-8, Mandrake UTF-8, and Apache UTF-8

> Kaixo!
> 
> On Mon, Oct 11, 2004 at 02:00:05PM -0400, jmaiorana wrote:
> 
> > I think that HTTP's "Content-Type: text/html" should automatically
> > imply "Content-Type: text/html; charset=UTF-8", like xml does.
> 
> I would *love* it to be the case.
> But the W3C hasn't standardized on it yet.
> 
> > But since that's not the
> > case I configure my apache server to be explicit. (the HTTP setting
> > seems to override any META tags in individual pages, but I'm fine
> > with that.)
> 
> Well, if you control the whole site and check that all pages are
> in the right encoding, it's fine (but then, you should put the
> right line in each file as well).
> 
> But if you don't control all the content (eg, you have *users*!), or if
> there is at least one file in a different encoding, then the result is
> bad.
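
The precedence described above (a charset in the HTTP header wins over a
META tag in the page) can be sketched in a few lines of Python. This is only
an illustration, not how any particular browser or server does it: the
function name declared_charset, the 1024-byte scan window, and the simple
regular expression are all assumptions made for the example.

```python
import re
from email.message import Message

# Hypothetical, deliberately simple pattern: finds charset=... inside a <meta> tag.
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

def declared_charset(content_type_header, body):
    """Return (charset, source) for an HTML response, or (None, None).

    content_type_header is the value of the HTTP Content-Type header (str),
    body is the raw page (bytes). The HTTP header is consulted first.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    from_header = msg.get_content_charset()  # lower-cased, or None if absent
    if from_header:
        return from_header, "http-header"
    # Only look near the top of the page, where a META declaration belongs.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower(), "meta"
    return None, None
```

With a page that says iso-8859-1 in its META tag but is served as
"text/html; charset=UTF-8", this returns ("utf-8", "http-header"), which is
exactly the bad case when the file really is in another encoding.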
> 
> > For the most part, I think that even having a choice of encoding is a
> > source of problems, and it'd probably be better all around if it simply
> > wasn't configurable, outside of conversion programs such as iconv. I
> > think it's high time to treat encoding as a solved problem and move on
> > to the trickier aspects of i18n.
> 
> You can't simply dismiss the problem and hope it will disappear.
> It will take a lot of time before utf-8 is used everywhere (well,
> unicode everywhere; utf-8 everywhere won't be the case, at least as
> long as some major players around continue preferring utf-16 :) ).
> 
> For new protocols/formats it is simple to mandate utf-8 only,
> indeed; but there are also old ones, like plain text files, email,
> usenet, irc, html,... For those a mechanism was defined, long before
> utf-8 came to life, to specify the encoding of the individual
> document. A first step would be to lobby to make that specification
> mandatory, always present, and respected.
> Nowadays too many html pages, and too many news and mail messages, are
> sent without a charset specification, and that is stopping the wider
> use of utf-8.
> 
> A smart auto-detection between cp1252 and utf-8 helps a lot. Often the
> problem is with cp1252 (or iso-8859-1, but its printable characters are
> a subset of cp1252, so it can be treated the same). People using
> languages not covered by that 8-bit encoding are much more likely to
> adopt utf-8, and also much more likely to declare the encoding used in
> their html/mail/news etc., which helps a lot to make a smooth
> transition.
> Such auto-detection lets the user read both old (cp1252) and new
> (utf-8) text whose encoding is not announced. It is transparent for
> him, and in practice solves the problem for him, making the switch to
> utf-8 a suitable option.
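
For what it is worth, one common way to do the cp1252/utf-8 guess described
above is to try a strict utf-8 decode first and fall back to cp1252 only when
that fails. The sketch below assumes that approach; it is not necessarily
what was meant in the message, and the function name decode_unlabelled is
made up for the example.

```python
def decode_unlabelled(data):
    """Guess between utf-8 and cp1252 for bytes that carry no charset label.

    Returns (text, encoding_used).
    """
    try:
        # Strict utf-8 decoding rejects nearly all legacy 8-bit text,
        # because an isolated high byte is not a valid utf-8 sequence.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Fall back to cp1252. errors="replace" covers the few byte values
        # (0x81, 0x8D, 0x8F, 0x90, 0x9D) that cp1252 leaves undefined.
        return data.decode("cp1252", errors="replace"), "cp1252"
```

Pure ASCII input decodes identically either way and is reported as utf-8.
The heuristic can still be fooled by short cp1252 strings that happen to form
valid utf-8 sequences, so it is a fallback for unlabelled text, not a
substitute for declaring the charset.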
>   
> -- 
> Ki ça vos våye bén,
> Pablo Saratxaga
> 
> http://chanae.walon.org/pablo/                PGP Key available, key ID: 0xD9B85466
> [you can write me in Walloon, Spanish, French, English, Catalan or
> Esperanto]
> [min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]
> 


--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
