Kaixo!

On Mon, Oct 11, 2004 at 02:00:05PM -0400, ïïïïïïïïï wrote:
 
> I think that HTTP's "Content-Type: text/html" should automatically imply
> "Content-Type: text/html; charset=UTF-8", like xml does.

I would *love* it to be the case.
But the W3C hasn't standardized on it yet.

> But since thats not the
> case I configure my apache server to be explicit. (the HTTP setting seems to
> override any META tags in individual pages, but I'm fine with that.)

Well, if you control the whole site and check that all pages are
in the right encoding, it's fine (but then you should also put the
right META line in each file).

But if you don't control all the content (e.g., you have *users*!), or if
there is at least one file in a different encoding, then the result is
bad: the server-wide charset overrides what the page itself declares,
and the page is displayed garbled.

> For the most part, I think that even having a choice of encoding is a source
> of problems, and it'd probably be better all around if it simply wasnt
> configurable, outside of conversion programs such as iconv. I think its high
> time to treat encoding as a solved problem and move on to the trickier
> aspects of i18n.

You can't simply dismiss the problem and hope it will disappear.
It will take a long time before utf-8 is used everywhere (well,
unicode everywhere; utf-8 everywhere won't be the case, at least as long
as some major players around continue preferring utf-16 :) ).

For new protocols/formats it is indeed simple to mandate utf-8 only;
but there are also old ones, like plain text files, email, usenet,
irc, html,... For those, a mechanism was defined, long before utf-8
came to life, to specify the encoding of the individual document.
A first step would be to lobby to make that specification mandatory,
always present, and respected.
Nowadays too many html pages, and too many news and mail messages, are
sent without a charset specification, and that is holding back the wider
use of utf-8.
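
To illustrate what "sent without charset specification" means in
practice, here is a minimal Python sketch that checks whether a
Content-Type header value announces a charset at all. The function name
declared_charset is hypothetical (chosen for this example only); the
parsing itself relies on the standard library's email.message.Message.

```python
from email.message import Message
from typing import Optional


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset announced in a Content-Type value, or None.

    declared_charset is a hypothetical helper name used for illustration.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter in lower case,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    for value in ("text/html; charset=UTF-8", "text/html"):
        print(value, "->", declared_charset(value))
```

A page or message for which this returns None is one of the
"non-announced" cases: the receiver has to guess the encoding.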

A smart auto-detection between cp1252 and utf-8 helps a lot. (The
problem is most often with cp1252, or with iso-8859-1, whose printable
characters are a subset of cp1252 so it can be treated the same. People
using languages not covered by that 8-bit encoding are much more likely
to adopt utf-8, and they are also much more likely to declare the
encoding used in their html/mail/news etc., which helps a lot for a
smooth transition.)
Such auto-detection lets the user read both old (cp1252) and new
(utf-8) text whose encoding is not announced. It is transparent for the
user and in practice solves the problem, which makes switching to utf-8
a viable option.
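
One simple way to do such auto-detection, sketched in Python below, is
to try strict utf-8 first and fall back to cp1252 only when that fails.
This is an assumption about how a detector could work, not a
description of any particular mail or web client; the function name
decode_unlabelled is hypothetical.

```python
from typing import Tuple


def decode_unlabelled(data: bytes) -> Tuple[str, str]:
    """Decode bytes that carry no charset label.

    Returns (text, encoding_used). decode_unlabelled is a hypothetical
    helper name used for illustration.
    """
    try:
        # Real cp1252 text containing non-ASCII characters is very rarely
        # valid utf-8, so a successful strict decode is strong evidence
        # that the data really is utf-8 (pure ASCII also lands here).
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Fall back to cp1252; the few byte values it leaves undefined
        # are replaced rather than raising an error.
        return data.decode("cp1252", errors="replace"), "cp1252"


if __name__ == "__main__":
    for raw in ("bén".encode("utf-8"), "bén".encode("cp1252")):
        print(raw, "->", decode_unlabelled(raw))
```

The order matters: cp1252 accepts almost any byte sequence, so trying
it first would never give utf-8 a chance.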
  
-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://chanae.walon.org/pablo/          PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]

