Kaixo!

On Mon, Oct 11, 2004 at 11:56:09AM -0400, Edward H. Trager wrote:

> MANDRAKE:
> =========
> And does anyone know the official story about Mandrake?
> I installed Mandrake 10.0 (from a magazine disc) and got
> an ISO-8859-1 locale instead of a UTF-8 locale.

It depends on your installation choices. In Mandrakelinux some locales are in UTF-8 by default (those languages that can be supported only in UTF-8, or that don't have any large legacy corpus in non-UTF-8 encodings). Other locales, which do have a large legacy corpus in a non-UTF-8 encoding, currently use that legacy encoding by default (this will hopefully change sometime in the future).

However, under the "advanced" tab there is a "use UTF-8 by default" checkbox, so you can force UTF-8 in any case. UTF-8 is also used if you choose several languages and UTF-8 is the only encoding they share. For example, if you choose support for French and German, both with legacy iso-8859-15 by default, UTF-8 won't be enabled (unless you check the UTF-8 checkbox); but if you choose French and Greek, then UTF-8 is used, because iso-8859-15 and iso-8859-7 are different.

> The Mandrake
> locale-setting GUI continued to provide only legacy ISO options,
> as far as I could tell.

The choice has to be made at install time, because using UTF-8 or not has consequences for how data is stored on disk on native Linux partitions, and changing it afterwards cannot be 100% automated.

> In the end I manually set the .i18n
> file to en_US.UTF-8 and everything seems to work to the extent that
> I have tested it. So why is UTF-8 not the default? Does anyone know?

Because people complained. UTF-8 support a year ago was not as good as it is now, and a lot of people (in particular those using the "en_US" locale :) ) would have complained about ugly fonts and other problems if UTF-8 had been the default. The situation has improved a lot, and nowadays there are very few problems left. UTF-8 could probably be made the default soon, and maybe it already would have been if there weren't other, more important issues to spend our time on.

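To make the install-time rule described above concrete (the checkbox wins, otherwise a legacy encoding is kept only if all chosen languages share it, otherwise UTF-8), here is a minimal Python sketch. It is only an illustration: the table `LEGACY_DEFAULT`, the language codes and the function `pick_encoding` are hypothetical names, not the actual Mandrakelinux installer code, and the table lists only the three languages mentioned in this mail.

```python
# Hypothetical table: default legacy encoding per language.
# Languages missing from the table are treated as UTF-8 only.
LEGACY_DEFAULT = {
    "fr": "iso-8859-15",  # French
    "de": "iso-8859-15",  # German
    "el": "iso-8859-7",   # Greek
}


def pick_encoding(languages, force_utf8=False):
    """Return the encoding chosen for a set of selected languages."""
    if force_utf8:
        # The "use UTF-8 by default" checkbox overrides everything.
        return "utf-8"
    # Collect the legacy default of each language (None = no legacy default).
    encodings = {LEGACY_DEFAULT.get(lang) for lang in languages}
    if len(encodings) == 1 and None not in encodings:
        # All selected languages share one legacy encoding: keep it.
        return encodings.pop()
    # No single shared legacy encoding: UTF-8 is the only common choice.
    return "utf-8"
```

With this sketch, `pick_encoding(["fr", "de"])` keeps iso-8859-15, while `pick_encoding(["fr", "el"])` falls back to UTF-8.
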
> APACHE:

One of the remaining problems is that of web pages in cp1252 with unannounced encoding: when using utf-8 by default, some browsers display them wrong. Browsers should do some automatic encoding detection to see whether the page is in utf-8 or in cp1252 (the only two valid choices for pages with unannounced encoding, imho). The same goes for email programs: since I switched to utf-8 I have received a lot of messages that display wrongly because they are encoded in cp1252 but don't announce it properly (in particular in the Subject/From headers, but also in the body). Here too, some automatic encoding detection could help a lot.

> The last time I installed Apache 2.0.x, it too defaults to the
> legacy ISO-8859-1 configuration. One has to manually change the configuration
> file in order to get HTML pages served with the correct headers
> indicating UTF-8.

No, it is up to the individual files to announce their encoding, not to the web server. I don't have any problem using apache with html files that correctly announce their encoding; I use a mix of iso-8859-1/iso-8859-15/utf-8, with some occasional iso-2022-jp pages too.

> Does anyone know if this is still the case? When is this going to change?
> Apache 2.0.x should really default to UTF-8. Do people agree with me here?

I disagree :) The default therefore must not be utf-8 but simply nothing. Forcing a single encoding for all the pages of a whole server is something that can only be done by the manager of the server, after carefully thinking about it; it is not something that should be blindly enforced by default. I do fully agree with you that forcing iso-8859-1 by default is very wrong; but I think that forcing any encoding by default is wrong.

-- 
Ki ça vos våye bén,
Pablo Saratxaga
http://chanae.walon.org/pablo/
PGP Key available, key ID: 0xD9B85466
[you can write me in Walloon, Spanish, French, English, Catalan or Esperanto]
[min povas skribi en valona, esperanta, angla aux latinidaj lingvoj]
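
As a footnote to the point above about unannounced cp1252 pages and mails, the kind of automatic detection meant there can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function name `decode_unannounced` is hypothetical, and real browsers and mail programs use more elaborate heuristics. It relies on the fact that cp1252 text containing non-ASCII characters is rarely also valid UTF-8.

```python
def decode_unannounced(raw):
    """Decode bytes whose encoding was not announced.

    Try strict UTF-8 first; if the bytes are not valid UTF-8,
    assume cp1252. Returns (text, encoding_used).
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # A few byte values are undefined in cp1252; replace them
        # instead of failing on the fallback path.
        return raw.decode("cp1252", errors="replace"), "cp1252"
```

For example, the bytes `b"caf\xc3\xa9"` are taken as utf-8 and `b"caf\xe9"` as cp1252; both decode to the same text.
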