On Tue, Jan 01, 2002 at 08:30:53PM +0000, Phillip Deackes wrote:
| On Mon, 31 Dec 2001 12:48:37 -0600
| Colin Watson <[EMAIL PROTECTED]> wrote:
|
| > It might be worth having a look at the Euro-Char-Support mini-HOWTO
| > (/usr/share/doc/HOWTO/en-html/mini/Euro-Char-Support/index.html if you
| > have doc-linux-html installed, or somewhere on
| > http://www.linuxdoc.org/). I'm not sure if it's good enough either - it
| > was put together quite recently - but it does at least have the virtue
| > of being concise.
|
| Thanks for your help, Colin and Sean.
|
| The document you mention is the problem. It does not help me at all. For
| instance, it says that automatic configuration is possible with the
| 'lang-env' package. I cannot find any package with a name like 'lang-env'.
| A search of Debian packages yields nothing.
It is 'language-env'. Going by 'apt-cache policy', I don't think it is
in potato. (potato is _really_ _really_ old.)

| Furthermore, paragraphs such as this are pretty incomprehensible to me:
|
| "Programs use the localisation environment in order to know both the
| language and the charset being used. Currently there is no separation,
| unless you are using UTF-8 from locale and representation. Environment
| locales use both the language for example:
|
| es_ES.ISO-8859-1
| en_US.utf"

I believe the "en_US.utf" line is wrong, but I don't know for sure. I'm
using "en_US.UTF-8".

| What is .utf?

Unicode Transformation Format. It is a fancy name for a method of
storing multibyte characters (Unicode) in a file, where bytes are the
only data type.

| Why are there certain files containing 'euro' but not others.

Why are there certain files containing "g" but not others? The euro is
just another character.

| Why might I be interested in installing a Spanish language file.

If you use Spanish.

| I did enable the French Euro file, since I speak French, but this
| does not appear to help. Do I need to enable it?

What is the "French Euro file"?

| I see no need to understand localisation issues. I want to be able to
| choose my language/keyboard and do little more.

Choosing your language _is_ localisation!

| I appreciate that adding the Euro symbol is not as simple as it
| sounds, but somebody who knows how to do it should be able to write
| a step-by-step crib sheet so that others can get it working on their
| systems.

As I mentioned above, the euro is just another character.

Unfortunately, people need/want more than the 128 characters in the
US-ASCII character set (aka charset). The euro, for example, is not part
of US-ASCII. Since an 8-bit byte has 128 additional values not taken by
US-ASCII, some ISO committee(s) have created additional charsets to add
some characters.
These charsets are supersets of US-ASCII (that is, the first 128
characters are identical to US-ASCII) and the remaining values (128 to
255) hold characters useful to a given region (locale). ISO-8859-1
contains many accented characters (umlauts and the like) that are common
in Western European languages. If your locale uses ISO-8859-1 then you
can store those characters in plain text files and share them with other
people who are also using ISO-8859-1. (Think of a charset as a text file
format: jpg and png both store images, but in different formats.)
Likewise ISO-8859-2 has characters that are found in Eastern European
languages.

The advantage of these encodings is that they are all single-byte
(char == 8 bits == 1 byte). This means that existing programs can deal
with them more-or-less reasonably, even if they don't understand the
locale. The problem is that if you deal with multiple languages (eg,
French and Romanian) on a regular basis, not only is it a PITA to keep
adjusting settings, but you can't put characters from both encodings
into the same file.

Thus Unicode was developed. It started out as a 16-bit character set
(the code space has since been extended up to U+10FFFF, but most
characters in everyday use sit in the lower 16 bits) that can represent
the alphabets of most languages simultaneously.

Unicode presents a problem though: stored at full width, each Unicode
character takes at least 2 bytes in memory, but the C 'char' type is
only 1 byte. In addition, such a fixed-width representation is not
byte-compatible with US-ASCII, even though the first 128 Unicode
characters are the US-ASCII ones. Applications must be developed with
this in mind so that they can handle it properly.

Various encodings of Unicode have been developed to store Unicode
characters in files. UTF-8 is the most well-known, and it is backwards
compatible with US-ASCII (for the US-ASCII subset of Unicode). Thus if
you use UTF-8 and stick to just the US-ASCII subset wherever the
additional characters are not needed or not understood, there is no
problem.

Now, how does all this relate to the euro?
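Before getting to that, the points above about charsets can be checked
with a few lines of Python. This is only a sketch: it assumes Python 3
and its standard codec names (iso8859_1, iso8859_2, utf-8), and the
fits() helper is made up for illustration. It reads one byte value under
two single-byte charsets, then shows that UTF-8 leaves US-ASCII text
unchanged while holding a French and a Romanian letter together.

```python
# One byte value, read under two different single-byte charsets.
raw = bytes([0xE3])
as_latin1 = raw.decode("iso8859_1")   # Western European: a with tilde
as_latin2 = raw.decode("iso8859_2")   # Eastern European: a with breve

# UTF-8 leaves pure US-ASCII text byte-for-byte unchanged.
ascii_text = "plain text"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# e-grave (used in French) and a-breve (used in Romanian) in one string.
mixed = "\u00e8 \u0103"
mixed_utf8 = mixed.encode("utf-8")


def fits(text, charset):
    """Hypothetical helper: True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Print code points rather than the characters themselves, so the output
# is readable even on a terminal that is not set up for these charsets.
print(f"byte 0xE3 as ISO-8859-1: U+{ord(as_latin1):04X}")
print(f"byte 0xE3 as ISO-8859-2: U+{ord(as_latin2):04X}")
print("ASCII text identical in UTF-8:", same_bytes)
print("mixed text as UTF-8 bytes:", mixed_utf8.hex())
for charset in ("iso8859_1", "iso8859_2", "utf-8"):
    print(f"mixed text fits in {charset}:", fits(mixed, charset))
```

Neither single-byte charset can hold both letters, which is the
"can't put them in the same file" problem; UTF-8 can, at the cost of
using two bytes for each of the non-ASCII letters.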
Well, the euro is not part of the US-ASCII charset. Nor is it part of
ISO-8859-1. However it is part of Unicode (character U+20AC). The
general way to make use of the euro is to use the Unicode charset and
choose one of its encodings (UTF-8) for storing files. (ISO-8859-15, a
revision of ISO-8859-1, also includes the euro, but it has the same
one-region limits as the other single-byte charsets.)

Now read the Unicode HOWTO for some more information on Unicode. Some of
its information is dated, but it helps to understand what must be
configured where, and what doesn't work when using characters that
aren't part of ISO-8859-1.

To try and put it simply, you need to:

    o   install the X fonts to display Unicode characters
        (unfortunately GTK+ 1.2 doesn't handle multibyte fonts
        correctly, so most GTK+ apps won't handle Unicode correctly;
        gvim is an exception)

    o   configure the console to display Unicode (if you use the
        console; I don't know how to configure it yet)

    o   specify the encoding in your locale, eg:
            export LANG=en_US.UTF-8

    o   test the programs that you use to see which ones work with
        Unicode, and how

Don't put the euro in a file that will be read by a program that
doesn't understand Unicode. In vim you can create mappings so that
typing a certain sequence of characters inserts something else, or you
can enter any Unicode character with ^VuXXXX, where ^V is control-v and
XXXX is the character's value in hexadecimal (for the euro, ^Vu20ac).

The euro character itself is not special any more than all other
non-US-ASCII characters are. The problem is a bigger one: developers
have long followed the tradition that a char is one byte and follows
the ASCII encoding. Support for Unicode has lagged considerably, and any
major change like that causes many problems with program interaction
(if one program supports Unicode while the other doesn't, or only
partially supports it).

-D

-- 
He who walks with the wise grows wise,
but a companion of fools suffers harm.
        Proverbs 13:20
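P.S. For anyone who wants to see where the euro does and doesn't fit, a
rough sketch follows. It assumes Python 3 and its standard codec names;
encode_or_none() is a made-up helper, and ISO-8859-15 is included only
for comparison, as a single-byte charset that does carry the euro.

```python
EURO = "\u20ac"  # the euro sign, Unicode character U+20AC


def encode_or_none(text, charset):
    """Hypothetical helper: the encoded bytes, or None if charset lacks a character."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


# Try the euro against each charset discussed in this thread.
charsets = ("ascii", "iso8859_1", "iso8859_15", "utf-8")
results = {cs: encode_or_none(EURO, cs) for cs in charsets}

for cs, data in results.items():
    if data is None:
        print(f"{cs}: cannot represent the euro")
    else:
        print(f"{cs}: {len(data)} byte(s), hex {data.hex()}")
```

A program that reads the UTF-8 file one byte at a time and assumes
ISO-8859-1 will see three unrelated characters instead of one euro,
which is why it matters that every program touching the file agrees on
the encoding.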