Re: [gentoo-user] Glibc, userlocales, and ENV Variables
Hi,

On Wed, 02 Nov 2005 15:53:11 +0100 Holly Bostick [EMAIL PROTECTED] wrote:

> [...] /etc/locales.build which says
>
> # This file names the list of locales to be built when glibc is installed.
> # The format is locale/charmap, where locale is a locale from the
> # /usr/share/i18n/locales directory, and charmap is name of one of the files
> # in /usr/share/i18n/charmaps/. All blank lines and lines starting with # are
> # ignored. Here is an example:
> # en_US/ISO-8859-1
>
> [...] Glibc built fine (afaict), but my problem is that I now don't know what to export with a LANG variable. For example, if I want [EMAIL PROTECTED]/UTF-8, how do I export that as opposed to [EMAIL PROTECTED]/ISO-8859-15 (or worse, ISO-8859-1)?

Note the comment you've cited: The format is locale/charmap. This generates the locale data for a certain language (it's a little bit more than just language, though) for the specified charmap. In LANG/LC_* you only set the locale. The charmap is (semi-)automatically chosen, which makes sense, since which charset is used depends on the terminal.

> Was I supposed to give the locales individual names as the Localization Guide implies? locales.build doesn't indicate that you can do that (and in fact, I thought perhaps the reason why language exports were mildly borked might be because I had done so).

[EMAIL PROTECTED]/ISO-8859-1 didn't make much sense to me (and maybe causes some failures when building?), but other than that it seemed OK.

> Should I just get rid of the 'extra' locales (ISO-8859-15 and ISO-8859-1)? Since I guess I'm going to try to stick to UTF-8, maybe I don't really need them (I was mostly covering my butt, concerned that my current and future network connections might not support UTF-8, since they're mostly to Windows machines).

All the terminals you're using support UTF-8?

> I guess I've made a mistake, but I'm not quite sure what to do about it. Since fixing it will almost certainly require a recompile of glibc, and since compiling glibc takes nine-tenths of forever, I'd like to get on with it as soon as possible (sigh). So any hints would be appreciated.

How does the borkism of your locales manifest?

-hwh
-- 
gentoo-user@gentoo.org mailing list
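As a rough illustration of the locale/charmap format discussed in this message, here is a minimal Python sketch that reads locales.build-style text the way the file's own comment describes it: blank lines and lines starting with # are skipped, and each remaining line is split into a locale part and a charmap part. The function names and the sample text are hypothetical, and the `en_US.UTF-8/UTF-8` entry is an assumed example added by analogy with the `en_US/ISO-8859-1` line in the file comment. This is not how glibc or the ebuild actually processes the file.

```python
def parse_locales_build(text):
    """Return (locale, charmap) pairs from locales.build-style text."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and lines starting with '#' are ignored,
        # as stated in the file's own header comment.
        if not line or line.startswith("#"):
            continue
        # Split on the last '/'; the left part is the locale name,
        # the right part is the charmap.
        locale_name, sep, charmap = line.rpartition("/")
        if not sep or not locale_name or not charmap:
            raise ValueError(f"not in locale/charmap form: {line!r}")
        entries.append((locale_name, charmap))
    return entries


def lang_candidates(entries):
    """Locale parts only: the charmap is not what goes into LANG."""
    return [locale_name for locale_name, _ in entries]


# Hypothetical sample; the second entry is an assumed illustration.
SAMPLE = """\
# comment lines are skipped
en_US/ISO-8859-1

en_US.UTF-8/UTF-8
"""

if __name__ == "__main__":
    parsed = parse_locales_build(SAMPLE)
    print(parsed)
    print(lang_candidates(parsed))
```

The sketch only mirrors the point made in the reply: the part after the slash selects the charmap the locale data is generated for, while LANG/LC_* carry the locale name alone.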
Re: [gentoo-user] Glibc, userlocales, and ENV Variables
Hans-Werner Hilse wrote:

> Hi,
>
> On Wed, 02 Nov 2005 15:53:11 +0100 Holly Bostick [EMAIL PROTECTED] wrote:
>
>> [...] /etc/locales.build which says
>>
>> # This file names the list of locales to be built when glibc is installed.
>> # The format is locale/charmap, where locale is a locale from the
>> # /usr/share/i18n/locales directory, and charmap is name of one of the files
>> # in /usr/share/i18n/charmaps/. All blank lines and lines starting with # are
>> # ignored. Here is an example:
>> # en_US/ISO-8859-1
>>
>> [...] Glibc built fine (afaict), but my problem is that I now don't know what to export with a LANG variable. For example, if I want [EMAIL PROTECTED]/UTF-8, how do I export that as opposed to [EMAIL PROTECTED]/ISO-8859-15 (or worse, ISO-8859-1)?
>
> Note the comment you've cited: The format is locale/charmap. This generates the locale data for a certain language (it's a little bit more than just language, though) for the specified charmap. In LANG/LC_* you only set the locale. The charmap is (semi-)automatically chosen, which makes sense, since which charset is used depends on the terminal.

OK, I kinda get that, and dmesg says during boot that the terminal (agetty) is being configured to use UTF-8 (which is what I told it to do when I built the kernel, so that's OK). So does that mean that when I log in to my DE/WM, and start X, the charmap will be automatically UTF-8, because that's what the getty was?

I want the full ISO-8859-15 charset and the Euro symbol. UTF-8 gets me the charset, but afaik I need some attachment to @euro to get the Euro symbol (for those fonts that even have the character(s), which is another horror show that I won't get into, since once you've found a reasonably attractive font with all the characters, half the time it doesn't have bold or italic or bold italic, so it's not very useful on the desktop).

It's not clear to me whether the Euro symbol is included in UTF-8 encodings, or only as a special variant of ISO-8859-15 (the @euro variant), which is one of the reasons I try to encode both.

>> Was I supposed to give the locales individual names as the Localization Guide implies? locales.build doesn't indicate that you can do that (and in fact, I thought perhaps the reason why language exports were mildly borked might be because I had done so).
>
> [EMAIL PROTECTED]/ISO-8859-1 didn't make much sense to me (and maybe causes some failures when building?), but other than that it seemed OK.

Well, of course I know less about this than you do, but my native Dutch boyfriend runs an English Windows machine, I run Windows programs with Wine, and about the only thing I think I know about the whole issue is that Windows pretty much only knows ISO-8859-1 (unless you had a multi-lingual version, which neither of us did). So I wanted support for ISO-8859-1 to be available (with support for the Euro symbol for those MS fonts that support it, which I think that the core MS fonts now do by default, though I'm not sure about that either). In any case, if such an application called for ISO-8859-1, I wanted it to be there, though as you can tell, I don't understand how this is all supposed to work well enough to be sure that was the way to accomplish the goal.

>> Should I just get rid of the 'extra' locales (ISO-8859-15 and ISO-8859-1)? Since I guess I'm going to try to stick to UTF-8, maybe I don't really need them (I was mostly covering my butt, concerned that my current and future network connections might not support UTF-8, since they're mostly to Windows machines).
>
> All the terminals you're using support UTF-8?

Well, I thought so, but maybe I was wrong. I use mostly multi-gnome-terminal (which does appear to have unicode support by default), but when I switched window managers to fvwm-crystal, I started using mrxvt and aterm a bit more (because fvwm-crystal likes them, and xterm, which crystal also likes, takes forever to open for some reason, likely unrelated but very annoying). This may well be when I started noticing this as a problem rather than an annoyance, because I was suddenly seeing it so much. Previously, the issue had only raised its ugly head in some X programs, but not X programs I use that often, so it was easy to ignore.

None of the terms I use have a unicode USE flag, but I have visited their homepages. Now I see that support for CJK does not mean that UTF-8 is automatically supported; it seems that mrxvt does not support unicode, nor do aterm/multi-aterm/rxvt. OK, that answers that, I guess, but what did you Europeans do when these terminals were all you had, for Pete's sake? Your output would have been half-gibberish, and I don't see how people would have stood for that.

>> I guess I've made a mistake, but I'm not quite sure what to do about it. Since fixing it will almost certainly require a recompile of glibc, and since compiling glibc takes nine-tenths of forever, I'd like to get it on
Re: [gentoo-user] Glibc, userlocales, and ENV Variables
Hi,

On Wed, 02 Nov 2005 21:16:49 +0100 Holly Bostick [EMAIL PROTECTED] wrote:

> OK, I kinda get that and dmesg says during boot that the terminal (agetty) is being configured to use UTF-8 (which is what I told it to do when I built the kernel, so that's OK).

The kernel is configured by the Gentoo rc system using unicode_start. This sets the console charmap and font.

> So does that mean that when I log in to my DE/WM, and start X, the charmap will be automatically UTF-8, because that's what the getty was?

No, that's independent. Your X terminal program talks to X and uses its font subsystem. That also uses charset information for finding correct fonts. On the other hand, there are other means of using fonts now, and some solutions do an intermediate mapping to unicode.

> It's not clear to me whether the Euro symbol is included in UTF-8 encodings, or only as a special variant of ISO-8859-15 (the @euro variant), which is one of the reasons I try to encode both.

It's in both UTF-8 (which includes almost every sign under the sun) and ISO-8859-15, which is the same as latin9 (hint: look for this when searching for console fonts!) and is a slightly modified ISO-8859-1, a.k.a. latin1.

>> [EMAIL PROTECTED]/ISO-8859-1 didn't make much sense to me (and maybe causes some failures when building?), but other than that it seemed OK.
>
> Well, of course I know less about this than you do

[... oooh, I've known what I'm writing here for only a few minutes, no longer. In fact, the whole locale setup is terribly badly documented. ...]

> , but my native Dutch boyfriend runs an English Windows machine, I run Windows programs with Wine, and about the only thing I think I know about the whole issue is that Windows pretty much only knows ISO-8859-1 (unless you had a multi-lingual version, which neither of us did). So I wanted support for ISO-8859-1 to be available (with support for the Euro symbol for those MS fonts that support it, which I think that the core MS fonts now do by default, though I'm not sure about that either).

Windows has used Unicode since 2000 (or even NT?). However, that doesn't mean it ships fonts with the full Unicode charset available.

> In any case, if such an application called for ISO-8859-1, I wanted it to be there, though as you can tell, I don't get how this is all supposed to work well enough to be sure that was the way to accomplish the goal.

This whole thing is very interesting. Actually, I'm just reading my way through the glibc sources, as I'd always been interested in this. And it is _very_ badly documented, as I've mentioned.

In fact, there's no difference between the nl_NL and the [EMAIL PROTECTED] locale. I think probably all of the @euro locales are more or less obsolete now. I think they're a remnant from the time when the new currency was introduced and the user had to choose. Now, the @euro locales de facto just import their [EMAIL PROTECTED] counterparts. This is written in the relevant changelog:

    * locales/br_FR: Eliminate old national currencies of countries
    participating in Euro. Make @euro files pure copies.

(continues for all @euro)

A .UTF-8 locale doesn't exist in glibc's locale database, so the suffix must get stripped when the locale is generated. To give you a hint, the default locales generated for language tag nl, subtag NL are

    nl_NL
    [EMAIL PROTECTED]
    nl_NL.UTF8

In your locales.build, this should read

    nl_NL/ISO-8859-1
    [EMAIL PROTECTED]/ISO-8859-15
    nl_NL.UTF-8/UTF-8

to have the proper locale for each encoding you/your terminal may come across. Although @euro is a copy, it is needed to identify and distinguish each of the generated locales. After all, nl_NL... is just a name; it could have been another name, too.
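To make the encoding side of this reply concrete, here is a small Python sketch (an illustration only; it does not touch glibc or any locale settings). It checks how the Euro sign and a Dutch "ë" are represented in the three charmaps named above, and what a reader sees when bytes written in one charmap are interpreted in another. The helper names `encode_or_none` and `as_seen_by` are made up for this example.

```python
EURO = "\u20ac"         # the Euro sign
E_DIAERESIS = "\u00eb"  # "e" with diaeresis, common in Dutch

CHARMAPS = ["iso-8859-1", "iso-8859-15", "utf-8"]


def encode_or_none(char, charmap):
    """Return the bytes for char in charmap, or None if charmap lacks it."""
    try:
        return char.encode(charmap)
    except UnicodeEncodeError:
        return None


def as_seen_by(text, written_as, read_as):
    """Encode text in one charmap, then decode the same bytes as another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    for charmap in CHARMAPS:
        print(charmap,
              encode_or_none(EURO, charmap),
              encode_or_none(E_DIAERESIS, charmap))
    # UTF-8 output shown by a program that assumes ISO-8859-1:
    print(as_seen_by("be\u00ebindigd", "utf-8", "iso-8859-1"))
```

ISO-8859-1 has no Euro sign at all, ISO-8859-15 stores it in a single byte, and UTF-8 uses three bytes. The last line shows the usual symptom of a mismatch: the two UTF-8 bytes of "ë" turn into two separate Latin-1 characters, which is why the locale's encoding has to match what the terminal expects.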
But the LANG setting is also used by gettext (which has nothing to do with all this, but has the same author), AFAIK, and thus shouldn't be chosen totally arbitrarily. Your LANG setting should be [EMAIL PROTECTED] for non-unicode environments (given that you're using latin9/ISO-8859-15 fonts) and nl_NL.UTF-8 in unicode environments.

>> How does the borkism of your locales manifest?
>
> Most of the time, when Dutch characters are meant to be used, they are, as in the following example:
>
> [EMAIL PROTECTED] - killall -9 conky
> conky: geen proces beëindigd
>
> but sometimes I get this:
>
> killall -9 MPlayer
> MPlayer: geen proces beëindigd
>
> Now *that's* interesting... I copied and pasted the second from a terminal (mrxvt, whereas the first was from multi-gnome-terminal), where what appeared was
>
> killall -9 MPlayer
> MPlayer: geen proces beA(with the ~ over it) (but tiny ones)indigd
>
> in place of the ë. But when I pasted it into this compose window, it came out right! But it isn't right in the term.

This is probably due to different clipboard implementations. GTK has its own clipboard, which imports things from the X clipboard facility that mrxvt is using. Probably GTK applies some logic like recognizing multibyte sequences and decides that