Hello Vladimir,

There are a few things I would like to put at the front of this email since I believe they deserve better visibility. I have also inlined my answers to all of your questions below.

Please note first that this alias support mechanism does not provide external files for the two sets of alias mappings, as clearly indicated by the section 5 in the locale_alias(5) man page name and by the spec.

Second, you mention how this will help a customer who took his script from an HP-UX box to a Solaris box and suddenly its output isn't in French but in English. There are various reasons why the output could be in English. I am sure you understand that I'm not providing a panacea for every possible problem behind that question, since a panacea for such unknown problems is simply not practical. However, if the script is a properly internationalized one that uses publicly available interfaces and assumes and uses HP-UX locale names, then the script will work fine on the Solaris box as if the HP-UX locale names were supported (as long as the Solaris box has the corresponding canonical locale and localization installed).

Better examples for the project are something like the following:

1. In an IBM shop, customers have "setenv LANG CS_CZ" for the Czech (Czech Republic) UTF-8 locale environment, say, in their ~/.cshrc file. They now have a Solaris box and "ssh" to it, which forwards the CS_CZ locale environment variable settings to the Solaris box. Without the locale alias support, you will see the C locale and English messages even though the Solaris box has the cs_CZ.UTF-8 locale. With the locale alias support, on the other hand, you will see Czech messages as if the CS_CZ locale were supported in Solaris, even though the guts actually come from the cs_CZ.UTF-8 locale.

2. In an HP/Linux shop, customers have "setenv LANG ja_JP.SJIS" for the Japanese (Japan) Shift_JIS locale environment in their ~/.cshrc file, which is accessed through an NFS home directory.
Now they have Solaris boxes, and if they sit in front of a Solaris workstation and log in to it, then, without the locale alias support, the locale environment settings in their ~/.cshrc will not be honored and they will fall back to the C locale with English messages. With the locale alias support, on the other hand, they will see the Japanese locale and messages, since the locale alias support will map ja_JP.SJIS internally to ja_JP.PCK on Solaris boxes.

We in G11N consider this project important for our business and for user convenience, and it helps people migrate to our platform.

For everything else, my answers are inlined below.

Ienup

Vladimir Marek wrote at 11/02/09 08:34:
> I just wonder, what's wrong (messy?) about thousands of symbolic links
> in directory? I might be horribly naive, but I thought that managing
> files (links) is quite easy task for any packaging system. Moreover the
> 'managing' will be most probably just adding the missing links, no
> deleting or renaming (exceptions proves the rule).

As an example, soon enough we will have to update the locale shared objects once again to incorporate the new Austin standard revision, and at that time it is inevitable that we bump the shared objects' version from the current <locale>.so.3 and methods_<locale>.so.3 to, say, *.so.4. That means we would have to update not only the canonical locales but also ensure that all of those many thousands of symbolic links are correctly changed if we were to support the aliases with symlinks. (I believe the number of symlinks would be somewhere between half a million and a bit more than one million.) That is clearly a maintenance hazard and overhead, and I don't think we have enough bandwidth and resources to handle that kind of change, do you?

> The downsizes of such solution I see:
>
> a) unintuitive
> If my locale does not work and similar are installed (consulting locale
> -a), first I would look into locale directory to see what's there.
> Creating just another symlink is next logical step.

That is precisely the reason why there is a locale_alias(5) man page, a locale(1) man page update, and so on: so that we document this and give people the information instead of letting them look into system directories, which are not really an official interface for regular users. (I'm certainly geek enough to know the details and to look into system directories and whatnot, since I was responsible for the LC_COLLATE category in SunOS and wrote the guts when we re-architected it to be codeset independent several years ago, but I don't plan to ask people to look at the /usr/lib/locale/ directory to learn which locales are available on the current system; that is the last thing I would recommend regular people do.)

Re "creating just another symlink": if you wish, you (i.e., customers) can always do that even after the project integration.

> b1) harder to manage
> If we have separate packages for each locale, the shared locale_alias
> would have to be dynamically modified with each package install and
> remove.

The locale_alias is not a file and will not be one. It is a section 5 man page that merely gives miscellaneous information, in this case the locale alias details including the lists of supported locale aliases. Hence there is no change from a packaging perspective.

> b2) harder to manage
> if there is single locale_alias file, and user makes modifications, we
> have to merge his change with our own patches (or how it is called
> these days). Creating symlink does not interfere with other packages.

Again, no, there is no locale_alias file, as I explicitly stated regarding the tables and details in the spec and in the locale_alias(5) man page.

> c) slow
> If user has LC_ALL=blah, then each subsequent setlocale(3C) has to go
> through the locale_alias and compare thousands of lines to blah.
> Setlocale is frequently called command ... (You could cache this
> blah=cs_CZ somewhere.
> But then there is need for tools to purge this
> cache and user has to find out how to do it when he changes the
> locale_alias).

There is no change whatsoever if you're using canonical locale names, as you're supposed to, since this alias check is the last-resort step within setlocale(3C). Even with erroneous settings such as LC_ALL=blah, the search to find out whether it is a supported alias will be extremely fast, taking only a few character comparisons on the locale name with a modified indexed trie search data structure.

Also, to be more precise about what is at stake: the choice is between failing for acceptable locale aliases and falling back to the C locale, *or* spending a few character comparisons (and only when the locale name supplied isn't supported as it is) to find out whether it is a supported locale alias and, if so, honoring it for better compatibility with other competing platforms. It is *not* about performance degradation in erroneous locale setting cases.

>> Hence, this project proposes to have a transparent locale name alias support
>> mechanism at libc with embedded locale name mapping tables as outlined at
>> below to remedy the interoperability/compatibility issue and aid users
>> who want to migrate from other platforms to Solaris.
>
> Linux uses locale.alias (Debian has /etc/locale.alias and
> /usr/X11R6/lib/X11/locale/locale.alias for example). To aid Linux users,
> it might be good idea to create locale.alias(4) man page.

We already have the X11 locale.alias file and have had it for almost two decades now, so that's not a topic to discuss. The Debian /etc/locale.alias is one way to provide alias support. I don't think it is really necessary for us, since other Linux distros do not support it and since we will have locale_alias embedded into libc. I'll also elaborate on this in my replies to Nicolas's and Alan's emails.

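To make the last-resort alias check from my answer to (c) more concrete, here is a minimal sketch in Python rather than the libc C code. It is only an illustration under stated assumptions: the class and function names, the set of installed locales, and the plain dictionary-based trie are all hypothetical, and the real structure is a modified indexed trie that this does not reproduce. The two alias mappings are the ones from the examples at the top of this email.

```python
# Sentinel key marking "an alias ends here"; it cannot collide with a
# one-character edge label because it is the empty string.
_END = ""


class AliasTrie:
    """Plain character trie mapping alias names to canonical locale names."""

    def __init__(self):
        self._root = {}

    def add(self, alias, canonical):
        node = self._root
        for ch in alias:
            node = node.setdefault(ch, {})
        node[_END] = canonical

    def lookup(self, name):
        """Return the canonical name for an alias, or None if unknown."""
        node = self._root
        for ch in name:
            node = node.get(ch)
            if node is None:
                # Mismatch: give up after only a few character comparisons.
                return None
        return node.get(_END)


# Hypothetical data for illustration only.
ALIASES = AliasTrie()
ALIASES.add("CS_CZ", "cs_CZ.UTF-8")
ALIASES.add("ja_JP.SJIS", "ja_JP.PCK")
INSTALLED = {"C", "cs_CZ.UTF-8", "ja_JP.PCK"}


def resolve_locale(name, installed=INSTALLED, aliases=ALIASES):
    """Resolve a requested locale name, consulting aliases only as a last resort."""
    if name in installed:
        # Canonical names never reach the alias lookup at all.
        return name
    canonical = aliases.lookup(name)
    if canonical is not None and canonical in installed:
        return canonical
    # Neither a supported locale nor a supported alias: fall back to C.
    return "C"
```

In this sketch a name like "blah" is rejected at its first character, because no alias in the table starts with "b", which is the kind of cost I mean by "a few character comparisons".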
>> TECHNICAL DETAILS
>>
>> Currently, when a locale selection is made with setlocale(3C), as an example
>> for 32-bit environment, the function looks for the locale shared object at
>> /usr/lib/locale/<locale>/<locale>.so.3. In this process of locating the
>> locale
>> shared object, the <locale> name given to the setlocale(3C) and the <locale>
>> component of the path to the locale shared object must be identical byte by
>> byte.
>
> That is problem for symlinks. Still would not it be easier to relax this
> scheme a bit (file locale.so.3 would work for all locales [pardon my
> naivity again ...]).

Please elaborate on what you meant by the above, "file locale.so.3 would work for all locales" and also "That is problem for symlinks." The reason I'm asking is that it appears you're proposing a single locale shared object for all locales?

>> The mapping tables shown at locale_alias(5) [2] are formulated from the
>> data extracted from [3], [4], and some operating systems such as AIX 6.1,
>> HP-UX 11.11, RHEL 5.4, Ubuntu 9.04, and the latest OpenSolaris/Solaris Nevada
>> via some simple reverse engineering. They will be embedded into libc under
>> read only data section. (We expect there will be no significant changes at
>> the tables, if any, in the future.)
>
> Oh, this is not file, but rather ELF section? I don't see the aid for
> the user if he has to file escalation (pay support first) and wait some
> time to get updated packages from upstream, just to make his cs_CZ.UtF-8
> work?

I think you thought locale_alias(5) is a file that this project will supply, and then, while reading the spec, you found that it isn't? Please elaborate if that is the case re locale_alias(5).

Re the cs_CZ.UtF-8 locale name: as I specified in the locale_alias(5) man page, this enhancement project includes a codeset part normalization, and thus cs_CZ.UtF-8, cs_CZ.UTF8, cs_CZ.utf8, cs_CZ.utf-8, and so on will all be accepted as aliases of cs_CZ.UTF-8 as a by-product of that normalization. Please read the locale_alias(5) man page.

>> Although this project does not change locale(1) utility, this project also
>> update the NOTES section of locale(1) man page as shown at [2] to clarify on
>> the "locale -a" output that locale aliases are supported only as aliases and
>> will not be shown at the output.
>
>
> Just out of interest, what is the difference between canonical locale
> name, and 'additional locale name'? Why do we have to keep them
> separated? We would not need lawyer-like documentation like
> http://sac.sfbay/PSARC/2009/594/materials/setlocale.3c.diff

You may not need an explicit specification, but such specification is needed to make our features clearly documented and described. Under-documentation, ambiguous documentation, or the lack of documentation should be considered harmful, not the other way around. I also don't think that what I proposed over-specifies things.

The canonical locale names are, literally, locale names of established principles. The locale name aliases are, literally, aliases to canonical locale names.

Re the "additional locale names": as described in catopen(3C) and gettext(1), they literally refer to locale names that gettext(1) and so on will additionally check to find the message catalog:

    If lang specified is a canonical locale name to obsoleted Solaris   |
    locale names as described in locale_alias(5) and the above mentioned |
    ordinary locations with lang do not yield a message object, for     |
    a better backward compatibility, gettext additionally looks for     |
    its message object using the obsoleted Solaris locale names as      |
    the additional locale names to check on with in place of lang.      |

What part of it do you find not clear?

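Going back to the codeset part normalization mentioned for cs_CZ.UtF-8, one plausible way to get that behavior is sketched below in Python. This is an assumption made for illustration, not the rule as specified in locale_alias(5): the function names, the one-entry table, and the "drop non-alphanumerics and uppercase" rule are all hypothetical.

```python
# Hypothetical table: normalized key -> canonical codeset spelling.
CANONICAL_CODESETS = {
    "UTF8": "UTF-8",
}


def normalize_codeset(codeset):
    """Return the canonical spelling of a codeset name, or None if unknown."""
    # Assumed rule: ignore case and any non-alphanumeric characters.
    key = "".join(ch for ch in codeset if ch.isalnum()).upper()
    return CANONICAL_CODESETS.get(key)


def normalize_locale_name(name):
    """Normalize only the codeset part of language_TERRITORY.codeset[@modifier]."""
    base, dot, rest = name.partition(".")
    if not dot:
        # No codeset part at all, e.g. "C" or "cs_CZ".
        return name
    codeset, at, modifier = rest.partition("@")
    canonical = normalize_codeset(codeset)
    if canonical is None:
        # Unknown codeset: leave the name untouched.
        return name
    return base + "." + canonical + at + modifier
```

With this sketch, the four spellings above all come out as cs_CZ.UTF-8, while a name whose codeset is not in the table, such as ja_JP.SJIS, is returned unchanged and left to the alias tables.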
> Maybe I just don't see the problem. Customer takes his script form
> HP-UX, and suddenly it's output is not in French, but rather in English.
> How exactly will help him this quite complex change?

Please see the answer at the beginning of this email.

>
>
> Thank you for your patience with me
>