Hello Vladimir,

There are a few points I would like to make at the top of this email,
since I believe they deserve better visibility. My answers to your
individual questions are inlined further below.

First, please note that this alias support mechanism does not provide
external files for the two sets of alias mappings, as indicated by the
section 5 in the locale_alias(5) man page name and by the spec.

Second, you ask how this will help a customer who took his script from
an HP-UX box to a Solaris box and suddenly finds its output in English
rather than in French.

There are various reasons why the output could be in English. I am sure
you understand that I'm not offering a panacea for every possible problem
behind that question, since a panacea for such unknown problems is simply
not practical.

However, if the script is properly internationalized, uses publicly
available interfaces, and assumes HP-UX locale names, then it will work
fine on the Solaris box as if the HP-UX locale names were supported (as
long as the Solaris box has the corresponding canonical locale and
localization installed).

Better examples for the project are the following:

1. In an IBM shop, customers have "setenv LANG CS_CZ" for the Czech (Czech
Republic) UTF-8 locale environment in, say, their ~/.cshrc file. They now
have a Solaris box and "ssh" to it, which forwards the CS_CZ locale
environment variable setting to the Solaris box. Without the locale alias
support, they will see the C locale and English messages even though the
Solaris box has the cs_CZ.UTF-8 locale. With the locale alias support, on
the other hand, they will see Czech messages as if the CS_CZ locale were
supported on Solaris, even though the guts actually come from the
cs_CZ.UTF-8 locale.

2. In an HP/Linux shop, customers have "setenv LANG ja_JP.SJIS" for the
Japanese (Japan) Shift_JIS locale environment in their ~/.cshrc file,
which is accessed through an NFS home directory. Now they have Solaris
boxes. If they sit in front of a Solaris workstation and log in to it,
then, without the locale alias support, the locale environment settings in
~/.cshrc will not be honored and they will fall back to the C locale with
English messages. With the locale alias support, on the other hand, they
will see the Japanese locale and messages, since the locale alias support
maps ja_JP.SJIS internally to ja_JP.PCK on Solaris boxes.
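
To make the two examples concrete, here is a minimal Python sketch of the
lookup order they describe: try the name as given first, and consult the
alias mapping only if that fails. This is an illustration, not the libc
code. The names INSTALLED_LOCALES, LOCALE_ALIASES, and resolve_locale are
hypothetical, the two alias entries are just the ones from the examples
above, and the fall back to the C locale is modeled as a return value.

```python
# Hypothetical stand-ins: the real mappings are embedded in libc and are
# listed in locale_alias(5); only the two examples from this email appear.
INSTALLED_LOCALES = {"C", "POSIX", "cs_CZ.UTF-8", "ja_JP.PCK"}
LOCALE_ALIASES = {
    "CS_CZ": "cs_CZ.UTF-8",
    "ja_JP.SJIS": "ja_JP.PCK",
}


def resolve_locale(requested):
    """Return the canonical locale to use, or "C" if nothing matches."""
    # Step 1: a name that is supported as given is used unchanged.
    if requested in INSTALLED_LOCALES:
        return requested
    # Step 2 (last resort): consult the alias mapping.
    canonical = LOCALE_ALIASES.get(requested)
    if canonical in INSTALLED_LOCALES:
        return canonical
    # Step 3: unknown name, so the user ends up in the C locale.
    return "C"


if __name__ == "__main__":
    for name in ("cs_CZ.UTF-8", "CS_CZ", "ja_JP.SJIS", "blah"):
        print(name, "->", resolve_locale(name))
```

Note that an alias only helps when the canonical locale it points to is
actually installed, which matches the caveat above about the Solaris box
needing the corresponding canonical locale.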

We in G11N consider this project important for our business, for user
convenience, and for helping people migrate to our platform.

My answers to everything else are inlined below.

Ienup

Vladimir Marek wrote at 11/02/09 08:34:
> I just wonder, what's wrong (messy?) about thousands of symbolic links
> in directory? I might be horribly naive, but I thought that managing
> files (links) is quite easy task for any packaging system. Moreover the
> 'managing' will be most probably just adding the missing links, no
> deleting or renaming (exceptions proves the rule).

As an example, soon enough we will have to update the locale shared
objects once again to incorporate the new Austin standard revision. At
that time, we will inevitably have to bump the shared objects' version
from the current <locale>.so.3 and methods_<locale>.so.3 to, say, *.so.4.
That means we would not only have to update the canonical locales but
also have to ensure that all of those many thousands of symbolic links
are correctly changed as well, if we were to support the aliases with
symlinks. (I believe the number of symlinks would be somewhere between
half a million and a bit more than 1 million.)

That is clearly going to be a maintenance hazard and overhead, and I
don't think we have enough bandwidth and resources to handle that kind of
change. Do you?


> The downsizes of such solution I see:
> 
> a) unintuitive
> If my locale does not work and similar are installed (consulting locale
> -a), first I would look into locale directory to see what's there.
> Creating just another symlink is next logical step.

That is precisely why there is a locale_alias(5) man page, a locale(1)
man page update, and so on: we document the feature and give people the
information instead of letting them look into system directories, which
are not really an official interface for regular users. (I'm certainly
geek enough to know the details and to look into system directories and
whatnot, since I was responsible for the LC_COLLATE category in SunOS and
wrote the guts when we re-architected it to be codeset independent several
years ago. But I don't plan to ask people to look at the /usr/lib/locale/
directory to find out which locales are available on the current system;
that is only a last resort for regular users.)

Re "creating just another symlink": if you wish, you (i.e., customers)
can still do that even after this project integrates.


> b1) harder to manage
> If we have separate packages for each locale, the shared locale_alias
> would have to be dynamically modified with each package install and
> remove.

locale_alias is not a file and will not be one. It is a section 5 man
page, which merely gives miscellaneous information: in this case, the
locale alias details, including the lists of supported locale aliases.

Hence there is no change from a packaging perspective.


> b2) harder to manage
> if there is single locale_alias file, and user makes modifications, we 
> have to merge his change with our own patches (or how it is called
> these days). Creating symlink does not interfere with other packages.

Again, no: there is no locale_alias file. I explicitly described the
tables and details in the spec and in the locale_alias(5) man page.


> c) slow
> If user has LC_ALL=blah, then each subsequent setlocale(3C) has to go 
> through the locale_alias and compare thousands of lines to blah. 
> Setlocale is frequently called command ... (You could cache this 
> blah=cs_CZ somewhere. But then there is need for tools to purge this 
> cache and user has to find out how to do it when he changes the 
> locale_alias).

There is no change whatsoever if you're using canonical locale names, as
you're supposed to, since the alias check is the last-resort step within
setlocale(3C). Even with an erroneous setting such as LC_ALL=blah, the
search to determine whether it is a supported alias will be extremely
fast: only a few character comparisons against locale names, using a
modified indexed trie search data structure.

Also, to be more precise about what is at stake: the choice is between
failing on acceptable locale aliases and falling back to the C locale,
*or* spending a few character comparisons (and only when the supplied
locale name isn't supported as is) to find out whether it is a supported
locale alias and, if so, honoring it for better compatibility with
competing platforms. It is *not* about performance degradation in
erroneous locale setting cases.
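
To show why an unsupported name is rejected after only a few character
comparisons, here is a small Python sketch of a plain character trie. It
is only an approximation for illustration: the actual structure in this
project is a modified indexed trie whose layout is not described in this
email, and build_trie and lookup are hypothetical names.

```python
def build_trie(aliases):
    """Build a nested-dict trie from an alias -> canonical name mapping."""
    root = {}
    for alias, canonical in aliases.items():
        node = root
        for ch in alias:
            node = node.setdefault(ch, {})
        # The empty-string key marks the end of a complete alias.
        node[""] = canonical
    return root


def lookup(trie, name):
    """Return (canonical name or None, number of characters examined)."""
    node = trie
    examined = 0
    for ch in name:
        examined += 1
        node = node.get(ch)
        if node is None:
            # No alias continues with this character: stop right away.
            return None, examined
    return node.get(""), examined


if __name__ == "__main__":
    # Illustrative entries only, taken from the examples in this email.
    trie = build_trie({"CS_CZ": "cs_CZ.UTF-8", "ja_JP.SJIS": "ja_JP.PCK"})
    print(lookup(trie, "ja_JP.SJIS"))  # a hit walks the whole name
    print(lookup(trie, "blah"))        # a miss stops at the first character
```

With these two entries, "blah" is rejected after examining a single
character, because no alias starts with "b". The cost of a lookup is
bounded by the length of the supplied name, not by the number of aliases
in the table.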


>> Hence, this project proposes to have a transparent locale name alias support
>> mechanism at libc with embedded locale name mapping tables as outlined at
>> below to remedy the interoperability/compatibility issue and aid users
>> who want to migrate from other platforms to Solaris.
> 
> Linux uses locale.alias (Debian has /etc/locale.alias and 
> /usr/X11R6/lib/X11/locale/locale.alias for example). To aid Linux users, 
> it might be good idea to create locale.alias(4) man page.

We already have the X11 locale.alias file, and have had it for almost two
decades now, so that is not a topic for discussion here.

The Debian /etc/locale.alias is one way to provide alias support. I don't
think it is really necessary for us, since other Linux distros do not
support it and since we will have the locale alias mappings embedded in
libc. I'll elaborate on this in my replies to Nicolas's and Alan's
emails.


>> TECHNICAL DETAILS
>>
>> Currently, when a locale selection is made with setlocale(3C), as an example
>> for 32-bit environment, the function looks for the locale shared object at
>> /usr/lib/locale/<locale>/<locale>.so.3. In this process of locating the 
>> locale
>> shared object, the <locale> name given to the setlocale(3C) and the <locale>
>> component of the path to the locale shared object must be identical byte by
>> byte.
> 
> That is problem for symlinks. Still would not it be easier to relax this 
> scheme a bit (file locale.so.3 would work for all locales [pardon my
> naivity again ...]).

Please elaborate on what you meant above by "file locale.so.3 would work
for all locales" and also by "That is problem for symlinks."

I'm asking because it appears that you're proposing a single locale
shared object for all locales?


>> The mapping tables shown at locale_alias(5) [2] are formulated from the
>> data extracted from [3], [4], and some operating systems such as AIX 6.1,
>> HP-UX 11.11, RHEL 5.4, Ubuntu 9.04, and the latest OpenSolaris/Solaris Nevada
>> via some simple reverse engineering. They will be embedded into libc under
>> read only data section. (We expect there will be no significant changes at
>> the tables, if any, in the future.)
> 
> Oh, this is not file, but rather ELF section? I don't see the aid for
> the user if he has to file escalation (pay support first) and wait some
> time to get updated packages from upstream, just to make his cs_CZ.UtF-8
> work?

I think you assumed locale_alias(5) was a file that this project would
supply, and then found, as you read the spec, that it isn't? Please
elaborate if that is the case.

Re the cs_CZ.UtF-8 locale name: as I specified in the locale_alias(5) man
page, this enhancement project normalizes the codeset part, and thus
cs_CZ.UtF-8, cs_CZ.UTF8, cs_CZ.utf8, cs_CZ.utf-8, and so on will all be
accepted as aliases of cs_CZ.UTF-8 as a by-product of that normalization.
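
As a rough illustration of the codeset part normalization, here is a
Python sketch that maps the variants listed above to cs_CZ.UTF-8. The
rule used here (fold case and drop punctuation in the codeset part, then
look the result up in a table) is an assumption made for this example;
the actual rules are the ones specified in locale_alias(5).
CANONICAL_CODESETS and normalize_codeset are hypothetical names.

```python
# Assumed table for illustration; only UTF-8 is taken from this email.
CANONICAL_CODESETS = {"utf8": "UTF-8"}


def normalize_codeset(name):
    """Rewrite the codeset part of a locale name to its canonical form."""
    base, dot, codeset = name.partition(".")
    if not dot:
        # No codeset part (for example "cs_CZ"): nothing to normalize.
        return name
    # Leave any "@modifier" suffix untouched.
    codeset, at, modifier = codeset.partition("@")
    # Assumed rule: ignore case and punctuation when comparing codesets.
    key = "".join(c for c in codeset.lower() if c.isalnum())
    canonical = CANONICAL_CODESETS.get(key, codeset)
    return base + dot + canonical + at + modifier


if __name__ == "__main__":
    for name in ("cs_CZ.UtF-8", "cs_CZ.UTF8", "cs_CZ.utf8", "cs_CZ.utf-8"):
        print(name, "->", normalize_codeset(name))
```

A codeset that is not in the table is passed through unchanged, so a name
such as ja_JP.PCK is not affected by this step.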

Please read the locale_alias(5) man page.



>> Although this project does not change locale(1) utility, this project also
>> update the NOTES section of locale(1) man page as shown at [2] to clarify on
>> the "locale -a" output that locale aliases are supported only as aliases and
>> will not be shown at the output.
> 
> 
> Just out of interest, what is the difference between canonical locale
> name, and 'additional locale name'? Why do we have to keep them
> separated? We would not need lawyer-like documentation like
> http://sac.sfbay/PSARC/2009/594/materials/setlocale.3c.diff

You may not need an explicit specification, but such a specification is
needed so that our features are clearly documented and described.
Under-documentation, ambiguous documentation, or a lack of documentation
should be considered harmful, not the other way around. I also don't
think that what I proposed over-specifies things.

Canonical locale names are, literally, locale names that follow
established principles. Locale name aliases are, literally, aliases for
canonical locale names.

Re the "additional locale names": as described in catopen(3C) and
gettext(1), they literally refer to the locale names that gettext(1) and
so on will additionally check to find their message catalogs:

      If lang specified is a canonical locale name to obsoleted Solaris        |
      locale names as described in locale_alias(5) and the above mentioned     |
      ordinary locations with lang do not yield a message object, for          |
      a better backward compatibility, gettext additionally looks for          |
      its message object using the obsoleted Solaris locale names as           |
      the additional locale names to check on with in place of lang.           |

Which part of it do you find unclear?


> Maybe I just don't see the problem. Customer takes his script form
> HP-UX, and suddenly it's output is not in French, but rather in English.
> How exactly will help him this quite complex change?

Please see the answer at the beginning of this email.


> 
> 
> Thank you for your patience with me
> 
