Re: low hanging fruit while cleaning up test failures

Martin Sebor Wed, 30 Jan 2008 22:05:01 -0800

Travis Vitek wrote:
[...]

FYI, the setlocale() names can be really long on some platforms (e.g.,
on HP-UX, they always take the form:
/<category>/<category>/<category>/<category>/<category>/<category>
so 64 characters may not be enough for all locales).


I thought this was only the case when using setlocale (LC_ALL, ...)


Definitely for LC_ALL.

because the OS needs to return a string that indicates which locales are
used for each category so that the return of setlocale (LC_ALL, 0) can
be used to restore the locale to a previous state. I believe that this
is the way that it works on HP and AIX.


Looks like you're right (at least on HP-UX where I confirmed it).
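
For the record, a minimal sketch of the save-and-restore idiom described above. The ScopedLocale class is hypothetical (it is not part of the test driver); it relies only on the standard setlocale() behavior that a null second argument queries the current setting without changing it.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper (not part of the test driver): remembers the
// current setting of a locale category and restores it on scope exit.
class ScopedLocale {
public:
    ScopedLocale(int category, const char* name)
        : category_(category)
    {
        // A null second argument queries without changing anything.
        // Copy the result: later setlocale() calls may overwrite it.
        if (const char* cur = std::setlocale(category, nullptr))
            saved_ = cur;
        ok_ = std::setlocale(category, name) != nullptr;
    }

    ~ScopedLocale()
    {
        // For LC_ALL the saved string may be a composite such as
        // "/A/B/C/D/E/F"; passing it back restores every category.
        if (!saved_.empty())
            std::setlocale(category_, saved_.c_str());
    }

    ScopedLocale(const ScopedLocale&) = delete;
    ScopedLocale& operator=(const ScopedLocale&) = delete;

    // True if the requested locale was actually installed.
    bool ok() const { return ok_; }

private:
    int category_;
    std::string saved_;
    bool ok_ = false;
};
```

Constructing the guard with LC_ALL captures the whole composite name, which is why a fixed 64-character buffer may be too small; std::string sidesteps the length question.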


So here is the thing that I am concerned about. The previous code
allowed you to specify which locale facet you wanted to get locales for.


You mean category (the LC_XXX thing).

I didn't understand how or why this was useful. I believe that I do now.
Say a call to setlocale (LC_ALL, "X") returns the string "/A/B/C/D/E/F".
If I just capture the result of setlocale (LC_CTYPE, 0), I'm not going
to see that the other facets are set differently. I need to store the
result of locale -a, or the names of the locales used by each of the
components.


I'm not sure there are any tests that use the category argument but
I think the main point was to eliminate locales that seem to work
but some of whose categories don't. E.g., setlocale(LC_ALL, "zh_CN")
might return non-NULL but setlocale(LC_MESSAGES, "zh_CN") returns
NULL. I think we've had this happen.
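
A rough sketch of such a per-category check, assuming nothing beyond the standard setlocale() interface. The function name failing_categories is made up for illustration, and LC_MESSAGES is guarded because it is a POSIX extension rather than part of ISO C.

```cpp
#include <clocale>
#include <string>
#include <vector>

// Hypothetical helper: returns the names of the categories for which
// setlocale(category, name) fails; empty means every category works.
std::vector<std::string> failing_categories(const char* name)
{
    struct Cat { int id; const char* label; };
    static const Cat cats[] = {
        { LC_COLLATE,  "LC_COLLATE"  },
        { LC_CTYPE,    "LC_CTYPE"    },
        { LC_MONETARY, "LC_MONETARY" },
        { LC_NUMERIC,  "LC_NUMERIC"  },
        { LC_TIME,     "LC_TIME"     },
#ifdef LC_MESSAGES   // POSIX extension, not in ISO C
        { LC_MESSAGES, "LC_MESSAGES" },
#endif
    };

    // Save the full (possibly composite) setting so it can be restored.
    const char* cur = std::setlocale(LC_ALL, nullptr);
    const std::string saved = cur ? cur : "C";

    std::vector<std::string> failed;
    for (const Cat& c : cats)
        if (!std::setlocale(c.id, name))
            failed.push_back(c.label);

    // Undo whatever the probing calls changed.
    std::setlocale(LC_ALL, saved.c_str());
    return failed;
}
```

A locale for which this returns a non-empty list (e.g., one where only LC_MESSAGES fails) could then be dropped from the set used by the tests.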

[...]


This is similar to what I started out with. The only disadvantage is
that it doesn't allow you to prioritize one attribute over another.
Nobody ever said we cared about order, but I assumed that we would.


As we discussed (and so just for the record), the "prioritization"
within, say, the MB_CUR_MAX field would be useful, but within the
locale name probably less so.

[...]

Internally it would translate into multiple grep-like expressions
(i.e., arguments to the -e grep option) looking like this:

  *_JP.* 3\n
  *_JP.* 4\n
  *_CN.* 3\n
  *_CN.* 4\n


Yes, this would work fine provided that you didn't ever want to get all
4 byte encodings before the 3 byte encodings. I like the syntax much
better though.


It would be nice to be able to specify the ordering somehow. Again,
for the record, the approach we discussed was to specify the order
using a second argument, say something like this:

    rw_locale_query("*_{JP,CN}.* {3,4}", "2d");

where "2d" means: sort the second brace field in descending order,
i.e., 4 before 3. The order of the first field isn't specified, so
the function would be free to expand the query internally into any
one of the following grep-like expressions:

    "*_JP.* 4\n*_CN.* 4\n*_JP.* 3\n*_CN.* 3\n"
    "*_CN.* 4\n*_JP.* 4\n*_JP.* 3\n*_CN.* 3\n"
    "*_JP.* 4\n*_CN.* 4\n*_CN.* 3\n*_JP.* 3\n"
    "*_CN.* 4\n*_JP.* 4\n*_CN.* 3\n*_JP.* 3\n"

with the whole thing basically being a simplified grep pattern that
could be used to search in a plain text file in this format:

  <locale> <mb-cur-max> <alias-list>
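
The expansion step alone could be sketched roughly as follows. This is only an illustration under stated assumptions: expand_query is a hypothetical name, not the rw_locale_query() interface, and the order string is assumed to be a sequence of <1-based field number><a|d> pairs. Fields named in the order string become the most significant ones; the rest keep their written order.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of the brace expansion only.
std::string expand_query(const std::string& query, const std::string& order)
{
    // Split the query into literal text and {a,b,...} alternatives.
    std::vector<std::string> lits(1);
    std::vector<std::vector<std::string> > fields;
    for (std::size_t i = 0; i < query.size(); ++i) {
        const std::size_t end =
            query[i] == '{' ? query.find('}', i) : std::string::npos;
        if (end == std::string::npos) { lits.back() += query[i]; continue; }
        std::vector<std::string> alts(1);
        for (std::size_t j = i + 1; j < end; ++j) {
            if (query[j] == ',') alts.push_back(std::string());
            else alts.back() += query[j];
        }
        fields.push_back(alts);
        lits.push_back(std::string());
        i = end;
    }

    // Parse the order spec; rank lists fields, most significant first.
    std::vector<std::size_t> rank;
    for (std::size_t i = 0; i < order.size(); ) {
        std::size_t n = 0;
        while (i < order.size() && order[i] >= '0' && order[i] <= '9')
            n = n * 10 + (order[i++] - '0');
        const char dir = i < order.size() ? order[i++] : 'a';
        if (n == 0 || n > fields.size()) continue;
        if (std::find(rank.begin(), rank.end(), n - 1) != rank.end()) continue;
        // Plain string comparison: fine for single digits like 3 and 4.
        std::vector<std::string>& f = fields[n - 1];
        if (dir == 'd') std::sort(f.begin(), f.end(), std::greater<std::string>());
        else            std::sort(f.begin(), f.end());
        rank.push_back(n - 1);
    }
    for (std::size_t f = 0; f < fields.size(); ++f)
        if (std::find(rank.begin(), rank.end(), f) == rank.end())
            rank.push_back(f);

    // Emit one line per combination, advancing like an odometer.
    std::string out;
    std::vector<std::size_t> idx(fields.size(), 0);
    for (;;) {
        for (std::size_t f = 0; f < fields.size(); ++f)
            out += lits[f] + fields[f][idx[f]];
        out += lits.back() + "\n";

        std::size_t r = rank.size();
        while (r > 0 && ++idx[rank[r - 1]] == fields[rank[r - 1]].size())
            idx[rank[--r]] = 0;
        if (r == 0) break;   // every field wrapped around: done
    }
    return out;
}
```

With the query "*_{JP,CN}.* {3,4}" and the order "2d" this yields the first of the expansions listed above; with an empty order string it yields the plain left-to-right expansion (JP 3, JP 4, CN 3, CN 4).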


BTW, I never did find a way to get an alias for a locale. If I have the
name of a locale, how can I find the list of aliases?


I don't know of any programmatic way to get the list of known aliases,
but searching the filesystem for symlinks to the locale databases
should work. The aliases for our own locales and codesets are also
listed in the source files.

Martin
