The following page has been changed by TravisVitek:
http://wiki.apache.org/stdcxx/LocaleLookup

------------------------------------------------------------------------------
  Once we have this list of expressions, we would enumerate all of the installed locales, and then search through them looking for locale names that match one of those regular expressions. The actual matching would be done using rw_fnmatch().

+ [[Anchor(Part1)]]
+ = Part 1 (STDCXX-714) =
+
+ The first thing we needed was a function for doing name matching, added to the test suite. Martin has already added an implementation of [http://svn.apache.org/viewvc/stdcxx/trunk/tests/src/fnmatch.cpp rw_fnmatch](), so that is done.
+
+ The second thing we needed was a function to do brace expansion. After much discussion, it was decided that the csh brace expansion rules made the most sense. Travis provided an implementation of two functions for doing brace expansion. The function [http://svn.apache.org/viewvc/stdcxx/trunk/tests/src/braceexp.cpp rw_brace_expand]() does a simple brace expansion on the input string. There is no special treatment for whitespace, but escapes are properly handled. The function [http://svn.apache.org/viewvc/stdcxx/trunk/tests/src/braceexp.cpp rw_shell_expand]() tokenizes the input on whitespace, collapsing runs of whitespace, and then does brace expansion on each token, much like the behavior you would see from the csh shell.
+
+ Just for illustration, consider the following string.
+
+ {{{
+ a {1,2} b
+ }}}
+
+ If you passed this to rw_brace_expand, the result would be
+
+ {{{
+ a 1 b a 2 b
+ }}}
+
+ If you passed this to rw_shell_expand, the result would be
+
+ {{{
+ a 1 2 b
+ }}}
+
+ In most cases you would want to use rw_shell_expand().
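The difference between the two expansions can be sketched in a few lines of C++. This is only an illustration, not the code in braceexp.cpp: the names `brace_expand`, `shell_expand` and `join`, and the use of `std::string` and `std::vector`, are assumptions made for this example, and unlike rw_brace_expand() the sketch does not handle escapes.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<std::string> Strings;

// Hypothetical helper: expand the first balanced brace group in `s`,
// then recurse so that nested and later groups are expanded too.
// Escapes are NOT handled in this sketch.
Strings brace_expand(const std::string& s)
{
    const std::size_t open = s.find('{');
    if (open == std::string::npos)
        return Strings(1, s);               // nothing to expand

    // Locate the matching '}' and the commas at nesting depth 1.
    std::vector<std::size_t> cuts;
    std::size_t close = std::string::npos;
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '{') {
            ++depth;
        } else if (s[i] == '}') {
            if (--depth == 0) { close = i; break; }
        } else if (s[i] == ',' && depth == 1) {
            cuts.push_back(i);
        }
    }
    if (close == std::string::npos)
        return Strings(1, s);               // unbalanced: keep it literal
    cuts.push_back(close);

    const std::string prefix = s.substr(0, open);
    const std::string suffix = s.substr(close + 1);

    Strings out;
    std::size_t begin = open + 1;
    for (std::size_t k = 0; k < cuts.size(); ++k) {
        // Substitute one alternative, then expand whatever braces remain.
        const std::string alt = s.substr(begin, cuts[k] - begin);
        const Strings sub = brace_expand(prefix + alt + suffix);
        out.insert(out.end(), sub.begin(), sub.end());
        begin = cuts[k] + 1;
    }
    return out;
}

// Hypothetical helper: split on whitespace first, then brace-expand
// each token independently.
Strings shell_expand(const std::string& s)
{
    Strings out;
    std::istringstream tokens(s);
    std::string token;
    while (tokens >> token) {
        const Strings sub = brace_expand(token);
        out.insert(out.end(), sub.begin(), sub.end());
    }
    return out;
}

// Join the expanded strings with single spaces for display.
std::string join(const Strings& parts)
{
    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0)
            result += ' ';
        result += parts[i];
    }
    return result;
}
```

With these helpers, `join(brace_expand("a {1,2} b"))` treats the whole string as one word and `join(shell_expand("a {1,2} b"))` expands each whitespace-separated token separately, which reproduces the two results shown above.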
+ '''Perhaps ''rw_brace_expand'' should become an implementation function and the header/source/test should be renamed to shellexp.h/shellexp.cpp/0.shellexp.cpp'''
+
+ [[Anchor(Part2)]]
+ = Part 2 (STDCXX-715) =
+
+ Every platform has a unique list of locales available. For example, Windows systems use {{{English}}} as a language name, but most *nix systems use the canonical {{{en}}} or in some cases {{{EN}}}. This problem exists for all fields of the locale name.
+
+ To deal with this, we need to provide a mapping between the native names and the canonical names that we plan to use in the query string. The plan is to provide one file with a list of all native locale names and the canonical names that they map to for all platforms. For efficiency, it would be nice if this table included other information that may be useful, such as {{{MB_CUR_LEN}}} for each of those locales.
+
+ I've collected all of the locale data on each of the platforms that are available to me. During this process, I've noticed a few issues with the name mapping.
+
+ One issue is that a single native locale name may map to a different canonical locale name on different platforms. For example, {{{es_BO}}} maps to {{{es_BO.ISO-8859-15}}} on AIX, but it maps to {{{es_BO.ISO-8859-1}}} on Linux and SunOS. Consider that our mapping file would look something like this...
+
+ {{{
+ es_BO.ISO-8859-1 es_BO es_BO.ISO8859-1 es_BO.iso88591
+ es_BO.ISO-8859-15 es_BO es_BO.8859-15 ES_BO
+ }}}
+
+ If we look up the canonical name {{{es_BO.ISO-8859-1}}} we will see three possible native locale names. If we then look through our list of installed locales, we will find {{{es_BO}}}, but it would be wrong to return that locale on a platform such as AIX, where {{{es_BO}}} actually uses ISO-8859-15.
+
+ So one solution for this might be to get the codeset name and store it in the mapping. This assumes that it is safe to request a locale with an explicit codeset even though the list of installed locales didn't specify the codeset.
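The ambiguity can be made concrete with a small C++ sketch. It assumes the mapping file format suggested by the sample above (one entry per line, canonical name first, native names after it); the names `LocaleMap`, `parse_mapping` and `canonical_names_for` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// canonical locale name -> native names that may map to it
typedef std::map<std::string, std::vector<std::string> > LocaleMap;

// Parse mapping text in the assumed format: one entry per line, the
// canonical name first, followed by whitespace-separated native names.
LocaleMap parse_mapping(const std::string& text)
{
    LocaleMap result;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string canonical;
        std::string native;
        if (!(fields >> canonical))
            continue;                       // skip blank lines
        while (fields >> native)
            result[canonical].push_back(native);
    }
    return result;
}

// Return every canonical name that lists `native` among its native names.
// More than one result means the native name alone is ambiguous.
std::vector<std::string>
canonical_names_for(const LocaleMap& mapping, const std::string& native)
{
    std::vector<std::string> out;
    for (LocaleMap::const_iterator it = mapping.begin();
         it != mapping.end(); ++it) {
        const std::vector<std::string>& names = it->second;
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (names[i] == native) {
                out.push_back(it->first);
                break;
            }
        }
    }
    return out;
}
```

Fed the two sample lines, `canonical_names_for(mapping, "es_BO")` returns both `es_BO.ISO-8859-1` and `es_BO.ISO-8859-15`, which is why the native name by itself cannot decide the match and some extra information, such as the codeset, has to be consulted.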
+
+ Another issue is that the data associated with each of the canonical locales, like {{{MB_CUR_LEN}}}, is different on each platform. The {{{ar_DZ.UTF-8}}} locale uses a 6 byte codeset on Linux, but a 4 byte codeset on other platforms.
+
+ I think the solution for this would be to not store the MB_CUR_LEN value in the file, but capture it and append it to the canonical locale name when we enumerate the installed locales.
+
+ [[Anchor(Part3)]]
+ = Part 3 (STDCXX-716) =
+
+ The proposed interface to all of this is a single public function named rw_query_locales(). The signature would be...
+
+ {{{
+ char* rw_query_locales(const char* query, size_t count);
+ }}}
+
+ The {{{query}}} parameter will be the query string. The {{{count}}} parameter is the maximum number of locales to return. This allows you to easily limit the number of locales tested.
+
+ The expected format of the query string is similar to what is described above, except that the requested MB_CUR_LEN value will be expected to be part of the query string. The accepted MB_CUR_LEN value would be separated from the canonical locale name expression by a period. An example query string...
+
+ {{{
+ "zh_*.*.{5..3} *_FR.*.1"
+ }}}
+
+ This would match all 5, 4 and 3 byte encodings of the Chinese language in any country, then all 1 byte encodings for any language spoken in France.
+
+ '''Perhaps we should consider adding an additional parameter to prepend the C/POSIX locales as there is no way to match them using the canonical locale name matching rules we've laid out above.'''

  [[Anchor(References)]]
  = References =