I began working on belocs-locales-data in October 2004, to include most
patches sent to the Debian BTS and the upstream bugzilla.  From the
beginning, it was pretty obvious to me that I also needed to fork
localedef, because having iso-*.def hardcoded in localedef makes
transitions much harder, e.g. users on an old system cannot update
their currency when a country changes it.  For a similar reason, I also
dislike the tight coupling between the locales and libc6 packages.

When working on improving locales, it became more and more obvious that
localedef is really buggy.  The first problem I encountered was with
the Dzongkha locale: it needs many collating elements, and GNU
localedef loops forever when it encounters more than 256 of them.
After sending patches to BZ368, I implemented these changes as a proof
of concept, and so far I have been told that the Dzongkha locale works
fine with belocs-locales-data.  After more digging into GNU localedef
internals about collation, I filed BZ645 with test cases.  I did not
receive any answer to the bug reports I sent against localedef, and
from then on I did not send all my patches upstream.

In this mail, I will describe the patches applied against
belocs-locales-bin (which ships the locale, localedef and locale-gen
programs), so that we can discuss which ones could be merged into libc6
or pushed upstream.  I will prepare dpatches to apply to glibc-package
for the issues which you believe are worth taking from
belocs-locales-bin.  In a later mail, I will explain the changes applied
against belocs-locales-data, but some changes (like the Dzongkha locale)
need patches to be applied against localedef, so I prefer to discuss
these first.

Instead of discussing several issues in the same thread, it would
certainly be a good idea to have one issue per reply.  Feel free to
start a new thread if you prefer.

Patches in belocs-locales-bin are maintained with quilt, which means
that there is a debian/patches/series file listing patches in the
desired order.  The debian/patches directory is temporarily online at
  http://people.debian.org/~barbier/tmp/belocs-locales-bin/patches/

A. Changes in locale-gen
=-=-=-=-=-=-=-=-=-=-=-=-

As locale-gen is not an upstream program, there is no patch here.
The current script is temporarily available at
  http://people.debian.org/~barbier/tmp/belocs-locales-bin/locale-gen

The main differences with locale-gen from the locales package are:

* It accepts a few command-line options, and can also be driven by a
  configuration file, /etc/belocs/locale-gen.conf:
    --purge         remove existing locales before processing
    --archive       store compiled locale data inside a single archive
    --no-archive    do not store compiled locale data inside a single
                    archive (default)
    --aliases=FILE  read locale aliases from FILE
                    (default: /etc/locale.alias)

* It detects the magic number currently used by GNU libc for compiled
  locale data, and tells localedef to write compiled locale data
  suitable for this format, if it is supported.  E.g.
  my localedef has supported both the 20000828 (glibc >= 2.1.96) and
  20031115 (glibc >= 2.3.3) magic numbers for a long time; when I
  upgraded to glibc 2.3.5-3, I was not forced to upgrade
  belocs-locales-data, I only needed to re-run locale-gen so that
  locales were compiled into the right format.  Of course it would be
  much better if this rebuild were triggered by libc6, but this is
  another story ;)

* It keeps track of dependencies between generated locales and locale
  source files, so that a locale is regenerated only if some source
  file it needs has been modified, or if the magic number changed.
  The --purge command-line flag tells locale-gen to purge everything
  before processing locales.  This is very convenient on slow machines
  (yes, mine is very slow).

* By default, locales are written in the old format (not into an
  archive file).  My motivation was that if someone needs to add a
  local locale, she can compile it into $HOME/share/locale and set
  LOCPATH to $HOME/share/locale:/usr/lib/locale if she wants to use
  either her preferred locale or a system one, e.g. with
  LANGUAGE=xx_XX:de
  But this will work only if system locales are compiled in the old
  style, not into an archive.  I also ran benchmarks to see if the
  archive was faster, and IIRC noticed no significant difference.
  This behavior can be overridden by the --archive flag.

B. General changes to compile locale and locale-gen
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

* autotoolize.diff
  standalone_build.diff
  These 2 patches allow compiling belocs-locales-bin outside of glibc,
  and are of no interest to you.

C. Changes in locale
=-=-=-=-=-=-=-=-=-=-

* locale_print_LANGUAGE.diff
  This patch prints the LANGUAGE variable on output, if it is set,
  when locale is called without arguments.

D. Changes in localedef
=-=-=-=-=-=-=-=-=-=-=-=

* debian-localedef-fix-trampoline.diff
  Stolen from your localedef-fix-trampoline.dpatch

* compatibility_magic_number.diff
  Add a --magic command-line flag, to specify the output format.
  Add black magic to write data into the right format.

* read_isocodes_at_run_time.diff
  If /etc/belocs/iso-{3166,4217,639}.def files are found at run time,
  their content is parsed to override compile-time defaults.  This
  way, users can change values if needed without having to recompile
  localedef.  Another way would be to fully remove the checks on these
  values, but I have not yet decided on the best approach.

* allow_duplicate_country_num.diff
  Allow several countries to share the same country number in
  iso-3166.def; this may help transitions when country numbers do
  change.  Again, these checks may alternatively be fully removed.

* localedef_LC_COLLATE_do_not_copy_locales.diff
  In an LC_COLLATE section, if the first keyword is "copy", the
  matching locale is not parsed, but instead loaded directly into
  memory from compiled data if this locale exists.  This may cause a
  mismatch with my cache system, because the loaded locale may be
  outdated.  Moreover, this memory loading is much slower with very
  large archive files, so dropping it causes no performance loss
  (well, this is a moot argument since archive files are not used by
  default ;)).  And third, the state machine used when parsing
  LC_COLLATE is cleaner without this special casing of "copy"; it was
  pretty difficult to understand why om_ET does not fail with 2 copy
  keywords whereas other locales have a hard time using only one
  "copy".  For all these reasons, locales are never copied from
  compiled data into memory by "copy" keywords.

* localedef_complex_collate.diff [BZ368]
  Allow more than 256 collating-element definitions.  This is needed
  for dz_BT.

* localedef_fix_LC_COLLATE_rules.diff [BZ645]
  As shown in this bug report, localedef does not respect order_start
  keywords: the same ruleset is assigned to all scripts.

* localedef_preprocessor_collate.diff [BZ686]
  ISO 14652 defines preprocessor-like directives to help tailoring
  tables.  E.g.
  in belocs-locales-data, locales which sort uppercase before
  lowercase contain
      define UPPERCASE_FIRST
      copy "iso14651_t1"
  because my "iso14651_t1" replaced
      <RES-1>
      <MIN>
      <ANO>
      ...
      <AME>
      <CAP>
  by
      <RES-1>
      ifdef UPPERCASE_FIRST
      <CAP>
      else
      <MIN>
      endif
      <ANO>
      ...
      <AME>
      ifdef UPPERCASE_FIRST
      <MIN>
      else
      <CAP>
      endif
  In the locales package, "iso14651_t1" has to be copied into such
  locales and edited to swap 2 lines.
  These keywords were already recognized by GNU localedef; I only
  assigned actions to them.

* localedef_LC_COLLATE_keywords_ordering.diff [BZ690]
  The current state machine is too strict, e.g. it should allow
      copy "iso14651_t1"
      script <FOOBAR>
      order_start <FOOBAR>;forward;forward;forward;forward,position
      ...
      order_end
  so that scripts not yet in "iso14651_t1" can be added to it.
  This patch is also needed to allow preprocessor directives before
  "copy" keywords, see above.

* localedef_LC_IDENTIFICATION_optional_fields.diff
  In LC_IDENTIFICATION, the audience, application and abbreviation
  keywords are optional, so do not report an error (with the -v flag)
  if they are not defined.

* localedef_fix_exhausted_memory.diff
  Localedef aborts if a symbol name has exactly 55 characters in a
  charmap file or in an LC_COLLATE section:
      $ cat << EOT | localedef -i - -f UTF-8 -c /tmp/FOO
      LC_COLLATE
      collating-symbol <abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc>
      order_start forward
      <abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabc>
      order_end
      END LC_COLLATE
      EOT
      memory clobbered past end of allocated block
  (Actually the message was "memory exhausted" when I wrote this patch)

* localedef_check_unknown_symbols.diff
  Detect and report undeclared symbols in collation rules.  They are
  always a sign that something went wrong: a typo was made, some
  declarations were erroneously removed, etc.  These checks let me
  find several bugs in collation rules.

* localedef_fix_lang_lib_test.diff
  I wrote this patch to fix compilation of nds_DE, a locale shipped by
  Mandriva, but I am no longer convinced that this is the right fix.
  Do not consider it for now; more investigation is needed on my side.

E. Changes out of my scope
=-=-=-=-=-=-=-=-=-=-=-=-=-

Please fix BZ968/BTS#310635: strxfrm can segfault once the above
localedef bugs are fixed.

Denis