There has occasionally been talk of a "locales-all" package, which would contain compiled binary forms of all locales. This would use up a bit more bandwidth and disk space in order to save CPU time: generating the locales can take a long time, if you need many and don't have a fast processor.
I've made a preliminary patch to implement locales-all, if only to get some numbers. The existing locales .deb is about 3.9 MB, with an installed size of about 10.5 MB, plus the generated files (probably at most a few hundred kilobytes). My locales-all .deb is about 14 MB, with an installed size of 59 MB. Thus, locales-all is about 10 MB bigger to download and about 48 MB bigger installed.
Generating the three locales needed on my mail/web/etc. server, which is used by several people, takes about 15 seconds. Locales with more complicated character sets, especially those for many Asian languages, will be rather slower still.
Is the time saved at installation significant enough to justify slower builds and mirrors that are that much bigger? I don't know. I guess it depends on how much time you spend waiting for locale-gen to run. I don't mind waiting the 15 seconds it takes on my server every time I upgrade libc6, but I run stable, so it happens very rarely.
When implementing this, my first step was to clean up the locale-gen script a bit. The current version is a bit messy, with unnecessary line-continuation escapes and several statements compressed onto single lines in an obfuscatory way. Attached are locale-gen.diff and locale-gen.clean, the former being the diff and the latter the entire cleaned-up script (in case the diff doesn't apply cleanly anymore). The functionality is exactly the same, but I claim the cleaned-up version is easier to read, and thus to maintain in the future. If nothing else, I would like to see this included in the package.
Also attached is the preliminary patch for locales-all. I started by enhancing locale-gen further, adding options to override the location of /etc/locale.gen, to direct output somewhere other than /usr/lib/locale, and to read data files from a custom location. This was necessary so that I could easily build the locale-archive file for the locales-all package, which I then did.
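The three overrides can be illustrated with a short sketch. Note that this is not the code from the patch: the patch calls the external getopt program, while the sketch below swaps in the POSIX getopts builtin so that it runs without any helper. The function name parse_options and the /tmp paths are made up for the example; the variable names and defaults follow the patch.

```shell
#!/bin/sh
# Sketch only: option parsing with the POSIX getopts builtin.
# Defaults are the paths locale-gen uses when no option is given.
LOCALEGEN=/etc/locale.gen
LOCALES=/usr/share/i18n/locales
OUTPUT=/usr/lib/locale
VERBOSE=1

# parse_options is a hypothetical helper name, not part of the patch.
parse_options() {
    OPTIND=1    # reset so the function can be called more than once
    while getopts "d:g:hl:q" opt
    do
        case "$opt" in
            d) OUTPUT="$OPTARG" ;;      # where the binary locale data goes
            g) LOCALEGEN="$OPTARG" ;;   # list of locales to generate
            l) LOCALES="$OPTARG" ;;     # textual locale definitions
            q) VERBOSE=0 ;;
            h) echo "Usage: locale-gen [-hq] [-d dir] [-g file] [-l dir]"
               return 0 ;;
            *) return 1 ;;              # getopts already printed an error
        esac
    done
}

# Example: quiet run into a scratch directory, using a custom list.
parse_options -q -d /tmp/locale-out -g /tmp/locale.gen
echo "output=$OUTPUT list=$LOCALEGEN source=$LOCALES verbose=$VERBOSE"
```

Only the variables are set here; nothing is generated. In the real script the same three variables then feed the localedef loop.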
This locales-all package replaces the locales package. It might be better to have locales-all just depend on locales (identical version, of course), so as not to duplicate the data files into two .debs. I have tested this lightly. Any comments are welcome. I didn't yet want to file a wishlist bug against libc6 before it's decided that locales-all is a good idea in the first place.
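One detail of the locales-all patch is that locale-gen no longer empties the output directory before it starts. It builds everything in a temporary directory and moves the result into place at the end. A minimal sketch of that pattern follows, assuming a made-up generate_into helper that writes a placeholder file where the real script would run localedef, and a scratch directory standing in for /usr/lib/locale.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the localedef loop: fills "$1" with a
# placeholder file instead of real locale data.
generate_into() {
    mkdir -p "$1"
    echo "new data" > "$1/locale-archive"
}

# Build in a temporary directory, then swap the results into "$1".
# Because of "set -e", a failure while generating leaves the old
# contents of "$1" untouched.
stage_and_install() {
    output=$1
    staging=$(mktemp -d)
    generate_into "$staging/out"
    mkdir -p "$output"
    rm -rf "$output"/*
    mv "$staging"/out/* "$output"
    rm -rf "$staging"
}

# Demonstration against a scratch directory, not /usr/lib/locale.
demo=$(mktemp -d)
echo "old data" > "$demo/locale-archive"
stage_and_install "$demo"
result=$(cat "$demo/locale-archive")
echo "$result"
rm -rf "$demo"
```

The output directory is still briefly empty between the rm and the mv, and mv has to copy if the temporary directory is on another filesystem, so the switch is short but not atomic.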
--- locale-gen.from-svn 2005-09-25 15:53:33.975828184 +0000
+++ locale-gen 2005-09-25 15:53:33.975828184 +0000
@@ -1,22 +1,31 @@
 #!/bin/sh
+#
+# Create or update /usr/lib/locale/locale-archive with locale data for
+# the locales specified in /etc/locale.gen.
+#
+# This script has been written for the Debian project and is under
+# the GNU Lesser General Public License, version 2.
 set -e
 LOCALEGEN=/etc/locale.gen
 LOCALES=/usr/share/i18n/locales
-if [ -n "$POSIXLY_CORRECT" ]; then
-  unset POSIXLY_CORRECT
-fi
+# localedef works differently depending on whether POSIXLY_CORRECT is
+# set or not. We want the behavior when it is not set so we unset it.
+unset POSIXLY_CORRECT
+# Let's not do anything if $LOCALEGEN doesn't exist or is empty.
 [ -f $LOCALEGEN -a -s $LOCALEGEN ] || exit 0;
 # Remove all old locale dir and locale-archive before generating new
 # locale data.
 rm -rf /usr/lib/locale/* || true
+# Make sure new files are created with the right permissions.
 umask 022
+# Is an entry in $LOCALEGEN correct: does it contain two fields?
 is_entry_ok() {
   if [ -n "$locale" -a -n "$charset" ] ; then
     true
@@ -27,16 +36,25 @@
 }
 echo "Generating locales (this might take a while)..."
-while read locale charset; do \
-    case $locale in \#*) continue;; "") continue;; esac; \
-    is_entry_ok || continue
-    echo -n " `echo $locale | sed 's/\([EMAIL PROTECTED]).*/\1/'`"; \
-    echo -n ".$charset"; \
-    echo -n `echo $locale | sed 's/\([EMAIL PROTECTED])\([EMAIL PROTECTED])*/\2/'`; \
-    echo -n '...'; \
-    if [ -f $LOCALES/$locale ]; then input=$locale; else \
-    input=`echo $locale | sed 's/\([^.]*\)[EMAIL PROTECTED](.*\)/\1\2/'`; fi; \
-    localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale; \
-    echo ' done'; \
+while read locale charset
+do
+    case $locale in
+        \#*) continue ;;
+        "") continue ;;
+    esac
+    is_entry_ok || continue
+    echo -n " `echo $locale | sed 's/\([EMAIL PROTECTED]).*/\1/'`"
+    echo -n ".$charset"
+    echo -n `echo $locale | sed 's/\([EMAIL PROTECTED])\([EMAIL PROTECTED])*/\2/'`
+    echo -n '...'
+    if [ -f $LOCALES/$locale ]
+    then
+        input=$locale
+    else
+        input=`echo $locale | sed 's/\([^.]*\)[EMAIL PROTECTED](.*\)/\1\2/'`
+    fi
+    localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale\
+        || true
+    echo ' done'
 done < $LOCALEGEN
 echo "Generation complete."
diff -ru glibc-2.3.5-faster/debian/control glibc-2.3.5-locales-all/debian/control
--- glibc-2.3.5-faster/debian/control 2005-09-24 14:12:43.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/control 2005-09-25 11:21:12.000000000 +0000
@@ -37,6 +37,24 @@
  savings over how this package used to be, where all locales were generated by
  default. This created a package that unpacked to an excess of 30 megs.
 
+Package: locales-all
+Architecture: all
+Section: base
+Priority: extra
+Provides: i18ndata, locales
+Depends: ${locale:Depends}, debconf (>= 0.2.26)
+Conflicts: locales, localebin, wg15-locale, i18ndata, locale-ja, locale-ko, locale-vi, locale-zh
+Replaces: locales, localebin, wg15-locale, libc6-bin, i18ndata, glibc2, locale-ja, locale-ko, locale-vi, locale-zh
+Description: GNU C Library: National Language (locale) data [support] (full)
+ Machine-readable data files, shared objects and programs used by the
+ C library for localization (l10n) and internationalization (i18n) support.
+ .
+ This package contains the libc.mo i18n files, plus all compiled (ready-to-use)
+ versions of all locale definitions. It is big (tens of megabytes), but
+ this can be faster than generating many locales on machines that need
+ support users from many backgrounds. See the package called "locales"
+ for a smaller version that creates only the versions that are required.
+
 Package: nscd
 Architecture: alpha amd64 arm i386 m68k mips mipsel powerpc sparc ia64 hppa s390 sh3 sh4 sh3eb sh4eb freebsd-i386
 Section: admin
diff -ru glibc-2.3.5-faster/debian/control.in/main glibc-2.3.5-locales-all/debian/control.in/main
--- glibc-2.3.5-faster/debian/control.in/main 2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/control.in/main 2005-09-25 11:20:46.000000000 +0000
@@ -37,6 +37,24 @@
  savings over how this package used to be, where all locales were generated by
  default. This created a package that unpacked to an excess of 30 megs.
 
+Package: locales-all
+Architecture: all
+Section: base
+Priority: extra
+Provides: i18ndata, locales
+Depends: ${locale:Depends}, debconf (>= 0.2.26)
+Conflicts: locales, localebin, wg15-locale, i18ndata, locale-ja, locale-ko, locale-vi, locale-zh
+Replaces: locales, localebin, wg15-locale, libc6-bin, i18ndata, glibc2, locale-ja, locale-ko, locale-vi, locale-zh
+Description: GNU C Library: National Language (locale) data [support] (full)
+ Machine-readable data files, shared objects and programs used by the
+ C library for localization (l10n) and internationalization (i18n) support.
+ .
+ This package contains the libc.mo i18n files, plus all compiled (ready-to-use)
+ versions of all locale definitions. It is big (tens of megabytes), but
+ this can be faster than generating many locales on machines that need
+ support users from many backgrounds. See the package called "locales"
+ for a smaller version that creates only the versions that are required.
+
 Package: nscd
 Architecture: @threads_archs@
 Section: admin
diff -ru glibc-2.3.5-faster/debian/local/manpages/locale-gen.8 glibc-2.3.5-locales-all/debian/local/manpages/locale-gen.8
--- glibc-2.3.5-faster/debian/local/manpages/locale-gen.8 2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/local/manpages/locale-gen.8 2005-09-25 11:35:11.000000000 +0000
@@ -1,101 +1,48 @@
-.\" This -*- nroff -*- file has been generated from
-.\" DocBook SGML with docbook-to-man on Debian GNU/Linux.
-...\"
-...\" transcript compatibility for postscript use.
-...\"
-...\" synopsis: .P! <file.ps>
-...\"
-.de P!
-\\&.
-.fl \" force out current output buffer
-\\!%PB
-\\!/showpage{}def
-...\" the following is from Ken Flowers -- it prevents dictionary overflows
-\\!/tempdict 200 dict def tempdict begin
-.fl \" prolog
-.sy cat \\$1\" bring in postscript file
-...\" the following line matches the tempdict above
-\\!end % tempdict %
-\\!PE
-\\!.
-.sp \\$2u \" move below the image
-..
-.de pF
-.ie \\*(f1 .ds f1 \\n(.f
-.el .ie \\*(f2 .ds f2 \\n(.f
-.el .ie \\*(f3 .ds f3 \\n(.f
-.el .ie \\*(f4 .ds f4 \\n(.f
-.el .tm ? font overflow
-.ft \\$1
-..
-.de fP
-.ie !\\*(f4 \{\
-. ft \\*(f4
-. ds f4\"
-' br \}
-.el .ie !\\*(f3 \{\
-. ft \\*(f3
-. ds f3\"
-' br \}
-.el .ie !\\*(f2 \{\
-. ft \\*(f2
-. ds f2\"
-' br \}
-.el .ie !\\*(f1 \{\
-. ft \\*(f1
-. ds f1\"
-' br \}
-.el .tm ? font underflow
-..
-.ds f1\"
-.ds f2\"
-.ds f3\"
-.ds f4\"
-'\" t
-.ta 8n 16n 24n 32n 40n 48n 56n 64n 72n
-.TH "LOCALE-GEN" "8"
-.SH "NAME"
-locale-gen \(em generates localisation files from templates
-.SH "SYNOPSIS"
-.PP
-\fBlocale-gen\fP
-.SH "DESCRIPTION"
-.PP
-This manual page documents briefly the
-\fBlocale-gen\fP command.
-.PP
-By default, the locale package which provides the base support for
-localisation of libc-based programs does not contain usable localisation
-files for every supported language. This limitation has became necessary
-because of the substantial size of such files and the large number of
-languages supported by libc. As a result, Debian uses a special
-mechanism where we prepare the actual localisation files on the target
-host and distribute only the templates for them.
-.PP
-\fBlocale-gen\fP is a program that reads the file
-\fB/etc/locale.gen\fP and invokes
-\fBlocaledef\fP for the chosen localisation profiles.
-Run \fBlocale-gen\fP after you have modified the \fB/etc/locale.gen\fP file.
-
-
-.SH "FILES"
-.PP
-\fB/etc/locale.gen\fP
-.PP
-The main configuration file, which has a simple format: every
-line that is not empty and does not begin with a # is treated as a
-locale definition that is to be built.
-
-.SH "SEE ALSO"
-.PP
-localedef (1), locale (1), locale.gen (5).
-.SH "AUTHOR"
-.PP
-This manual page was written by Eduard Bloch <[EMAIL PROTECTED]> for
-the \fBDebian GNU/Linux\fP system (but may be used by others). Permission is
-granted to copy, distribute and/or modify this document under
-the terms of the GNU Free Documentation
-License, Version 1.1 or any later version published by the Free
-Software Foundation; with no Invariant Sections, no Front-Cover
-Texts and no Back-Cover Texts.
-...\" created by instant / docbook-to-man, Sat 02 Mar 2002, 16:43
+.TH LOCALE-GEN 1 "Sep 24, 2005"
+.SH NAME
+locale-gen \- generate locale data files
+.SH SYNOPSIS
+.B locale-gen
+.RB [ \-hq ]
+.RB [ \-d
+.IR dirname ]
+.RB [ \-g
+.IR filename ]
+.RB [ \-l
+.IR dirname ]
+.SH DESCRIPTION
+The
+.B locale-gen
+program generates binary locale data files from textual description files.
+The binary files are fast to load, but can take a lot of space.
+For this reason, the Debian
+.B locales
+package does not any binary locale data files, only the textual ones.
+.SH OPTIONS
+.TP
+.B \-h
+Print a short help text.
+.TP
+.B \-q
+Quiet operation: do not report progress.
+.TP
+.BI \-d " dirname"
+Place output files into
+.I dirname
+instead of
+.IR /usr/lib/locale .
+.TP
+.BI \-g " filename"
+Read list of locales that are to be generated from
+.I filename
+instead of
+.IR /etc/locale.gen .
+.TP
+.BI \-l " dirname"
+Look for textual locale specifications in 'dir' instead of
+.IR /usr/share/i18n/locales .
+.SH "SEE ALSO"
+.BR localedef "(1), " locale "(5), " locale "(7), " locale (1)
+.SH AUTHOR
+The program and this manual page have been written and modified by
+a number of people for the Debian project.
diff -ru glibc-2.3.5-faster/debian/local/usr_sbin/locale-gen glibc-2.3.5-locales-all/debian/local/usr_sbin/locale-gen
--- glibc-2.3.5-faster/debian/local/usr_sbin/locale-gen 2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/local/usr_sbin/locale-gen 2005-09-25 12:40:33.621309664 +0000
@@ -1,22 +1,81 @@
 #!/bin/sh
+#
+# Create or update /usr/lib/locale/locale-archive with locale data for
+# the locales specified in /etc/locale.gen.
+#
+# This script has been written for the Debian project and is under
+# the GNU Lesser General Public License, version 2.
 set -e
+# Variables that may modified by options.
 LOCALEGEN=/etc/locale.gen
 LOCALES=/usr/share/i18n/locales
-if [ -n "$POSIXLY_CORRECT" ]; then
-  unset POSIXLY_CORRECT
-fi
+OUTPUT=/usr/lib/locale
+VERBOSE=1
+# localedef works differently depending on whether POSIXLY_CORRECT is
+# set or not. We want the behavior when it is not set so we unset it.
+unset POSIXLY_CORRECT
+
+verbose() {
+    if [ "$VERBOSE" = 1 ]
+    then
+        echo "$@"
+    fi
+}
-[ -f $LOCALEGEN -a -s $LOCALEGEN ] || exit 0;
+print_help() {
+    cat <<EOF 1>&2
+Usage: $0 [options]
+Generate locale binary files (by default, /usr/lib/locale/locale-archive
+or other files in that directory) for the locales listed in /etc/locale.gen.
+
+Options are:
+
+    -h            This help.
+    -q            Quiet execution: don't report progress.
+
+    -d dir        Use 'dir' instead of $OUTPUT for output.
+    -g filename   Use 'filename' for input instead of $LOCALEGEN.
+    -l dir        Look for locale specifications in 'dir' instead of
+                  $LOCALES.
-# Remove all old locale dir and locale-archive before generating new
-# locale data.
-rm -rf /usr/lib/locale/* || true
+EOF
+}
+# Parse the command line.
+TEMP=$(getopt d:g:hl:q "$@")
+if [ $? != 0 ]
+then
+    exit 1
+fi
+eval set -- "$TEMP"
+
+while true
+do
+    case "$1" in
+        --) shift 1; break ;;
+        -h) shift 1; print_help; exit 0 ;;
+        -q) shift 1; VERBOSE=0 ;;
+        -d) OUTPUT="$2"; shift 2 ;;
+        -g) LOCALEGEN="$2"; shift 2 ;;
+        -l) LOCALES="$2"; shift 2 ;;
+        *) echo "Unknown parameter $1" 1>&2; exit 1 ;;
+    esac
+done
+
+# Let's not do anything if $LOCALEGEN doesn't exist or is empty.
+if [ ! -f "$LOCALEGEN" -o ! -s "$LOCALEGEN" ]
+then
+    verbose "$LOCALEGEN does not exist or is empty, so there's nothing to do."
+    exit 0;
+fi
+
+# Make sure new files are created with the right permissions.
 umask 022
+# Is an entry in $LOCALEGEN correct: does it contain two fields?
 is_entry_ok() {
   if [ -n "$locale" -a -n "$charset" ] ; then
     true
@@ -26,17 +85,42 @@
   fi
 }
-echo "Generating locales (this might take a while)..."
-while read locale charset; do \
-    case $locale in \#*) continue;; "") continue;; esac; \
-    is_entry_ok || continue
-    echo -n " `echo $locale | sed 's/\([EMAIL PROTECTED]).*/\1/'`"; \
-    echo -n ".$charset"; \
-    echo -n `echo $locale | sed 's/\([EMAIL PROTECTED])\([EMAIL PROTECTED])*/\2/'`; \
-    echo -n '...'; \
-    if [ -f $LOCALES/$locale ]; then input=$locale; else \
-    input=`echo $locale | sed 's/\([^.]*\)[EMAIL PROTECTED](.*\)/\1\2/'`; fi; \
-    localedef -i $input -c -f $charset -A /usr/share/locale/locale.alias $locale || :; \
-    echo ' done'; \
+# Create a temporary directory for storing the output while it is being
+# generated. This way, the system continues to be fully operation until
+# the (very brief) moment when we move the output to its final location.
+TEMP=$(mktemp -d)
+if [ "$?" != 0 ]
+then
+    exit 1
+fi
+mkdir -p "$TEMP/usr/lib/locale"
+
+verbose "Generating locales (this might take a while)..."
+while read locale charset
+do
+    case $locale in
+        \#*) continue ;;
+        "") continue ;;
+    esac
+    is_entry_ok || continue
+    verbose -n " `echo $locale | sed 's/\([EMAIL PROTECTED]).*/\1/'`"
+    verbose -n ".$charset"
+    verbose -n `echo $locale | sed 's/\([EMAIL PROTECTED])\([EMAIL PROTECTED])*/\2/'`
+    verbose -n '...'
+    if [ -f $LOCALES/$locale ]
+    then
+        input=$locale
+    else
+        input=`echo $locale | sed 's/\([^.]*\)[EMAIL PROTECTED](.*\)/\1\2/'`
+    fi
+    localedef --prefix "$TEMP" -i $input -c -f $charset \
+        -A /usr/share/locale/locale.alias $locale \
+        || true
+    verbose ' done'
 done < $LOCALEGEN
-echo "Generation complete."
+verbose "Generation complete."
+
+# Move files from the temporary directory to the final location.
+rm -rf "$OUTPUT"/*
+mv "$TEMP"/usr/lib/locale/* "$OUTPUT"
+rm -rf "$TEMP"
diff -ru glibc-2.3.5-faster/debian/rules glibc-2.3.5-locales-all/debian/rules
--- glibc-2.3.5-faster/debian/rules 2005-09-24 13:45:28.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/rules 2005-09-25 12:33:14.000000000 +0000
@@ -118,7 +118,7 @@
 curpass = $(filter-out %_,$(subst _,_ ,$@))
 
 DEB_ARCH_REGULAR_PACKAGES = $(libc) # $(libc)-dev $(libc)-dbg $(libc)-prof $(libc)-pic
-DEB_INDEP_REGULAR_PACKAGES = glibc-doc locales
+DEB_INDEP_REGULAR_PACKAGES = glibc-doc locales locales-all
 DEB_UDEB_PACKAGES = # $(libc)-udeb libnss-dns-udeb libnss-files-udeb
 
 # Generic kernel version check
diff -ru glibc-2.3.5-faster/debian/rules.d/debhelper.mk glibc-2.3.5-locales-all/debian/rules.d/debhelper.mk
--- glibc-2.3.5-faster/debian/rules.d/debhelper.mk 2005-09-24 11:32:10.000000000 +0000
+++ glibc-2.3.5-locales-all/debian/rules.d/debhelper.mk 2005-09-25 13:06:28.000000000 +0000
@@ -40,6 +40,13 @@
 	install --mode=0644 $(DEB_SRCDIR)/localedata/ChangeLog debian/$(curpass)/usr/share/doc/$(curpass)/changelog
 endef
 
+define locales-all_extra_debhelper_pkg_install
+	sh debian/local/usr_sbin/locale-gen \
+		-d debian/$(curpass)/usr/lib/locale \
+		-g debian/tmp-libc/usr/share/i18n/SUPPORTED \
+		-l build-tree/glibc-*/localedata
+endef
+
 define glibc-doc_extra_debhelper_pkg_install
 	install --mode=0644 $(DEB_SRCDIR)/ChangeLog debian/$(curpass)/usr/share/doc/$(curpass)/changelog
 	install --mode=0644 $(DEB_SRCDIR)/linuxthreads/FAQ.html debian/$(curpass)/usr/share/doc/$(curpass)/FAQ.linuxthreads.html