On Tue, Mar 21, 2006 at 06:46:45PM +0000, Markus Kuhn wrote:
[...]
> References:
>
>   - Unicode Collation Algorithm (UCA), http://www.unicode.org/reports/tr10/
>
>   - ISO TR 14652 (draft:
>     http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14652.pdf)

ISO TR 14652 does not deal with collation; GNU libc locales are based
on ISO 14651. A draft is available at
  http://dkuug.dk/jtc1/sc22/open/n2933.pdf
The iso14651_t1 file is intended to be the Common Template Table
defined in appendix A.

>   - http://sources.redhat.com/bugzilla/show_bug.cgi?id=374

This bug report does not contain any information. OTOH
  http://sources.redhat.com/bugzilla/show_bug.cgi?id=388
explains that the current sorting order is wrong for Polish.

>   - https://bugzilla.novell.com/show_bug.cgi?id=152778

Access denied.

> Example:
>
> $ cat >demo.txt
> death
> de luge
> de-luge
> deluge
> de-luge
> de Luge
> de-Luge
> deLuge
> de-Luge
> demark
> ^D
>
> and then try
>
> $ LC_COLLATE=C sort demo.txt
> $ LC_COLLATE=en_GB.UTF-8 sort demo.txt
> $ LC_COLLATE=en_GB sort demo.txt

Out of curiosity, do you see differences between en_GB and
en_GB.UTF-8? There should be none.

> and see the difference with how your dictionary or phone book sorts
> these.

My understanding is that the authors of ISO 14651 tried to gather some
general rules which are relevant for several locales, and other
locales have to derive from these rules if needed. The problem is
that very few people have submitted changes, and as can be seen above,
it is sometimes hard to push changes into GNU libc. But at least this
is an open process; other distributions can make up their own minds
and include the requested changes if they want.

Denis

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
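
The locale comparison in the quoted example can also be reproduced without sort(1). What follows is only a minimal sketch, assuming Python 3 on a system whose C library provides the locales in question. The helper name sort_with_collation is hypothetical, the word list is the quoted one reduced to its distinct entries, and a locale that is not installed simply yields None.

```python
import locale

# Distinct entries from the quoted demo.txt (assumption: duplicates dropped).
WORDS = ["death", "de luge", "de-luge", "deluge",
         "de Luge", "de-Luge", "deLuge", "demark"]


def sort_with_collation(words, locale_name):
    """Sort words by the LC_COLLATE rules of locale_name.

    Returns None when the locale is not installed on this system.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm maps each string to a key that compares like strcoll would.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


if __name__ == "__main__":
    for name in ("C", "en_GB.UTF-8", "en_GB"):
        result = sort_with_collation(WORDS, name)
        if result is None:
            print(f"{name}: locale not installed")
        else:
            print(f"{name}: {result}")
```

If en_GB and en_GB.UTF-8 are both installed, comparing the two printed lists answers the question above directly for this word list.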