Thanks Jonathan,
I had looked at the package contents to figure out the collation sequence of fr_CA, and had found /usr/share/i18n/locales/. I then looked at /usr/share/i18n/locales/fr_CA, then at en_CA, then at iso14651_t1, and finally at iso14651_t1_common. That is where I decided to stop guessing and look for actual documentation.

I agree that not knowing the files' syntax was what finally discouraged me, but even now that I have seen what locale(5) contains, it is of little help: for me, it doesn't change anything.

I did mean this bug to be about the lack of a *specification* of collation. Linking to a manual that gives hints on how to interpret the code is better than nothing, but only a fraction of users will dare to go that way. This is not about strcoll's manpage. I probably shouldn't have mentioned strcoll() specifically; this is about collation in general. I believe this should be documented in glibc-doc-reference, in section 7 "Locales and Internationalization", and be easily reachable from section 5.6 "Collation Functions".
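For readers unfamiliar with the mechanism: strcoll() has no ordering of its own. It applies whatever LC_COLLATE rules the active locale defines, so the same call can order the same strings differently from one locale to the next. Below is a minimal Python sketch of that, assuming Python's locale module, which wraps the C library's setlocale() and wide-character collation functions. The helper sort_under_locale is hypothetical, and the word list is only an illustration.

```python
import locale


def sort_under_locale(words, collate_locale):
    """Return words sorted by the LC_COLLATE rules of collate_locale.

    Only LC_COLLATE is changed, and the previous setting is restored
    afterwards. Raises locale.Error if collate_locale is not available
    on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only, no change
    try:
        locale.setlocale(locale.LC_COLLATE, collate_locale)
        # strxfrm() turns each word into a key whose plain comparison
        # agrees with strcoll() under the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    words = ["cote", "côte", "coté", "côté", "Cote"]
    for name in ("C", "fr_CA.UTF-8", "en_CA.UTF-8"):
        try:
            print(name, sort_under_locale(words, name))
        except locale.Error:
            print(name, "is not installed here")
```

Under "C" the result is plain code-point order: ['Cote', 'cote', 'coté', 'côte', 'côté']. What fr_CA.UTF-8 or en_CA.UTF-8 produce depends on the locale definitions installed on the system, so no output is claimed for them here.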

I think an even more general issue is that the effect of choosing a specific locale doesn't seem to be explained. The documentation explains what different locales can change, but not what each locale actually does. Debian's best-known interface for choosing a locale is dpkg-reconfigure locales. I'm not sure my dad would find it obvious that he wants to pick "fr_CA.UTF-8 UTF-8" there.

I don't think the gain from specifying the collation order of each locale would be that small. I ran into this issue while trying to determine which locale a multilingual program should use (the best compromise, assuming a single locale is used). Collation is important, and I think many people wonder how it works.
I do agree, however, that this will require a significant amount of work.

Anyway, if we stick to the issue of collation, the Unicode Collation Algorithm is documented in Unicode Technical Standard #10 (http://www.unicode.org/reports/tr10/). The specification is non-free, but specifying the parameters each locale uses and linking to it would be enough for me.
As for non-Unicode locales, I don't know.

POSIX 7.3.2 does contain a good amount of useful information. It clearly describes collating sequence definitions, and it also gives the collating sequence definition of the C locale, which is quite accessible. Thanks for that too, Jonathan.
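As a rough illustration of the idea behind a POSIX collating sequence definition, consider how the order in which collating elements are listed between order_start and order_end gives their collation order. The Python sketch below is deliberately simplified. It assumes glibc-style <Uxxxx> symbolic names and gives each listed character its own rank. It ignores weights after the first token, multi-level ordering, collating-symbol declarations and copy directives, so it cannot read real files such as iso14651_t1_common. The names parse_order and collation_key are hypothetical.

```python
import re

# Matches a glibc-style symbolic name such as <U00E9> at the start of a line.
_ELEMENT = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def parse_order(lc_collate_text):
    """Map each character listed between order_start and order_end to its rank."""
    ranks = {}
    inside = False
    for raw in lc_collate_text.splitlines():
        line = raw.strip()
        if line.startswith("order_start"):
            inside = True
        elif line.startswith("order_end"):
            break
        elif inside and line:
            match = _ELEMENT.match(line)
            if match is None:
                raise ValueError(f"unsupported line: {line!r}")
            # Earlier lines collate first; anything after the name is ignored.
            ranks.setdefault(chr(int(match.group(1), 16)), len(ranks))
    return ranks


def collation_key(word, ranks):
    """Sort key for word; raises KeyError for characters the table omits."""
    return [ranks[ch] for ch in word]


if __name__ == "__main__":
    table = "\n".join([
        "LC_COLLATE",
        "order_start forward",
        "<U0061>",  # a
        "<U0041>",  # A
        "<U0062>",  # b
        "<U0042>",  # B
        "order_end",
        "END LC_COLLATE",
    ])
    ranks = parse_order(table)
    # prints ['a', 'A', 'b', 'B']
    print(sorted(["b", "B", "a", "A"], key=lambda w: collation_key(w, ranks)))
```

With this toy table, lowercase and uppercase letters interleave. Plain code-point order, which is what the C locale's sequence amounts to for these characters, would instead give ['A', 'B', 'a', 'b'].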