On Thu, Feb 01, 2024 at 10:16:07PM +0000, Gavin Smith wrote:
> An alternative is not to have such a variable but just to have an option
> to collate according to the user's locale. Then the user would run e.g.
> "LC_COLLATE=ll_LL.UTF-8 texi2any ..." to use collation from the ll_LL.UTF-8
> locale. They would have to have the locale installed that was appropriate
> for whichever manual they were processing (assuming the "variable weighting"
> option is appropriate.)

I do not like that possibility. I think that we should avoid using the
user locale when it comes to document output in general. If we do use the
user locale, I think that it should be by using strxfrm in C and
"use locale" in Perl, not by checking a specific LC_COLLATE value in the
environment.

Here is my updated thinking on the possibilities:

1) lexicographic sorting on Unicode strings (corresponds to
   USE_UNICODE_COLLATION=0 currently)

2) Unicode default sorting obtained by Unicode::Collate in Perl and
   strxfrm_l in C with "en_US.utf-8", the current default ("en_US.utf-8"
   could be different on different platforms; a list instead of only one
   possibility may be needed if "en_US.utf-8" is not always available...)

3) sorting based on @documentlanguage using, in Perl,
   Unicode::Collate::Locale with locale @documentlanguage and, in C,
   strxfrm_l with "@documentlanguage.utf-8" (at least on GNU/Linux; the
   locale name set up for strxfrm_l could be different on other platforms).

4) sorting based on a customization variable, such as COLLATION_LANGUAGE.
   It would be the same as the previous one, with @documentlanguage
   replaced by COLLATION_LANGUAGE.

5) sorting based on the user locale, using strxfrm in C and, in Perl,
   "use locale" with regular sorting on Unicode (internal Perl encoded)
   strings.

1) and 2) are already implemented and are currently selected with
USE_UNICODE_COLLATION. I do not think that we need 5), but we could
implement it if users ask for it.

We do not need to implement the other options right away, but we may want
to think now about how these options will be selected, so that the
customization options do not have to change when they are implemented.
I think that the options are:

* use only one variable with a textual value, for example, for 1-5 above,
  USE_COLLATION=basic/default/documentlanguage/custom/locale

* use different variables as switches between the different options, for
  instance USE_UNICODE_COLLATION to switch to 1), and more or less one
  variable for each of the other possibilities.

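To make the single-variable approach concrete, here is a minimal C sketch of how a textual USE_COLLATION value could be mapped to one of the five possibilities, and how the locale name intended for strxfrm_l could be derived from it. Nothing here is existing texi2any code: the enum, both function names, and the choice to treat an unset value as "default" are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical enum mirroring possibilities 1) to 5) above. */
enum collation_mode {
  COLLATION_BASIC,            /* 1) lexicographic sorting */
  COLLATION_DEFAULT,          /* 2) Unicode default sorting */
  COLLATION_DOCUMENTLANGUAGE, /* 3) follow @documentlanguage */
  COLLATION_CUSTOM,           /* 4) follow COLLATION_LANGUAGE */
  COLLATION_LOCALE,           /* 5) user locale, plain strxfrm */
  COLLATION_INVALID
};

/* Textual values, in the same order as the enum. */
static const char *const collation_mode_names[] = {
  "basic", "default", "documentlanguage", "custom", "locale"
};

/* Map a USE_COLLATION value to a mode.  Assumption: an unset (NULL)
   value selects the Unicode default sorting. */
enum collation_mode
parse_collation_mode (const char *value)
{
  size_t i;
  size_t n = sizeof collation_mode_names / sizeof collation_mode_names[0];

  if (value == NULL)
    return COLLATION_DEFAULT;
  for (i = 0; i < n; i++)
    if (strcmp (value, collation_mode_names[i]) == 0)
      return (enum collation_mode) i;
  return COLLATION_INVALID;
}

/* Write into BUF the locale name that would be handed to newlocale()
   before calling strxfrm_l.  Returns 1 if a name was written, 0 if the
   mode needs no named locale (basic, locale), is invalid, has no
   language set, or the name does not fit in BUF.  The ".utf-8" suffix
   follows the GNU/Linux naming mentioned above and may differ on other
   platforms. */
int
collation_locale_name (enum collation_mode mode,
                       const char *documentlanguage,
                       const char *collation_language,
                       char *buf, size_t size)
{
  const char *lang = NULL;
  int n;

  assert (buf != NULL && size > 0);

  switch (mode)
    {
    case COLLATION_DEFAULT:
      lang = "en_US";
      break;
    case COLLATION_DOCUMENTLANGUAGE:
      lang = documentlanguage;
      break;
    case COLLATION_CUSTOM:
      lang = collation_language;
      break;
    default:
      break;
    }
  if (lang == NULL || *lang == '\0')
    return 0;
  n = snprintf (buf, size, "%s.utf-8", lang);
  return n > 0 && (size_t) n < size;
}
```

With this sketch, parse_collation_mode ("documentlanguage") followed by collation_locale_name with a document language of "fr_FR" would yield "fr_FR.utf-8", while the "basic" and "locale" modes yield no name at all, since they do not go through strxfrm_l with a named locale. One appeal of the single enum is that new modes can be added later without renaming or retiring an existing variable.
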
I personally would favour using only one customization variable, but I
will implement whatever is preferred.

-- 
Pat