> Date: Thu, 12 Oct 2023 15:00:57 +0200
> From: Patrice Dumas <pertu...@free.fr>
> Cc: bug-texinfo@gnu.org
>
> On Thu, Oct 12, 2023 at 01:29:27PM +0300, Eli Zaretskii wrote:
> > What is "smart sorting"? where is it described/documented?
>
> It is, in general, any way to sort Unicode that takes into account
> natural languages words orders. In practice, what is used in
> Unicode::Collate is the 'Unicode Technical Standard #10' Unicode
> Collation Algorithm (a.k.a. UCA) described in
> http://www.unicode.org/reports/tr10. In texi2any, we set an option of
> collation,
>   ( 'variable' => 'Non-Ignorable' )
> such that spaces and punctuation marks sort before letters. This
> specific option is described in
> http://www.unicode.org/reports/tr10/#Variable_Weighting
>
> It would be perfect if the same sorting could be obtained, but if
> C code does not follow exactly the same standard, I do not think
> that it is so problematic, as long as the sorting is sensible. It could
> actually be problematic for tests, but if the output of texi2any is ok
> even if not fully reproducible, it would still be better than sorting
> according to the Unicode codepoint in a full C implementation.

What you say is not detailed enough, but using my crystal ball I think
you can have this on glibc-based systems, and also on Windows (but
that requires using a special API for comparing strings). Not sure
about the equivalent features on other systems, like *BSD and macOS.

You can see that in action in how GNU 'ls' sorts file names.

> > In general, Unicode collation rules are locale- and
> > language-dependent. My recommendation for Texinfo is not to use
> > locale-specific collation rules, so that the indices would come out
> > sorted identically no matter in which locale the user runs texi2any.
>
> That's the plan. The plan is to use the @documentlanguage information
> with Unicode::Collate::Locale in the future, but never use the locale.

I don't recommend tailoring index sorting for the language indicated by
@documentlanguage, either.

> This is still a TODO item, though, as Unicode::Collate::Locale is a perl
> core module since perl 5.14 only, released in 2011, so my plan was to
> wait for 2031 to use it and be able to assume that it is indeed present
> the same way we assume that Unicode::Collate is present.

We can have this in C today.
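
To make the C suggestion concrete, here is a minimal sketch, not a
proposal for the actual texi2any code. It sorts index entries with the
standard strcoll() after selecting a fixed collation locale, so the
result does not depend on the locale the user runs in. The function
names (compare_collate, sort_index_entries) and the locale name passed
by the caller (for example "en_US.UTF-8") are assumptions made up for
illustration. The order produced by the C library need not match what
Unicode::Collate gives with 'Non-Ignorable', and if the requested
locale is not installed the sketch falls back to the "C" locale, which
compares bytes.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort callback: compare two C strings according to the collation
   rules of the current LC_COLLATE locale.  */
int
compare_collate (const void *a, const void *b)
{
  const char *const *sa = a;
  const char *const *sb = b;
  return strcoll (*sa, *sb);
}

/* Sort the N strings in ENTRIES using the collation rules of
   LOCALE_NAME (hypothetical choice, e.g. "en_US.UTF-8") instead of
   whatever locale the user happens to run in.
   Returns 1 if LOCALE_NAME was available, 0 if the "C" locale was
   used as a fallback.  Note that this leaves the process-wide
   LC_COLLATE setting changed; real code may want to save and
   restore it.  */
int
sort_index_entries (const char **entries, size_t n,
                    const char *locale_name)
{
  int have_locale = setlocale (LC_COLLATE, locale_name) != NULL;

  if (!have_locale)
    setlocale (LC_COLLATE, "C");

  qsort (entries, n, sizeof entries[0], compare_collate);
  return have_locale;
}
```

A caller would pass its array of entry strings and check the return
value to know whether the fixed locale was really used. On Windows
strcoll() is probably not enough, which is the "special API" caveat
above.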