> > What would be much harder would be to make a letter sort as its
> > own independent letter between A and Z, with its own heading in the
> > index: for example, Ñ between N and O. We could make Ñ sort between
> > N and O by outputting its sort string as NZZZ, but texindex would take
> > an entry with a sort key beginning with NZZZ as part of the "N"
> > section. (I'm not sure what languages this would be an issue for.)
> > Also multi-level collation (as in the Unicode Collation Algorithm)
> > is right out.
>
> Also this part. Don't people expect the non-ASCII Latin characters to
> sort in the order of their Unicode codepoints, which would put them
> _after_ all the ASCII characters?
It depends on the language, AFAIK. Often accented characters are sorted as variants of the unaccented character, in the same way that upper-case letters are treated as variants of lower-case ones. The strings are first compared ignoring accents and case, and only if they compare equal at that level do accents and case become significant. A multi-level sort achieves this by constructing sort keys composed of several parts, the first of which represents the characters with accent and case distinctions removed. As far as I've been able to gather, this is the approach used in French and German. In Swedish, by contrast, ö is a completely independent letter, placed at the end of the alphabet.

> Or are you trying to mimic what the
> French locale mandates as the collation order of the letters?

Yes, exactly. "Mimic" is a good word, as it won't be done perfectly: the result wouldn't be correct if two index entries differed only in the accents over their letters.
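As a rough illustration of the multi-level sort keys described above, here is a minimal Python sketch using only the standard `unicodedata` module. It is an approximation under stated assumptions, not what texindex does: the helper names `strip_accents` and `sort_key` are hypothetical, accents are ranked simply by code point at the second level, and none of the finer language-specific rules (nor the full Unicode Collation Algorithm) are attempted.

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Hypothetical helper: drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(s: str) -> tuple:
    """Hypothetical three-level key; later parts only matter when earlier parts tie."""
    # Level 1: base letters only, with accents and case removed.
    primary = strip_accents(s).casefold()
    # Level 2: accents significant (ranked by code point, an assumption), case ignored.
    secondary = unicodedata.normalize("NFD", s.casefold())
    # Level 3: case significant, used only as a final tie-breaker.
    tertiary = s
    return (primary, secondary, tertiary)


words = ["Zebra", "éclair", "Eclair", "apple", "ecole", "École"]

# Plain code-point order: upper case first, accented initials after all ASCII.
print(sorted(words))
# Multi-level order: accented words sit next to their unaccented counterparts.
print(sorted(words, key=sort_key))
```

Plain `sorted(words)` compares code points, so `"Eclair"` and `"Zebra"` come before `"apple"`, and the words starting with É and é land after all the ASCII-initial ones, which is the behaviour raised in the quoted question. With `sort_key`, `"Eclair"` and `"éclair"` sort together and only their second-level keys decide which comes first. A Swedish ordering would need a different primary key that keeps ö distinct from o instead of stripping its diaeresis.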
