> > What would be much harder would be to make a letter sort as its
> > own independent letter between A and Z, with its own heading in the
> > index: for example, Ñ between N and O.  We could make Ñ sort between
> > N and O by outputting its sort string as NZZZ, but texindex would take
> > an entry with a sort key beginning with NZZZ as part of the "N"
> > section.  (I'm not sure what languages this would be an issue for.)
> > Also multi-level collation (as in the Unicode Collation Algorithm)
> > is right out.
> 
> Also this part.  Don't people expect the non-ASCII Latin characters to
> sort in the order of their Unicode codepoints, which would put them
> _after_ all the ASCII characters?

It depends on the language AFAIK.  Often accented characters are sorted
as variants of the unaccented character, much as upper-case letters are
treated as variants of lower-case ones.  So first the strings are compared
ignoring accents and case, and it's only if they compare equal at that level
that accents and case become significant.  This is achieved in a multi-level
sort by constructing sort keys that are composed of several parts, the first
of which represents the characters with accents and case distinctions removed.
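
To illustrate the idea, here is a rough Python sketch of such a multi-part
sort key.  It is not what texindex or the Unicode Collation Algorithm
actually does; the names strip_accents and sort_key are made up for the
example, and the order among the accented variants themselves is simply
whatever codepoint comparison of the decomposed strings gives, not any
particular language's rule.

```python
import unicodedata

def strip_accents(s):
    # Decompose each character (NFD), then drop the combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def sort_key(s):
    # Level 1: accents and case removed.
    primary = strip_accents(s).casefold()
    # Level 2: accents kept, case removed.
    secondary = unicodedata.normalize("NFD", s).casefold()
    # Level 3: the original string, so case breaks any remaining tie.
    return (primary, secondary, s)

entries = ["cote", "Zebra", "côte", "coté", "apple", "Cote", "côté"]
print(sorted(entries, key=sort_key))
# ['apple', 'Cote', 'cote', 'coté', 'côte', 'côté', 'Zebra']
```

Tuples compare element by element, so the later parts are only consulted
when the earlier parts are equal, which is the behaviour described above.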

As far as I've been able to gather, this is the approach used in French
and German.  In Swedish, ö is its own completely independent letter put at the
end of the alphabet.
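
For the Swedish case, a sketch along these lines would treat å, ä and ö
as letters in their own right after z.  Again this is only an illustration:
SWEDISH_ALPHABET, RANK and swedish_key are names invented here, and putting
characters from outside the alphabet after all the letters is an arbitrary
choice.

```python
import unicodedata

# Swedish alphabet order: å, ä, ö follow z as independent letters.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(s):
    # Compose to single codepoints first so that "ö" matches the table,
    # then fold case.  Unknown characters rank after every letter.
    folded = unicodedata.normalize("NFC", s).casefold()
    return [RANK.get(ch, len(RANK)) for ch in folded]

words = ["öl", "zebra", "ost", "ål", "ärta"]
print(sorted(words, key=swedish_key))
# ['ost', 'zebra', 'ål', 'ärta', 'öl']
```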

>  Or are you trying to mimic what the
> French locale mandates as the collation order of the letters?

Yes, exactly - "mimic" is a good word as it won't be done perfectly.  In
particular, the ordering wouldn't be correct if two index entries differed
only by accents over letters.
