> From: Gavin Smith <[email protected]>
> Date: Sun, 19 Apr 2026 21:27:12 +0100
> Cc: [email protected], Werner LEMBERG <[email protected]>
>
> * Here's my current preferred solution, which should work with any awk (gawk
> or mawk) regardless of the locale setting, as well as with XeTeX and
> LuaTeX (which Werner Lemberg reported problems with in 2022):
>
> In texinfo.tex, output multibyte UTF-8 sequences with braces around
> them in the sort key.
>
> This works because texindex preserves braced units.
>
> $ cat test.texi
> \input texinfo
>
> @cindex à gré, césure
> @cindex écrire des lettres
> @cindex bbbb
>
>
> Index:
> @printindex cp
>
> @bye
> $ cat test.cp
> @entry{{à} gr{é}, c{é}sure}{1}{à gré, césure}
> @entry{{é}crire des lettres}{1}{écrire des lettres}

Sorry, I don't understand how this could work in texindex.  The
sorting in Awk still uses libc functions like strcoll to compare
strings, and those will only DTRT if libc works in a locale that
supports the non-ASCII characters in question.  (In the example, they
are Latin-1 characters, but in general they could be any non-ASCII
characters, such as Chinese text.)  What am I missing?

The only way I know of to perform these tasks in a way that doesn't
require the corresponding locale to be available is to use a library
that can process non-ASCII text without using libc locale-dependent
functions.  Emacs, for example, has such a "library" in its own code.
One alternative is to use something like ICU.  For UTF-8-only
encoding, we could use Gnulib's libunistring.
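
To make the concern concrete, here is a small Python sketch.  It is
not texindex code, and the function names are made up for the
illustration.  It assumes that, without a suitable locale, string
comparison effectively falls back to comparing raw bytes, and it
contrasts that with a locale-independent key built from Unicode data
alone.  The accent-stripping key is only a crude stand-in for what a
real collation library (ICU, libunistring) would do; it is not a
proper collation algorithm.

```python
import unicodedata

# The three index entries from the quoted test.texi, forced to
# precomposed (NFC) form so the byte values are predictable.
entries = [unicodedata.normalize("NFC", s)
           for s in ["à gré, césure", "écrire des lettres", "bbbb"]]


def bytewise_key(s):
    # Assumption: roughly what a strcoll-style comparison degenerates
    # to when no locale supporting these characters is in effect.
    return s.encode("utf-8")


def folded_key(s):
    # Hypothetical locale-independent key: decompose, drop combining
    # marks, case-fold.  Uses only Unicode data tables, no libc locale.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))
    # The original string breaks ties between entries with equal bases.
    return (base.casefold(), s)


by_bytes = sorted(entries, key=bytewise_key)
by_fold = sorted(entries, key=folded_key)

print(by_bytes)  # "bbbb" first, accented entries after all ASCII
print(by_fold)   # "à gré, césure" first, then "bbbb", then "écrire ..."
```

With byte comparison, every entry that starts with a non-ASCII
character lands after all the ASCII ones, which is the ordering I
would expect from Awk when the locale doesn't cover these characters.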
