[texindex (GNU texinfo) 6.8dev] [GNU Awk 4.2.1, API: 2.0] [openSUSE Leap 15.4]
There are two bugs in `texindex` that make it basically unusable for any main document language other than English. For the report below, here is an input file.

```
\input texinfo.tex
@documentencoding UTF-8
@documentlanguage ca
@findex a
@findex à
@findex u
@findex ù
@printindex fn
@bye
```

* The first, really severe bug is that the resulting output is completely broken if `texindex` is called with `LANG=C`. Running

  ```
  LANG=C texi2pdf sort-ca.texi
  ```

  creates the following `.fns` output

  ```
  \initial {0xc3}
  \entry{\code {à}}{1}
  \entry{\code {ù}}{1}
  \initial {A}
  \entry{\code {a}}{1}
  \initial {U}
  \entry{\code {u}}{1}
  ```

  As can be seen, the first `\initial` line contains a single byte (where '0xc3' is a real byte), that is, only the first byte of a two-byte UTF-8 sequence. Surprisingly, this doesn't make pdftex abort, but both xetex and luatex stop with errors. I have to use a UTF-8 locale like `en_US.utf8` to get decent output.

  I consider it very bad that `texindex` is locale-dependent. IMHO the proper solution is to make `texinfo.tex` emit a document encoding statement to the (unsorted) index file that in turn gets acknowledged by `texindex`.

* While `texindex` is sensitive to the locale regarding the input encoding, it isn't for collation: any `LANG` or `LC_COLLATE` setting gets ignored. Similarly, it ignores the `@documentlanguage` instruction when deriving a sorting order. For example, the Catalan order for the above example should be 'aàuù'; however, in the output it is sorted as 'àùau'.

  The proper fix would be to make `texinfo.tex` emit a document language statement to the (unsorted) index file that in turn gets acknowledged by `texindex`.

Werner
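
P.S. To illustrate both points independently of `texindex`, here is a minimal Python sketch. It is not how `texindex` is implemented; the function names (`initial_bytewise`, `initial_charwise`, `base_letter_key`) are made up for this example, and the accent-stripping sort key is only a crude stand-in for real Catalan collation, which I assume orders accented letters right after their base letter.

```python
# Illustrative sketch only; none of these names exist in texindex.
import unicodedata

# The four index entries from the report: a, à, u, ù
entries = ["a", "\u00e0", "u", "\u00f9"]


def initial_bytewise(entry: str) -> bytes:
    """First *byte* of the UTF-8 encoding, as a byte-oriented tool sees it."""
    return entry.encode("utf-8")[:1]


def initial_charwise(entry: str) -> str:
    """First *character*; this requires knowing that the input is UTF-8."""
    return entry[0]


def base_letter_key(entry: str):
    """Hypothetical sort key: base letters first, accents as a tie-breaker."""
    decomposed = unicodedata.normalize("NFD", entry)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), decomposed)


if __name__ == "__main__":
    for e in entries:
        print(repr(e), initial_bytewise(e), repr(initial_charwise(e)))
    print(sorted(entries, key=base_letter_key))
```

Both 'à' (U+00E0) and 'ù' (U+00F9) start with the byte 0xC3 in UTF-8, which matches the two entries ending up under one broken `\initial` in the `LANG=C` case. With the sketch's sort key the entries come out as 'aàuù'.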