>> > > > I consider it very bad that `texindex` is locale-dependent.
>> > > > IMHO the proper solution is to make `texinfo.tex` emit a
>> > > > document encoding statement to the (unsorted) index file
>> > > > that in turn gets acknowledged by `texindex`.
>>
>> Sure?  No.  But I have some thoughts.
>>
>> > FWIW, I don't even understand how can this be accomplished,
>> > unless the program reinvents all the library functions that deal
>> > with characters from scratch, instead of using libc functions
>> > (which are locale-dependent).  And Gawk does use libc functions
>> > for that.
>>
>> The current islower() is
>>
>>     function islower(c)
>>     {
>>         return index("abcdefghijklmnopqrstuvwxyz", c) > 0
>>     }
>>
>> It could instead be
>>
>>     function islower(c)
>>     {
>>         return c ~ /[[:lower:]]/
>>     }
>>
>> And similar for the others.  That would work for any unicode
>> character.
>
> Sure, but is the issue only with lower-case letters?  What about
> collation order or even determining what is and isn't a character
> (as opposed to incomplete byte sequence)?

Two remarks.

* I think it would be OK if the documentation says that i18n support
  for sorting only works with awk programs that understand `LANG`.

* Let's assume that GNU awk behaves similarly to, say, GNU sort: the
  collation order and input encoding are controlled by `LANG`.
  Looking into the awk info manual, this seems like a reasonable
  assumption.

As far as I can see, my two issues could be resolved by a shell
wrapper around the awk program that analyzes the (yet to be added)
`@documentencoding` and `@documentlanguage` settings in an unsorted
index file.  From those two settings it synthesizes a proper `LANG`
value that gets passed to GNU awk, et voilà.

Am I missing something?


    Werner
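
P.S.  To make the wrapper idea concrete, here is a minimal sketch, not
a tested implementation.  It assumes that the unsorted index file
contains lines of the form `@documentencoding UTF-8` and
`@documentlanguage de_DE` (nothing writes such lines today), and the
helper names `index_setting`, `synthesize_lang`, and `run_texindex`
are made up for illustration.  It sets `LC_ALL` rather than `LANG`
because `LC_ALL` takes precedence if the user has it set already.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: pick a locale from settings found in an
# unsorted index file, then run the sorting awk program under it.
# ASSUMPTION: the index file holds lines like
#     @documentencoding UTF-8
#     @documentlanguage de_DE

# Print the argument of the first line whose first field is $1 in file $2.
index_setting() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Combine language ($1) and encoding ($2) into a locale name such as
# de_DE.UTF-8.  Falls back to the "C" locale if no language is given.
synthesize_lang() {
    lang=$1
    enc=$2
    if [ -z "$lang" ]; then
        printf '%s\n' C
    elif [ -z "$enc" ]; then
        printf '%s\n' "$lang"
    else
        printf '%s.%s\n' "$lang" "$enc"
    fi
}

# Run the awk program ($1) on the index file ($2) under that locale.
run_texindex() {
    prog=$1
    file=$2
    loc=$(synthesize_lang "$(index_setting @documentlanguage "$file")" \
                          "$(index_setting @documentencoding "$file")")
    LC_ALL=$loc awk -f "$prog" "$file"
}
```

Whether the synthesized locale is actually installed on the system is
not checked here; a real wrapper would probably want to verify that
(for example against the output of `locale -a`) and warn otherwise.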