>> > > > I consider it very bad that `texindex` is locale-dependent.
>> > > > IMHO the proper solution is to make `texinfo.tex` emit a
>> > > > document encoding statement to the (unsorted) index file
>> > > > that in turn gets acknowledged by `texindex`.
>>
>> Sure? No. But I have some thoughts.
>>
>> > FWIW, I don't even understand how can this be accomplished,
>> > unless the program reinvents all the library functions that deal
>> > with characters from scratch, instead of using libc functions
>> > (which are locale-dependent). And Gawk does use libc functions
>> > for that.
>>
>> The current islower() is
>>
>> function islower(c)
>> {
>>     return index("abcdefghijklmnopqrstuvwxyz", c) > 0
>> }
>>
>> It could instead be
>>
>> function islower(c)
>> {
>>     return c ~ /[[:lower:]]/
>> }
>>
>> And similarly for the others. That would work for any Unicode
>> character.
>
> Sure, but is the issue only with lower-case letters? What about
> collation order or even determining what is and isn't a character
> (as opposed to incomplete byte sequence)?

Two remarks.
* I think it would be OK if the documentation says that i18n support
  for sorting only works with awk implementations that honor `LANG`.
* Let's assume that GNU awk behaves similarly to, say, GNU sort: the
  collation order and input encoding are controlled with `LANG`.
  Judging from the awk info manual, this seems like a reasonable
  assumption.
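
The `sort` side of that assumption is easy to check from a shell. A minimal sketch: the helper name `sort_with_locale` is made up for illustration, and it sets `LC_ALL` rather than `LANG` only so that nothing already in the environment overrides it. It says nothing about whether gawk's own string comparisons behave the same way.

```shell
#!/bin/sh
# Illustration only: sort stdin under the locale named by $1.
# LC_ALL is used because it takes precedence over LANG and LC_COLLATE.
sort_with_locale() {
    LC_ALL="$1" sort
}

# Example (output depends on which locales are installed):
#   printf 'a\nB\n' | sort_with_locale C
#   printf 'a\nB\n' | sort_with_locale en_US.UTF-8
```

In the `C` locale the lines are ordered by byte value, so `B` comes before `a`; with a language locale such as `en_US.UTF-8` (if installed) the order typically differs.
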
As far as I can see, both issues could be resolved by a shell
wrapper around the awk program that analyzes the (yet to be added)
`@documentencoding` and `@documentlanguage` settings in an unsorted
index file. From those two settings it synthesizes a proper `LANG`
value that gets passed to GNU awk in the environment, et voilà.
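
A rough sketch of such a wrapper, under loudly stated assumptions: the index file is assumed to carry the two settings as plain lines of the form `@documentencoding UTF-8` and `@documentlanguage de_DE`, which is a guess at a format that does not exist yet. The function names and the `texindex.awk` path are likewise placeholders.

```shell
#!/bin/sh
# ASSUMPTION: the unsorted index file contains lines such as
#   @documentencoding UTF-8
#   @documentlanguage de_DE
# This format is hypothetical; texinfo.tex does not emit it today.

# Print the argument of directive $1 as found in file $2 (empty if absent).
index_setting() {
    awk -v d="$1" '$1 == d { print $2; exit }' "$2"
}

# Print a locale name such as de_DE.UTF-8 for index file $1,
# falling back to C when no language is given.
synth_locale() {
    lang=$(index_setting @documentlanguage "$1")
    enc=$(index_setting @documentencoding "$1")
    if [ -z "$lang" ]; then
        echo C
    elif [ -z "$enc" ]; then
        echo "$lang"
    else
        echo "$lang.$enc"
    fi
}

# Intended use (placeholder script name, not run here):
#   LANG=$(synth_locale "$1") gawk -f texindex.awk "$@"
```

Two caveats: a bare language code without a country part (say `de`) may not name an installed locale, so a real wrapper would need some mapping or fallback there, and an `LC_ALL` or `LC_COLLATE` already set in the caller's environment would override `LANG`.
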
Am I missing something?
Werner