Re: `texindex` output depends on locale settings

Werner LEMBERG Sun, 06 Nov 2022 08:10:50 -0800

>> > > >   I consider it very bad that `texindex` is locale-dependent.
>> > > >   IMHO the proper solution is to make `texinfo.tex` emit a
>> > > >   document encoding statement to the (unsorted) index file
>> > > >   that in turn gets acknowledged by `texindex`.
>>
>> Sure? No. But I have some thoughts.
>>
>> > FWIW, I don't even understand how can this be accomplished,
>> > unless the program reinvents all the library functions that deal
>> > with characters from scratch, instead of using libc functions
>> > (which are locale-dependent).  And Gawk does use libc functions
>> > for that.
>>
>> The current islower() is
>>
>> function islower(c)
>> {
>>      return index("abcdefghijklmnopqrstuvwxyz", c) > 0
>> }
>>
>> It could instead be
>>
>> function islower(c)
>> {
>>      return c ~ /[[:lower:]]/
>> }
>>
>> And similar for the others.  That would work for any unicode
>> character.
>
> Sure, but is the issue only with lower-case letters?  What about
> collation order or even determining what is and isn't a character
> (as opposed to incomplete byte sequence)?


Two remarks.

* I think it would be OK if the documentation says that i18n support
  for sorting only works with awk implementations that honor `LANG`.

* Let's assume that GNU awk behaves similarly to, say, GNU sort.  The
  collation order and input encoding are controlled by `LANG`;
  judging from the awk info manual, this seems like a reasonable
  assumption.

  As far as I can see, my two issues could be resolved by a shell
  wrapper around the awk program that analyzes the (yet to be added)
  `@documentencoding` and `@documentlanguage` settings in an unsorted
  index file.  From those two settings it synthesizes a proper `LANG`
  value that is passed to GNU awk, et voilà.

  Am I missing something?
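
To make the wrapper idea concrete, here is a rough sketch.  Everything
in it is an assumption: the index file does not contain
`@documentencoding` or `@documentlanguage` lines today, the function
names and the `TEXINDEX_AWK` variable are made up for illustration, and
simply joining language and encoding will not always yield a locale
that is installed on the system (a bare `de`, for example, would need
mapping to something like `de_DE`).

```shell
#!/bin/sh
# Hypothetical sketch only: derive a LANG value from an unsorted index
# file that is assumed to carry "@documentlanguage" and
# "@documentencoding" lines (which texinfo.tex does not emit yet).

# Print the first argument of the given @-command found in the file.
# $1: command name without the leading "@", $2: index file
index_setting () {
    sed -n "s/^@$1[[:space:]]\{1,\}\([^[:space:]]\{1,\}\).*/\1/p" "$2" |
        head -n 1
}

# Print a locale name such as "de_DE.UTF-8"; fall back to "C" if the
# file gives no language.
# $1: index file
synthesize_lang () {
    lang=$(index_setting documentlanguage "$1")
    enc=$(index_setting documentencoding "$1")
    if [ -z "$lang" ]; then
        echo C
    elif [ -z "$enc" ]; then
        echo "$lang"
    else
        echo "$lang.$enc"
    fi
}

# Run the awk-based sorter under the synthesized locale.
# TEXINDEX_AWK is a placeholder for the path to the awk script.
# LC_ALL is emptied for this call because a non-empty LC_ALL would
# override LANG.
# $1: index file
run_texindex () {
    LANG=$(synthesize_lang "$1") LC_ALL= gawk -f "$TEXINDEX_AWK" "$1"
}
```

With these definitions, `run_texindex foo.cp` would sort `foo.cp`
using whatever locale the file itself asks for, independent of the
user's own `LANG`.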


    Werner
