Dear Eli & Gavin,

The fix proposed by Gavin is IMHO sufficient for French. So if there is
a release of Texinfo with this fix in txi-fr.tex, I would be quite happy
as this would allow us to include ses-fr.texi into the set of compiled
manuals, which was the object of the original emacs-devel discussion
that triggered this thread.

Having said that, what I was trying to highlight in my previous email is
that:

- this fix requires the Texinfo maintainers to do some customization for
  each language, either removing the accents or surrounding with {} the
  non-ASCII letters that have multi-byte UTF-8 representations.

- this fix would not work for non-Latin scripts (e.g. Russian or Japanese).

- the approach I described in my previous email has the advantage that
  the maintenance effort is shared with the LaTeX community: all
  language-specific key ordering would simply reuse biber, and the
  development effort to redesign texinfo/texindex index production would
  be made once and for all. Adding any new language would then cost
  almost no effort (as far as indexing is concerned) on the Texinfo side.

   V.

PS-1: reacting to Eli

> This assumption is incorrect, because it ignores the possibility that
> the manual is generated as part of building Emacs, in which case the
> locale in which this is done could have nothing to do with French.

You are right. To cover this case, the configuration step would need
to filter out manuals in languages for which no locale is installed.
With this, the patch I proposed would suffice for people compiling Emacs
from source to get the manuals in their own languages, assuming that the
locale corresponding to their language is installed.

Anyway, as the root cause is within Texinfo, fixing it there is the
least overall effort.

PS-2: to Gavin, actually if we forget about correct key sorting and just
want the compilation not to be broken, we could have a simpler fix:

1. pass the real encoding from texi2dvi to texindex through the command
   line (or via some custom envvar)

2. in the texindex AWK script, make sure the @initial{...} starting
   letter never breaks a UTF-8 byte sequence when the current locale
   encoding is 8-bit but the document encoding is UTF-8. All this would
   require is to

   - set a flag to true when the current locale encoding is 8-bit and
     the document encoding is UTF-8
   - if the flag is false, change nothing: take the first char into
     @initial{...}
   - if the flag is true, replace the first-char extraction by a function
     that interprets chars as bytes and takes 1, 2, 3 or 4 of them,
     depending on how these bytes fit into a Unicode char thus encoded.

My understanding is that the compilation is broken by texindex producing
some @initial{<1st byte of a multibyte char>}.
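
The extraction step of PS-2 could be sketched as follows. This is only an
illustration in Python, not texindex code (texindex is an AWK script); the
function name first_char_bytes and the flag name utf8_in_8bit_locale are
hypothetical. It relies on the fact that the lead byte of a UTF-8 sequence
tells how many bytes (one to four) the sequence has.

```python
def first_char_bytes(entry: bytes, utf8_in_8bit_locale: bool) -> bytes:
    """Return the bytes that would go into @initial{...} for an index entry.

    Hypothetical helper: 'utf8_in_8bit_locale' stands for the flag of PS-2
    (8-bit locale encoding, UTF-8 document encoding).
    """
    if not utf8_in_8bit_locale or not entry:
        # Flag false: unchanged behaviour, take the first "char" (one byte).
        return entry[:1]

    lead = entry[0]
    if lead < 0x80:
        length = 1          # ASCII
    elif 0xC0 <= lead < 0xE0:
        length = 2          # lead byte of a 2-byte sequence
    elif 0xE0 <= lead < 0xF0:
        length = 3          # lead byte of a 3-byte sequence
    elif 0xF0 <= lead < 0xF8:
        length = 4          # lead byte of a 4-byte sequence
    else:
        length = 1          # stray continuation or invalid byte: leave as is
    return entry[:length]
```

With the flag set, an entry starting with "é" yields both bytes of the
letter instead of only its first byte.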

PS-3: to Gavin again, in the same vein as PS-2, my original patch could
 be improved, still with the sole objective of not breaking the
 compilation and accepting bad sorting when the locale is not installed,
 by doing the following:
 - gather <document-locale> from the texinfo document (already in my patch)
 - check whether <document-locale> is installed; if yes, proceed as
   already done in my patch
 - otherwise, if <current locale> and <document-locale> use the same
   encoding, do not change the locale when calling texindex
 - otherwise, if <document-locale> uses encoding XX, set LC_ALL=C.XX
   when calling texindex.

 This way, the patch does not require <document-locale> to be
 installed, only C.XX for the encoding of <document-locale>, which is
 much more likely to be the case.
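
 The decision logic of PS-3 could be sketched as follows. This is a Python
 illustration only, not the actual texi2dvi patch (texi2dvi is a shell
 script); the function name texindex_lc_all is hypothetical, and the set
 of installed locales is passed in as a parameter rather than queried
 from the system.

```python
def texindex_lc_all(document_locale, current_locale, installed_locales):
    """Return the LC_ALL value to use when calling texindex,
    or None to leave the current locale unchanged.

    Hypothetical helper; locale names are assumed to look like
    'fr_FR.UTF-8' (language_TERRITORY.codeset).
    """
    def encoding(loc):
        # 'fr_FR.UTF-8' -> 'UTF-8'; a name without a codeset gives ''.
        return loc.partition(".")[2].partition("@")[0]

    if document_locale in installed_locales:
        # Document locale installed: proceed as in the original patch.
        return document_locale
    if encoding(current_locale).lower() == encoding(document_locale).lower():
        # Same encoding: sorting may be wrong, but no byte sequence breaks.
        return None
    # Fall back to the C locale with the document's encoding (C.XX).
    return "C." + encoding(document_locale)
```

 For instance, with a fr_FR.UTF-8 document built in an en_US.ISO-8859-1
 locale where fr_FR.UTF-8 is not installed, this returns "C.UTF-8".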

________________________________
From: Eli Zaretskii <[email protected]>
Sent: Friday, 24 April 2026 16:41
To: Vincent Belaïche <[email protected]>
Cc: [email protected] <[email protected]>; [email protected] 
<[email protected]>; [email protected] <[email protected]>; [email protected] 
<[email protected]>
Subject: Re: texi2dvi not passing locale to texindex

> From: Vincent Belaïche <[email protected]>
> CC: "[email protected]" <[email protected]>, "[email protected]"
>        <[email protected]>, "[email protected]" <[email protected]>
> Date: Fri, 24 Apr 2026 12:53:15 +0000
>
> Coming back to the patch I am proposing, you are arguing that it is not
> good because it relies on the installed locales. I think that this is
> not a problem of the proposed patch but of the awk-based texindex, which
> is limited to installed locales. I am expecting that if someone wants to
> compile a document for their own language, they have the corresponding
> locale installed.

This assumption is incorrect, because it ignores the possibility that
the manual is generated as part of building Emacs, in which case the
locale in which this is done could have nothing to do with French.

> So, in a nutshell, the root cause is that texindex is dependent on
> installed locales for sorting, and I would like to react to your
> (Gavin's) comment about rewriting texindex.

I think Gavin already showed how to solve this: by using {..}.  That
doesn't require rewriting texindex.
