On Wed, Apr 08, 2026 at 10:00:33PM +0200, Patrice Dumas wrote:
> Hello,
> 
> Upon reading the gettext manual or the POSIX locales support, there is
> a possible '@modifier'/'@variant' postpended to locale names.  The
> examples given in the gettext manual made me think that this would be
> relevant for the internationalizations of manuals, as it can be used for
> script choice for example.  I think that it would therefore make sense
> to accept an '@modifier', like
> 
> @documentlanguage sr@@latin
> 
> There is no urgency, though, as there are no actual need that has
> emerged.  But I think that it would be good to anticipate and allow
> a locale '@modifier'.
> 
> For texi2any, I think that it would be a simple change, only avoiding
> error messages about an incorrect argument, but for Texinfo TeX, I do
> not have any idea of the difficulty.

We have a Serbian translation for texinfo.tex (txi-sr.tex) that uses the
Latin alphabet.  As far as I know, this is not incorrect, as "sr" refers
to the Serbian language, but doesn't say anything about the alphabet used.

po/sr.po and po_document/sr.po use the Cyrillic alphabet.

I feel that locale names are more restrictive than language codes.  For
example, de_DE as a locale stipulates the Latin-1 character encoding (on
glibc systems, as I understand it; I know that locale names aren't
universal across all operating systems).  However, in a Texinfo document,
"@documentlanguage de_DE" would merely denote German as spoken in Germany,
and say nothing about the character encoding.  So I would conclude that
language codes passed to @documentlanguage and locale names are not one
and the same.

Info node "(gettext)Header Entry" does describe a language code field
for PO files, which seems relevant, but it only mentions "latin" and
"cyrillic" as possibilities:

       - ‘LL_CC@VARIANT’, where ‘LL’ is an ISO 639 two-letter or
          three-letter language code (lowercase), ‘CC’ is an ISO 3166
          two-letter country code (uppercase), and ‘VARIANT’ is a
          variant designator.  The variant designator (lowercase) can be
          a script designator, such as ‘latin’ or ‘cyrillic’.

The full generality of possibilities as used in locale names is not
used here:

     • In this PO file field, variant designators that are not
          relevant to message translation, such as ‘@euro’, are not
          used.

Apart from Latin vs. Cyrillic, the only other possibility I know about
is Traditional vs. Simplified Chinese characters.

I could not find much more about it.  The glibc manual mentions
@MODIFIER as a possible component in a language given in the LANGUAGE
environment variable, but doesn't say anything about what the values
of "MODIFIER" could be.

https://sourceware.org/glibc/manual/2.43/html_node/Using-gettextized-software.html

If we allowed such @MODIFIER extensions in the @documentlanguage argument,
we should be careful about whether we propagated them to other formats
like HTML, as these may use their own formats for document language.
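
To illustrate the kind of care needed, here is a minimal Python sketch
of how a code with a modifier might be turned into the BCP 47 style tag
that the HTML lang attribute expects.  The function name to_html_lang
and the SCRIPT_SUBTAGS table are made up for this example and are not
part of texi2any; the sketch assumes that only the "latin" and
"cyrillic" modifiers are meaningful and silently drops anything else.

```python
# Hypothetical mapping from gettext-style script modifiers to
# ISO 15924 script codes, as used in BCP 47 language tags.
SCRIPT_SUBTAGS = {"latin": "Latn", "cyrillic": "Cyrl"}


def to_html_lang(code):
    """Convert a code such as 'sr_RS@latin' to a tag such as 'sr-Latn-RS'."""
    # Split off the optional '@modifier' suffix, then the optional region.
    base, _, modifier = code.partition("@")
    language, _, region = base.partition("_")

    parts = [language.lower()]
    # BCP 47 puts the script subtag before the region subtag.
    script = SCRIPT_SUBTAGS.get(modifier.lower())
    if script:
        parts.append(script)
    if region:
        parts.append(region.upper())
    return "-".join(parts)


if __name__ == "__main__":
    for code in ("sr@latin", "sr_RS@cyrillic", "de_DE", "de_DE@euro"):
        print(code, "->", to_html_lang(code))
```

A modifier like "euro" has no counterpart in an HTML language tag, which
is one reason not to pass modifiers through unchanged.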

I think it would be fine to recognize and accept @MODIFIER suffixes as
you suggest, but it would probably be safer not to do anything with them
unless or until we are aware of the practical implications.  Information
about those could come from users of the Serbian language, for example.
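
As a rough illustration of "accept but ignore", a Python sketch (not the
actual texi2any code) could look like the following.  The pattern and
the name split_document_language are assumptions for this example; the
function takes the argument as it is after "@@" has been read as a
literal "@", and the exact set of characters allowed in a modifier is a
guess.

```python
import re

# Assumed shape: LL or LLL, optional _CC, optional @modifier.
LANG_RE = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?(?:@([A-Za-z0-9]+))?$")


def split_document_language(arg):
    """Return (language, region, modifier), or None if arg is malformed.

    Components that are absent are returned as None.
    """
    m = LANG_RE.match(arg)
    if m is None:
        return None
    return m.groups()


if __name__ == "__main__":
    parsed = split_document_language("sr@latin")
    if parsed is not None:
        # Accept the modifier without acting on it.
        language, region, _modifier = parsed
        print(language, region)
```

With this approach "sr@latin" no longer triggers an error, while the
rest of the processing still sees only the language and region.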

I could probably figure out how to strip off a @MODIFIER extension in
texinfo.tex eventually.

