On Thu, Apr 09, 2026 at 09:36:09PM +0100, Gavin Smith wrote:
> On Wed, Apr 08, 2026 at 10:00:33PM +0200, Patrice Dumas wrote:
> > According to the gettext manual and the POSIX locale documentation, an
> > '@modifier' (or '@variant') can be appended to locale names.
> > 
> > There is no urgency, though, as no actual need has emerged.
> 
> We have a Serbian translation for texinfo.tex (txi-sr.tex) that uses the
> Latin alphabet.  As far as I know, this is not incorrect, as "sr" refers
> to the Serbian language, but doesn't say anything about the alphabet used.
> 
> po/sr.po and po_document/sr.po use the Cyrillic alphabet.

That is an actual need.  Currently it is not possible to get output in
Serbian that is consistent regarding the alphabet: the HTML strings will
be in Cyrillic and the TeX strings will be in Latin (irrespective of the
alphabet actually used in the manual).

> I feel that locale names are more restrictive: for example,
> de_DE as a locale stipulates Latin-1 character encoding (on glibc systems,
> as I understand - I know that locale names aren't universal across all
> operating systems).  However, in a Texinfo document, "@documentlanguage
> de_DE" would merely denote German as spoken in Germany, and say nothing
> about the character encoding.  So I would conclude that language codes passed
> to @documentlanguage and locale names are not one and the same.

Agreed.  The encoding part of locale names is of no use in Texinfo, as
we have separate @documentencoding and @documentlanguage commands, and we
manage the conversion between encodings ourselves.
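
To make the point concrete, here is a rough sketch (Python, for
illustration only; the helper names are made up and this is not Texinfo
code) that splits a POSIX-style name of the form
language[_TERRITORY][.codeset][@modifier] and drops the codeset, which is
the part we do not need:

```python
import re

# Hypothetical helpers, not part of Texinfo.  They assume a POSIX-style
# name of the form language[_TERRITORY][.codeset][@modifier].
_LOCALE_RE = re.compile(
    r"^(?P<language>[A-Za-z]+)"
    r"(?:_(?P<territory>[A-Za-z0-9]+))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?$"
)


def split_locale_name(name):
    """Return (language, territory, codeset, modifier); absent parts are None."""
    match = _LOCALE_RE.match(name)
    if match is None:
        raise ValueError(f"not a locale-like name: {name!r}")
    return (match["language"], match["territory"],
            match["codeset"], match["modifier"])


def without_codeset(name):
    """Rebuild the name without its codeset, keeping territory and modifier."""
    language, territory, _codeset, modifier = split_locale_name(name)
    result = language
    if territory:
        result += "_" + territory
    if modifier:
        result += "@" + modifier
    return result
```

With this, without_codeset("sr_RS.UTF-8@latin") gives "sr_RS@latin", and a
name without a codeset such as "sr@latin" is returned unchanged.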

> If we allowed such @MODIFIER extensions in the @documentlanguage argument,
> we should be careful about whether we propagated them to other formats
> like HTML, as these may use their own formats for document language.

You are right.  I looked on the web, and it seems that BCP 47 language
tags are used, which can represent much more than the language, region
and script.

> I think it would be fine to recognize and accept @MODIFIER suffixes as
> you suggest, but it would probably be safer not to do anything with it,
> unless or until we are aware of the practical implications.  This could
> come from users of the Serbian language, for example.

After reading some BCP 47 related information, such as
 https://www.unicode.org/reports/tr35/tr35-76/tr35.html
I am now wondering whether we should instead have a way to specify the
script directly, like

@documentscript latin

and construct a locale name with an @modifier for gettext translation
retrieval, but also construct a BCP 47 language tag for HTML, DocBook and
possibly other purposes.
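
A rough sketch of what I have in mind (Python, for illustration only;
the function names and the script table are assumptions, nothing here
exists in Texinfo, and whether a matching translation catalog such as
sr@latin is available is a separate question):

```python
# Hypothetical table: argument of the proposed @documentscript mapped to
# (gettext-style modifier, ISO 15924 script subtag used in BCP 47).
SCRIPTS = {
    "latin": ("latin", "Latn"),
    "cyrillic": ("cyrillic", "Cyrl"),
}


def gettext_locale(language, region=None, script=None):
    """Build a name like 'sr_RS@latin' for translation lookup."""
    name = language
    if region:
        name += "_" + region
    if script:
        name += "@" + SCRIPTS[script.lower()][0]
    return name


def bcp47_tag(language, region=None, script=None):
    """Build a tag like 'sr-Latn-RS' (order: language, script, region)."""
    parts = [language]
    if script:
        parts.append(SCRIPTS[script.lower()][1])
    if region:
        parts.append(region)
    return "-".join(parts)
```

For "@documentlanguage sr" with "@documentscript latin", this would give
"sr@latin" for gettext and "sr-Latn" for HTML; with a region, "sr_RS@latin"
and "sr-Latn-RS".  An unknown script name raises KeyError in this sketch;
a real implementation would need to decide how to report that.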

-- 
Pat
