On Thu, Apr 09, 2026 at 09:36:09PM +0100, Gavin Smith wrote:
> On Wed, Apr 08, 2026 at 10:00:33PM +0200, Patrice Dumas wrote:
> > Upon reading the gettext manual or the POSIX locales support, there is
> > a possible '@modifier'/'@variant' postpended to locale names.
> >
> > There is no urgency, though, as there are no actual need that has
> > emerged.
>
> We have a Serbian translation for texinfo.tex (txi-sr.tex) that uses the
> Latin alphabet.  As far as I know, this is not incorrect, as "sr" refers
> to the Serbian language, but doesn't say anything about the alphabet used.
>
> po/sr.po and po_document/sr.po use the Cyrillic alphabet.

That is an actual need.  Currently it is not possible to get consistent
output in Serbian with respect to the alphabet: the HTML strings will be
in Cyrillic and the TeX strings will be in Latin, irrespective of the
alphabet actually used in the manual.

> I feel that in terms of locale names, it is more restrictive, for example
> de_DE as a locale stipulates Latin-1 character encoding (on glibc systems,
> as I understand - I know that locale names aren't universal across all
> operating systems).  However, in a Texinfo document, "@documentlanguage
> de_DE" would merely denote German as spoken in Germany, and say nothing
> about the character encoding.  So I would conclude that language codes passed
> to @documentencoding and locale names are not one and the same.

Agreed.  The encoding part of locale names is of no use in Texinfo, as we
have separate @documentencoding and @documentlanguage commands, and we
manage the conversion between encodings ourselves.

> If we allowed such @MODIFIER extensions in the @documentlanguage argument,
> we should be careful about whether we propagated them to other formats
> like HTML, as these may use their own formats for document language.

You are right.  I had a look on the web, and it seems that BCP 47 is used
for that; it can represent much more than the language, region and script.

> I think it would be fine to recognize and accept @MODIFIER suffixes as
> you suggest, but it would probably be safer not to do anything with it,
> unless or until we are aware of the practical implications.  This could
> come from users of the Serbian language, for example.

After reading some BCP 47 related information, such as
https://www.unicode.org/reports/tr35/tr35-76/tr35.html
I am now wondering whether we should instead have a way to specify the
script directly, like

  @documentscript latin

and then construct a locale name with an @modifier for gettext translation
retrieval, and also a BCP 47 language tag for HTML, DocBook and possibly
other purposes.

-- 
Pat
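
To make the @documentscript idea concrete, here is a rough sketch in Python.
It is only an illustration of the mapping, not anything that exists in
Texinfo.  The function names and the SCRIPT_CODES table are hypothetical,
and it assumes that the @documentscript argument would be a lowercase
script name that can be reused as is for the locale @modifier (as in the
existing "sr@latin" convention), with a lookup to the four-letter script
code for the BCP 47 tag.

```python
# Hypothetical table: proposed @documentscript values mapped to the
# ISO 15924 script codes used as BCP 47 script subtags.
SCRIPT_CODES = {
    "latin": "Latn",
    "cyrillic": "Cyrl",
}


def gettext_locale(language, region=None, script=None):
    """Build a locale name such as sr_RS@latin for translation lookup.

    The script name is assumed to be usable directly as the @modifier.
    """
    name = language
    if region:
        name += "_" + region
    if script:
        if script not in SCRIPT_CODES:
            raise ValueError("unknown script: " + script)
        name += "@" + script
    return name


def bcp47_tag(language, region=None, script=None):
    """Build a BCP 47 tag such as sr-Latn-RS.

    In BCP 47 the script subtag comes before the region subtag.
    """
    parts = [language]
    if script:
        if script not in SCRIPT_CODES:
            raise ValueError("unknown script: " + script)
        parts.append(SCRIPT_CODES[script])
    if region:
        parts.append(region)
    return "-".join(parts)


if __name__ == "__main__":
    # "@documentlanguage sr_RS" together with "@documentscript latin"
    print(gettext_locale("sr", "RS", "latin"))
    print(bcp47_tag("sr", "RS", "latin"))
```

Keeping the script separate from @documentlanguage means each output format
can build the identifier it needs, instead of parsing a locale-style
@modifier back out of the language argument.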
