On Sun, Apr 12, 2026 at 09:49:17AM +0200, Patrice Dumas wrote:
> So, if scripts are in the @documentlanguage, I would prefer
> 
> @documentlanguage sr_Latn
> 
> and to specify both a script and a variant, which cannot be done, as far
> as I can tell, with @VARIANT:
> 
> @documentlanguage sr_Latn_ekavsk

The underscore is already used in Texinfo translations to distinguish
pt and pt_BR.

How are translators going to provide .po files for Serbian ("Ekavian
pronunciation") in the Latin alphabet?  Does it differ significantly
in its written form from non-Ekavian pronunciation?  Is there actually
a demand from users to use or provide such translations?

In short, I propose we wait until specifying such a distinction is a
real problem for users.  I don't feel like we need to provide any more
flexibility than is available with the language names used by gettext.  If
it hasn't been a problem with the gettext facility it is probably not a
real problem, or if it is, it should be fixed with gettext as well.

If the Ekavian pronunciation of Serbian were really important to
distinguish, I would expect this could be accommodated in the LL@VARIANT
format, e.g. sr@latin-ekavsk.
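
To make the comparison concrete, here is a rough sketch in Python of how a
name in the LL[_CC][@VARIANT] shape could be split into its parts.  This is
only an illustration under my own assumptions: split_locale_name is a
hypothetical helper, not anything in Texinfo or gettext, and the exact
character sets allowed in each part (including the hyphen in the variant)
are guesses.

```python
import re

# Hypothetical helper, not part of Texinfo or gettext.  Assumes names of
# the form LL[_CC][@VARIANT]: a 2-3 letter lowercase language code, an
# optional 2-letter uppercase region, and an optional variant after "@".
_LOCALE_RE = re.compile(
    r"^(?P<lang>[a-z]{2,3})"
    r"(?:_(?P<region>[A-Z]{2}))?"
    r"(?:@(?P<variant>[A-Za-z0-9-]+))?$"
)


def split_locale_name(name):
    """Return (language, region, variant); missing parts are None."""
    m = _LOCALE_RE.match(name)
    if m is None:
        raise ValueError(f"not in LL[_CC][@VARIANT] form: {name!r}")
    return m.group("lang"), m.group("region"), m.group("variant")
```

Under these assumptions, pt_BR and sr@latin-ekavsk both fit the shape,
while sr_Latn_ekavsk does not and would need a different rule.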

> > This directory in the glibc sources appears to show the names of many
> > of the languages used in glibc locales:
> > 
> > https://sourceware.org/cgit/glibc/tree/localedata/locales
> > 
> > Ignoring the @euro variants, it gives an idea how many languages are likely
> > to get translations in multiple alphabets.  The number of languages with
> > user communities who are likely to produce translations is likely smaller
> > still.
> 
> The aforementioned
>  
> https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
> is a better source for that.  For instance, there are the Occitan
> variants there, which are not of that much importance for manuals, but
> make more sense as language definitions than oc_FR (which is probably
> ok for most aspects of locales).  They can be mapped to variants, though,
> like oc@lengadoc (the Occitan variant where I live).

Note that I said languages that are likely to get a translation.  The IANA
subtag registry contains all kinds of languages and dialects that are not
likely to be used for formal writing.  I feel that glibc locale names
are more likely to reflect the existence of real user communities.

Take, for example, Scots (sco), which is listed with a variant, Ulster Scots
(sco@ulster).  English is my only language, so I can't speak definitively
about other languages, but Scots as a language would fail to meet several
criteria:

(a) It does not have a standard written form
(b) It does not have a single, standard dialect
(c) It is not widely used in written form for general purposes
(d) There is not any widespread literacy in written forms of the language
(e) The language community is not aware of the existence of such a language.
    In my experience, nobody ever talks about the "Scots" language,
    even if such an entity existed historically.  Occasionally, if
    referring to regional dialects, people might refer to "the Doric"
    (in my part of Scotland) but "Scots" is not used as a name for a
    language.  It's limited to specialists and enthusiasts who are aware
    of such a language from academic writing and historical information.
    (Someone said to me once, "Ah dinnae spik English, ah spik Scottish" -
    unaware that the alleged name of the language spoken was "Scots", not
    "Scottish".)

Simply, Scots is not a "real language" in the way that, say, French is.

(The Scots-language Wikipedia existed for years, unknown to and unused by
real Scottish people.  Apparently many of its articles were written in a
bogus version of Scots that nobody ever spoke:

> In August 2020, the site attracted attention after a Reddit post noted
> that the project contained an unusually high number of articles written in
> poor-quality Scots. They were written by a single prolific contributor,
> who was an American teenager. These articles consisted of mostly English
> instead of Scots vocabulary and grammar. It is claimed that the editor
> apparently used an online English–Scots dictionary to translate parts
> of English Wikipedia articles word-by-word, without regard for syntax.
> 
> Over 23,000 articles, approximately a third of the entire Scots Wikipedia
> at that time, were created by the editor. These articles have been
> described as "English written in a Scottish accent," with gibberish and
> nonsensical words and spellings not present in any Scots dialect.

https://en.wikipedia.org/wiki/Scots_Wikipedia

This gives a sense of the usage and importance of the Scots language.)

I expect that many other languages or dialects would have a similar
status (to a greater or lesser degree).  (I could probably give some much
more controversial examples than Scots.)  Hence it is not necessary
to go to great efforts to support them if they don't fall within the
existing framework.  We don't need to support a way of describing every
dialect or variant given a code by the IANA.

> 
> I think that we should design the @documentlanguage in a way that eases
> specifying any language, even for languages that are not important by
> the number of people speaking that language or without user communities
> likely to produce translations.
> 
> -- 
> Pat
