On Thu, Apr 16, 2026 at 08:24:17PM +0100, Gavin Smith wrote:
> As I understand, the IETF language subtag registry defines which "variant"
> subtags may be used in combination with other "variant" subtags.
>
> For example, the "grclass" subtag you used as an example is more
> specific than variants of Occitan and occurs later in the BCP 47 language
> identifier.  This is shown by the "Prefix:" entries in the register:
>
> Type: variant
> Subtag: grclass
> Description: Classical Occitan orthography
> Added: 2018-04-22
> Prefix: oc
> Prefix: oc-aranes
> Prefix: oc-auvern
> Prefix: oc-cisaup
> Prefix: oc-creiss
> Prefix: oc-gascon
> Prefix: oc-lemosin
> Prefix: oc-lengadoc
> Prefix: oc-nicard
> Prefix: oc-provenc
> Prefix: oc-vivaraup
> Comments: Classical written standard for Occitan developed in 1935 by
>   Alibèrt
>
> "oc-lengadoc-grclass" would be denoted in a Texinfo input file thus:
>
> @documentlanguage oc
> @documentlanguagevariant lengadoc, grclass
>
> The other order, "@documentlanguagevariant grclass, lengadoc" would have
> to be considered incorrect, as "lengadoc" is only allowed under the "oc-"
> prefix, not under "oc-grclass-":
>
> Type: variant
> Subtag: lengadoc
> Description: Languedocien
> Added: 2018-04-22
> Prefix: oc
> Comments: Occitan variant spoken in Languedoc
>
> (although it's uncertain whether texi2any should do the work to validate
> such usages).
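
To make the prefix rule quoted above concrete, here is a minimal Python sketch of what such a validation could look like. Nothing in it is existing texi2any code: the VARIANT_PREFIXES table is a hand-copied excerpt of the two registry records quoted above, and parse_variant_argument and check_variant_order are hypothetical names. It assumes the strict reading described above, where the tag built so far must equal one of the variant's "Prefix:" entries; BCP 47 itself may be more lenient.

```python
# Assumption: excerpt of the IANA registry, limited to the two records
# quoted in this message (variant subtag -> allowed "Prefix:" values).
VARIANT_PREFIXES = {
    "lengadoc": {"oc"},
    "grclass": {
        "oc", "oc-aranes", "oc-auvern", "oc-cisaup", "oc-creiss",
        "oc-gascon", "oc-lemosin", "oc-lengadoc", "oc-nicard",
        "oc-provenc", "oc-vivaraup",
    },
}


def parse_variant_argument(arg):
    """Split a comma-separated variant list such as 'lengadoc, grclass'."""
    return [item.strip().lower() for item in arg.split(",") if item.strip()]


def check_variant_order(language, variants):
    """Return a list of error messages; an empty list means no problem found.

    Each variant is checked against the tag built so far (the language
    followed by the variants already seen), then appended to that tag.
    """
    errors = []
    tag = language.lower()
    for variant in variants:
        prefixes = VARIANT_PREFIXES.get(variant)
        if prefixes is None:
            errors.append(f"unknown variant subtag: {variant}")
        elif tag not in prefixes:
            errors.append(f"'{variant}' is not registered under prefix '{tag}'")
        tag = f"{tag}-{variant}"
    return errors


# Registered order: no errors are reported.
ok = check_variant_order("oc", parse_variant_argument("lengadoc, grclass"))
# Reversed order: 'lengadoc' is rejected because 'oc-grclass' is not
# among its prefixes.
bad = check_variant_order("oc", parse_variant_argument("grclass, lengadoc"))
```

Even in this reduced form, the check needs a copy of the registry's prefix data to be shipped and kept up to date.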

I had understood that, and the examples I have seen in the IANA registry
made me think that it was well done, but I think that texi2any should
not validate the order of the variants or their association with
@documentlanguage.  Maybe later on, but to me this looks like unneeded
complexity.  Users of variants will not be numerous and can be assumed
to know what they are doing.

> I prefer @documentlanguagevariant to @documentlanguagevariants, as it
> is fine in the singular with a list, whereas in the plural with a single
> item in the argument would look strange.

Ok, this is what I used.

> If I understand correctly, the argument to @documentlanguagevariant
> only accepts tags that are entered as "Type: variant" in the IANA
> registry.  We do not accept region codes, scripts, or the "extension"
> subtags of BCP 47.
> Notably, we do not accept "-u-sd-" extensions
> used to denote regional variants.  (Wikipedia gives "gsw-u-sd-chzh"
> as an example denoting "Swiss German as used in the Canton of Zurich".)

I agree that the u extensions should not be used for languages.  They
could be of use for other purposes, for example collation customization.
But this is a refinement that is not needed right now and may never be
of interest -- unless we get user reports.  Having language-specific
collation is good enough, in my opinion.

> (This -u- extension is governed by yet another entity, the Unicode
> Consortium.  There is information on it here:
> https://www.unicode.org/reports/tr35/#u_Extension.)
>
> So we can reference the IANA subtag registry in a simple way, without
> accepting the full complexity and scope of possibilities of BCP 47.

That was always my intention, by focusing on the main language, region,
script and variant parts.

-- 
Pat
