Re: [bug-gettext] Plural rule definitions

Michele Locati Tue, 19 May 2015 03:01:13 -0700

2015-05-19 11:12 GMT+02:00 Daiki Ueno <[email protected]>:

> Hi Michele,
>
> Sorry for the late response.  I didn't forget it, but was thinking about
> what is the best way to adopt CLDR in gettext.  Currently we are doing:
>
> 0. Mention it in the documentation and guide users to the generated
>    plural rules.  I'll do that really soon, before the next release.
>
> 1. Update plural-table.c, so a new PO file created with msginit will
>    have a usable "Plural-Forms" header.
>
> I think it would be nice if step 1 were semi-automated somewhere in
> gettext, at least in the release procedure.  In order to do that, the diff
> against the previous plural-table.c should be minimal, so that people
> can review the changes easily.  Also, gettext could ship with a helper
> program of msginit (like "urlget"), that retrieves the latest CLDR data
> if plural-table.c doesn't have a definition.
>

That would be great. But there is a problem: CLDR defines plural rules
for both integers and decimal numbers, whereas gettext only works with
unsigned integers.
So translating the CLDR rules into gettext rules is not straightforward.
For example, for Czech the CLDR defines the plural categories "one", "few",
"many" and "other", which would suggest nplurals = 4
(see
http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#cs
)
But the "many" category is never used in gettext, because it only applies
to numbers with visible fraction digits (v != 0), so we end up with
nplurals = 3
(see http://unicode.org/reports/tr35/tr35-numbers.html#Operands )
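
To make the Czech case concrete, here is a rough sketch in Python (it is
not taken from the cldr-to-gettext-plural-rules repo). The function names
are made up for illustration, and the CLDR conditions are my transcription
of the chart linked above, so please check them against the chart. The
gettext formula shown is the usual three-form Czech one.

```python
def cldr_category_cs(i, v):
    """CLDR plural category for Czech (transcribed from the CLDR chart).

    i: integer digits of the number
    v: number of visible fraction digits (TR35 operands)
    """
    if i == 1 and v == 0:
        return "one"
    if 2 <= i <= 4 and v == 0:
        return "few"
    if v != 0:
        return "many"
    return "other"


def gettext_plural_cs(n):
    """Python transcription of: plural=(n == 1) ? 0 : (n >= 2 && n <= 4) ? 1 : 2"""
    return 0 if n == 1 else 1 if 2 <= n <= 4 else 2


# gettext only ever passes an unsigned integer, so v is always 0
# and the "many" branch can never be reached.
seen = {cldr_category_cs(n, 0) for n in range(1000)}

# Remaining categories, in CLDR order, map onto gettext indexes 0..2.
index_of = {"one": 0, "few": 1, "other": 2}
consistent = all(
    index_of[cldr_category_cs(n, 0)] == gettext_plural_cs(n) for n in range(1000)
)
```

With integer input only, "seen" contains three categories rather than four,
which is why the generated header needs nplurals = 3.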

That's why I created https://github.com/mlocati/cldr-to-gettext-plural-rules
In the "gettext" branch of that repo I just added a test exporter that
automatically generates the plural rules for gettext.
IMHO it can be used to generate the rules for all the languages in one go
(simply call "bin/export.sh gettext"), so that they can be included
statically in gettext (making the "urlget" approach unnecessary).
But I agree, it's quite a big change, and you and other reviewers may have
to spend some time on it.

>
> Michele Locati <[email protected]> writes:
>
> > Yes, that would lead to a more complete languages table. I can easily
> > add a new option to automatically generate the plural_table.
> > BTW, do you think it could be possible to add more infos to that
> > table? I mean, currently gettext offers the number of plurals and the
> > formula to distinguish between them, but the only way to know the
> > meaning of the different plural cases is to inspect the formula.
> > What about adding the CLDR names of the cases and their relative
> > examples? I think that it could help many people if the gettext
> > headers could be extended to something like this:
> > "Language: ar\n"
> > "Plural-Forms: nplurals=6; plural=(n == 0) ? 0 : ((n == 1) ? 1 : ((n
> > == 2) ? 2 : ((n % 100 >= 3 && n % 100 <= 10) ? 3 : ((n % 100 >= 11 &&
> > n % 100 <= 99) ? 4 : 5))));\n"
> > "Plural-Case-0: name=zero; examples=0;\n"
> > "Plural-Case-1: name=one; examples=1;\n"
> > "Plural-Case-2: name=two; examples=2;\n"
> > "Plural-Case-3: name=few; examples=3~10, 103~110, 1003, …;\n"
> > "Plural-Case-4: name=many; examples=11~26, 111, 1011, …;\n"
> > "Plural-Case-5: name=other; examples=100~102, 200~202, 300~302,
> > 400~402, 500~502, 600, 1000, 10000, 100000, 1000000, …;\n"
>
> I think it's a good idea, but the format looks a bit too verbose.
> Perhaps normal comment lines before the header entry might be
> sufficient?  Something like:
>
> # There are 6 different plural forms in this language:
> #
> #   ・0
> #   ・1
> #   ・2
> #   ・3~10, 103~110, 1003, …
> #   ・11~26, 111, 1011, …
> #   ・100~102, 200~202, 300~302, 400~402, 500~502, 600, 1000, 10000,
> #       100000, 1000000, …
> #
> # For more details see
> # <http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#ar>.
>
> msgid ""
> msgstr ""
> ...
> ""
>

The approach I proposed was meant to help programs like poedit.
A standardized, structured way to represent plural forms (with a
representative name like "one"/"few"/"many" and some examples) can be
helpful in such cases...
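
As an illustration of how a tool could consume such headers, here is a
minimal Python sketch. Note that the "Plural-Case-N" header is only the
format proposed in this thread, not something gettext supports today, and
the function names (parse_plural_cases, example_numbers, arabic_plural) are
hypothetical. The sample data is the Arabic example quoted above.

```python
import re

# Header lines as proposed in this thread (hypothetical format).
HEADER = (
    "Plural-Case-0: name=zero; examples=0;\n"
    "Plural-Case-1: name=one; examples=1;\n"
    "Plural-Case-2: name=two; examples=2;\n"
    "Plural-Case-3: name=few; examples=3~10, 103~110, 1003, …;\n"
    "Plural-Case-4: name=many; examples=11~26, 111, 1011, …;\n"
    "Plural-Case-5: name=other; examples=100~102, 200~202, 300~302, "
    "400~402, 500~502, 600, 1000, 10000, 100000, 1000000, …;\n"
)

CASE_RE = re.compile(r"^Plural-Case-(\d+):\s*name=([^;]+);\s*examples=([^;]*);$")


def parse_plural_cases(header):
    """Return {index: (name, examples)} for every Plural-Case-N line."""
    cases = {}
    for line in header.splitlines():
        match = CASE_RE.match(line.strip())
        if match:
            index, name, examples = match.groups()
            cases[int(index)] = (name.strip(), examples.strip())
    return cases


def example_numbers(examples):
    """Expand "3~10, 103~110, 1003, …" into integers, skipping the ellipsis."""
    numbers = []
    for item in examples.split(","):
        item = item.strip()
        if not item or item == "…":
            continue
        if "~" in item:
            low, high = item.split("~")
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(item))
    return numbers


def arabic_plural(n):
    """Python transcription of the "ar" Plural-Forms formula quoted above."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    if n == 2:
        return 2
    if 3 <= n % 100 <= 10:
        return 3
    if 11 <= n % 100 <= 99:
        return 4
    return 5


cases = parse_plural_cases(HEADER)
# A translation editor could show cases[i][0] as the label of plural form i,
# and use the examples to sanity-check the formula.
mismatches = [
    (index, n)
    for index, (_name, examples) in cases.items()
    for n in example_numbers(examples)
    if arabic_plural(n) != index
]
```

An editor could then label each msgstr[i] field with the case name and a few
sample numbers instead of requiring the translator to read the formula.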

--
Michele
