On Thu, May 9, 2013 at 12:11 AM, Graydon Hoare <gray...@mozilla.com> wrote:

> On 13-05-07 09:49 AM, Mikhail Zabaluev wrote:
>
> > What do you think of using Rust lambdas for context-sensitive
> > translations? That could easily accommodate any sort of variance, and
> > would not complicate the fmt! syntax (though it would require another
> > fmt-like macro to substitute, as well as mark, translated messages).
> > Creating message catalogs may be more challenging this way, but they
> > should be automatically collected and type-bracketed from the source by
> > a translation tool, and most of the messages would be plain strings or
> > have stock code patterns to fill.
>
> I think it's relatively important that translations (in message catalog
> technology) be just string -> string maps. The delayed / conditional
> evaluation part happens dynamically, at runtime.
>
> That is: the problem isn't one of "what gets expressed in the source",
> it's "what gets expressed in the message catalog". If you look at the
> examples here:
>
>
> http://userguide.icu-project.org/formatparse/messages#TOC-Complex-Argument-Types
>
> and here:
>
> https://ssl.icu-project.org/apiref/icu4c/classicu_1_1SelectFormat.html
>
> the purpose is to permit a programmer to write something (say, in
> English) like:
>
>   fmt!("{0} went to {2}")
>
> in their code, and have the _translator_ look at that and (say, if
> they're doing a French translation) decide it maps to a little miniature
> case-statement depending on the arguments:
>
>   "{0} est {1, select, female {allée} other {allé}} à {2}."
>
> That is a single translation-string. It gets interpreted on the fly by
> the formatter, given the current locale. As such, I think it's not
> correct to think of this as something done "in rust code".
>
> (Note: in that example, the condition is based on argument 1, which is
> _not even written_ into the target string. It's just used to convey
> additional context-information to the format-string evaluator, by
> agreement between programmers and translators.)
>
> -Graydon
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
>

I agree that there is a need for meta-information that may (or may not) end
up being used. Number (for plural terms) and gender are probably the most
common cases; I've also seen it used in Clang's diagnostics to fold
several similar-looking messages into a single one with a "variant".


However, I am not too sure about the idea of string -> string mapping. The
example you give here is actually slightly more complicated because there
are several orthogonal axes:

 - singular/plural of the subject (we're lucky it's simpler in French than
Polish)
 - gender of the subject
 - singular/plural of the destination: "to the supermarket" = "au
supermarché", "to the halles" = "aux halles"
 - gender of the destination: "to the sea" = "à la mer", "to the
supermarket" = "au supermarché", "to the hairdresser" = "chez le
coiffeur"/"chez la coiffeuse" [1] => I'll leave this last one aside

[1] We could actually express it "au salon de coiffure" but it feels
awkward and is rarely used. Still, it fits here.


My point is, therefore, that even an innocent-looking sentence like this
one actually turns into a monster:

  "{0} {1, select, singular {{2, select, female {est allée} other {est
allé}}} other {{2, select, female {sont allées} other {sont allés}}}} {3,
select, singular {{4, select, female {à la} other {au}}} other {aux}} {5}"

  (note: I apologize if the { and } are mismatched... I gave up)

And, as mentioned, this is French and not Polish: in Polish the plural
form is declined with special cases that depend on the remainder of the
number modulo 10 (and modulo 100), much like ordinals in English (st, nd,
rd vs th).
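
To make the Polish case concrete, here is a minimal sketch of the usual
three-way plural rule. It is written in present-day Rust syntax, and every
name in it (PluralCategory, polish_plural, files_pl) is made up for this
example; nothing like it exists in the standard library.

```rust
// Hypothetical names, for illustration only.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PluralCategory {
    One,
    Few,
    Many,
}

// Polish plural selection for a non-negative integer:
//   one  -> n is exactly 1
//   few  -> n ends in 2, 3 or 4, except when it ends in 12, 13 or 14
//   many -> everything else (0, 5..=21, 25..=31, ...)
fn polish_plural(n: u64) -> PluralCategory {
    let last_digit = n % 10;
    let last_two = n % 100;
    if n == 1 {
        PluralCategory::One
    } else if (2..=4).contains(&last_digit) && !(12..=14).contains(&last_two) {
        PluralCategory::Few
    } else {
        PluralCategory::Many
    }
}

// "n files" in Polish: the noun form follows the plural category.
fn files_pl(n: u64) -> String {
    let noun = match polish_plural(n) {
        PluralCategory::One => "plik",
        PluralCategory::Few => "pliki",
        PluralCategory::Many => "plików",
    };
    format!("{} {}", n, noun)
}
```

With this, files_pl(1) gives "1 plik", files_pl(22) gives "22 pliki" and
files_pl(12) gives "12 plików": the 12..14 exception is exactly the kind of
special case a simple singular/plural switch cannot express.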


However, even that example is a bit... too simple. Gender is not universal:
English speakers talk about "a table" (neuter) whilst French speakers talk
about "une table" (feminine) and German speakers talk about "der Tisch"
(masculine)... so the programmer cannot indicate whether the word is
feminine or not: it depends on the target language! Therefore, a more
realistic example would imply that the select is done by looking up the
English word in a dictionary for its equivalent in another language and
from there adjusting the translation depending on the gender of the word in
the target language! And of course, the same issue occurs with
singular/plural forms: the English "a piece of information" is in French
"une information" (singular), whilst the English "information"
(non-countable) is in French "les informations" (plural).
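
A rough sketch of that dictionary lookup, in present-day Rust syntax. The
types and functions (Noun, french_noun, to_the_fr) are hypothetical, the
lexicon holds only the destinations used above, and elision ("à l'") is
ignored to keep it short.

```rust
// Hypothetical names, for illustration only.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Gender {
    Masculine,
    Feminine,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum Number {
    Singular,
    Plural,
}

// A target-language noun carries its own gender and number;
// the English key knows nothing about either.
struct Noun {
    word: &'static str,
    gender: Gender,
    number: Number,
}

// Toy French lexicon keyed by the English word.
fn french_noun(key: &str) -> Option<Noun> {
    match key {
        "sea" => Some(Noun { word: "mer", gender: Gender::Feminine, number: Number::Singular }),
        "supermarket" => Some(Noun { word: "supermarché", gender: Gender::Masculine, number: Number::Singular }),
        "halles" => Some(Noun { word: "halles", gender: Gender::Feminine, number: Number::Plural }),
        _ => None,
    }
}

// "to the <noun>" in French: the preposition+article depends on the
// French entry, not on anything the programmer wrote in English.
fn to_the_fr(key: &str) -> Option<String> {
    let noun = french_noun(key)?;
    let prep = match (noun.number, noun.gender) {
        (Number::Plural, _) => "aux",
        (Number::Singular, Gender::Feminine) => "à la",
        (Number::Singular, Gender::Masculine) => "au",
    };
    Some(format!("{} {}", prep, noun.word))
}
```

Here to_the_fr("sea") yields "à la mer" and to_the_fr("halles") yields
"aux halles", while a word missing from the lexicon yields None; the caller
never states a gender at all.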


It seems to me that given the extraordinary complexity that is lurking here:

 - either you end up with a complicated micro-syntax that you'll have to
keep extending as you discover corner cases in various languages and
translators keep complaining that they cannot do their job,

 - or you just decouple formatting from translation, and provide a separate
library for translation (outside of core, most probably).


As for that library, I strongly suggest letting translators manipulate Rust
code directly. I see it as no more difficult than asking them to learn a
special micro-language that keeps evolving, and pattern matching is really
well suited to the task at hand.
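
To give an idea of what a translator would write, here is a minimal sketch
of the French version of "{0} went to {2}" as a plain function, in
present-day Rust syntax. Subject and went_to_fr are invented names, not a
proposed API, and the destination is assumed to arrive already inflected
(e.g. "à la mer").

```rust
// Hypothetical names, for illustration only.
#[derive(Clone, Copy)]
enum Gender {
    Masculine,
    Feminine,
}

#[derive(Clone, Copy)]
enum Number {
    Singular,
    Plural,
}

// Context the programmer passes along with the subject's display name.
struct Subject<'a> {
    name: &'a str,
    gender: Gender,
    number: Number,
}

// French translation of "<subject> went to <destination>".
// The verb agrees with the subject in both number and gender, and the
// compiler checks that all four combinations are covered.
fn went_to_fr(subject: &Subject, destination: &str) -> String {
    let verb = match (subject.number, subject.gender) {
        (Number::Singular, Gender::Feminine) => "est allée",
        (Number::Singular, Gender::Masculine) => "est allé",
        (Number::Plural, Gender::Feminine) => "sont allées",
        (Number::Plural, Gender::Masculine) => "sont allés",
    };
    format!("{} {} {}", subject.name, verb, destination)
}
```

Calling it with a feminine singular subject named "Marie" and the
destination "à la mer" gives "Marie est allée à la mer". The two orthogonal
axes become one match on a pair instead of nested select clauses, and a
forgotten case is a compile error rather than a mismatched brace.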


My 2c.

-- Matthieu
