On Thu, May 9, 2013 at 12:11 AM, Graydon Hoare <gray...@mozilla.com> wrote:
> On 13-05-07 09:49 AM, Mikhail Zabaluev wrote:
>
> > What do you think of using Rust lambdas for context-sensitive
> > translations? That could easily accommodate any sort of variance, and
> > would not complicate the fmt! syntax (though it would require another
> > fmt-like macro to substitute, as well as mark, translated messages).
> > Creating message catalogs may be more challenging this way, but they
> > should be automatically collected and type-bracketed from the source by
> > a translation tool, and most of the messages would be plain strings or
> > have stock code patterns to fill.
>
> I think it's relatively important that translations (in message catalog
> technology) be just string -> string maps. The delayed / conditional
> evaluation part happens dynamically, at runtime.
>
> That is: the problem isn't one of "what gets expressed in the source",
> it's "what gets expressed in the message catalog". If you look at the
> examples here:
>
> http://userguide.icu-project.org/formatparse/messages#TOC-Complex-Argument-Types
>
> and here:
>
> https://ssl.icu-project.org/apiref/icu4c/classicu_1_1SelectFormat.html
>
> the purpose is to permit a programmer to write something (say, in
> English) like:
>
>   fmt!("{0} went to {2}")
>
> in their code, and have the _translator_ look at that and (say, if
> they're doing a French translation) decide it maps to a little miniature
> case-statement depending on the arguments:
>
>   "{0} est {1, select, female {allée} other {allé}} à {2}."
>
> That is a single translation-string. It gets interpreted on the fly by
> the formatter, given the current locale. As such, I think it's not
> correct to think of this as something done "in rust code".
>
> (Note: in that example, the condition is based on argument 1, which is
> _not even written_ into the target string. It's just used to convey
> additional context-information to the format-string evaluator, by
> agreement between programmers and translators.)
>
> -Graydon
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
>

I agree that there is a need for meta-information that may (or may not) end up being used. Number (for plural terms) and gender are probably the most common cases here; I've also seen it used in Clang's diagnostics to fold several similar-looking messages into a single one with a "variant".

However, I am not too sure about the idea of a string -> string mapping. The example you give here is actually slightly more complicated, because there are several orthogonal axes:

- singular/plural of the subject (we're lucky it's simpler in French than in Polish)
- gender of the subject
- singular/plural of the destination: "to the supermarket" = "au supermarché", "to the halles" = "aux halles"
- gender of the destination: "to the sea" = "à la mer", "to the supermarket" = "au supermarché", "to the hairdresser" = "chez le coiffeur"/"chez la coiffeuse" [1] => I'll leave this last one aside

[1] We could actually express it as "au salon de coiffure", but it feels awkward and is rarely used. Still, it fits here.

My point is, therefore, that even a seemingly innocent-looking sentence like this one actually turns into a monster:

"{0} {1, select, singular {{2, select, female {est allée} other {est allé}}} other {{2, select, female {sont allées} other {sont allés}}}} {3, select, singular {{4, select, female {à la} other {au}}} other {aux}} {5}"

(note: I apologize if the { and } are mismatched... I gave up)

And, as mentioned, this is French and not Polish: in Polish the plural form is declined with special cases depending on the remainder of the number modulo 10, quite similar to ordinals in English (st, nd, rd vs th).

However, even that example is a bit... too simple.
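To make the comparison concrete, the nested selects above boil down to two small lookups: one on the subject's number and gender, one on the destination's. Here is a minimal sketch in present-day Rust syntax; the names `Number`, `Gender` and `went_to_fr` are made up for illustration, gender is reduced to two values as in the select above, and elision ("à l'") is deliberately ignored.

```rust
// Hypothetical types for illustration only; not part of any Rust library.
#[derive(Clone, Copy)]
enum Number {
    Singular,
    Plural,
}

#[derive(Clone, Copy)]
enum Gender {
    Feminine,
    Masculine,
}

/// French rendering of "{subject} went to {destination}".
/// Elision ("à l'") and the "chez ..." case are not handled.
fn went_to_fr(
    subject: &str,
    subject_number: Number,
    subject_gender: Gender,
    dest: &str,
    dest_number: Number,
    dest_gender: Gender,
) -> String {
    // The auxiliary and the past participle agree with the subject.
    let verb = match (subject_number, subject_gender) {
        (Number::Singular, Gender::Feminine) => "est allée",
        (Number::Singular, Gender::Masculine) => "est allé",
        (Number::Plural, Gender::Feminine) => "sont allées",
        (Number::Plural, Gender::Masculine) => "sont allés",
    };
    // The preposition contracts with the article of the destination;
    // in the plural the gender no longer matters.
    let preposition = match (dest_number, dest_gender) {
        (Number::Plural, _) => "aux",
        (Number::Singular, Gender::Feminine) => "à la",
        (Number::Singular, Gender::Masculine) => "au",
    };
    format!("{} {} {} {}", subject, verb, preposition, dest)
}
```

Calling `went_to_fr("Marie", Number::Singular, Gender::Feminine, "mer", Number::Singular, Gender::Feminine)` yields "Marie est allée à la mer". The compiler also checks that every (number, gender) combination is covered, which the brace-counting version cannot do.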
Gender is not universal: English speakers talk about "a table" (neuter), whilst French speakers talk about "une table" (feminine) and Germans talk about "der Tisch" (masculine)... so the programmer cannot indicate whether the word is feminine or not: it depends on the target language! Therefore, a more realistic example would require the select to look up the English word in a dictionary for its equivalent in the target language, and from there adjust the translation depending on the gender of the word in that language!

And of course, the same issue occurs with singular/plural forms: the English "a piece of information" is in French "une information" (singular), whilst the English "information" (uncountable) is in French "les informations" (plural).

It seems to me that, given the extraordinary complexity lurking here:

- either you end up with a complicated micro-syntax that you'll have to keep beefing up as you discover corner cases in various languages and translators keep complaining that they cannot do their job,
- or you just decouple formatting from translation, and provide a separate library for translation (outside of core, most probably).

As for that library, I strongly suggest letting translators manipulate Rust code directly. I see it as no more difficult than asking them to learn a special micro-language that keeps evolving, and pattern matching is really well suited to the task at hand.

My 2c.

-- Matthieu
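P.S. To illustrate the last suggestion, here is a minimal sketch of what translator-written Rust could look like, in present-day Rust syntax. Everything in it is an assumption made up for this example (the `Noun` keys, the per-language lexicon functions, the article helpers); it is not an existing library. The programmer only passes a language-neutral key, and each target language decides the grammatical gender for itself.

```rust
// Hypothetical sketch: none of these names exist in any Rust library.
#[derive(Clone, Copy)]
enum Gender {
    Feminine,
    Masculine,
    Neuter,
}

/// Language-neutral keys supplied by the programmer.
#[derive(Clone, Copy)]
enum Noun {
    Table,
    Sea,
    Supermarket,
}

/// French lexicon, maintained by the French translator.
fn noun_fr(noun: Noun) -> (&'static str, Gender) {
    match noun {
        Noun::Table => ("table", Gender::Feminine),
        Noun::Sea => ("mer", Gender::Feminine),
        Noun::Supermarket => ("supermarché", Gender::Masculine),
    }
}

/// German lexicon, maintained by the German translator.
fn noun_de(noun: Noun) -> (&'static str, Gender) {
    match noun {
        Noun::Table => ("Tisch", Gender::Masculine),
        Noun::Sea => ("Meer", Gender::Neuter),
        Noun::Supermarket => ("Supermarkt", Gender::Masculine),
    }
}

/// French indefinite article plus noun.
fn indefinite_fr(noun: Noun) -> String {
    let (word, gender) = noun_fr(noun);
    let article = match gender {
        Gender::Feminine => "une",
        // French has no neuter; anything else takes "un".
        Gender::Masculine | Gender::Neuter => "un",
    };
    format!("{} {}", article, word)
}

/// German indefinite article (nominative case) plus noun.
fn indefinite_de(noun: Noun) -> String {
    let (word, gender) = noun_de(noun);
    let article = match gender {
        Gender::Feminine => "eine",
        Gender::Masculine | Gender::Neuter => "ein",
    };
    format!("{} {}", article, word)
}
```

With this, `indefinite_fr(Noun::Table)` gives "une table" while `indefinite_de(Noun::Table)` gives "ein Tisch": the same key, with the gender chosen by the target language rather than by the programmer.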