Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 13-05-09 10:49 PM, Mikhail Zabaluev wrote:

> I agree. And if expressions are in Rust, you get the benefit of a Rust
> compiler validating them. A lambda must produce _some_ string to be
> valid; match clauses will be checked for correct type and coverage.
> Dynamically interpreted syntax engines do not usually give this benefit
> and may in fact let the translator unwittingly and quietly introduce
> runtime errors, which are less likely to be caught the farther off the
> beaten track the language is (I may be bitter at my EU-market Samsung
> TV, that has the Russian language option for the UI, but starts crashing
> randomly if you switch to it; no grudge against the Samsung folks on
> this list).

The sublanguage is non-effectful, non-stateful, non-Turing-complete, and has no functions. Evaluation time is linear and harmless if it fails: it can just use the default (non-translated -- wrong language) format if interpretation goes wrong.

I will reiterate why I keep objecting to "using rust code" as both the wrong answer and an answer to the wrong question:

- Dynamic loading of translations currently happens in the deployment environment. If you require rust code, you're requiring a dynamic load of a .so or .dll rather than reading a .po file for a string.

- There are extensive existing toolchains, processes and communities who have no interest in learning to program in rust to (eg.) translate a web browser or consumer product.

    http://www.poedit.net/screenshots.php
    https://en.wikipedia.org/wiki/Virtaal
    http://mozilla.locamotion.org/
    http://weblate.org/en/
    http://sourceforge.net/projects/translate/
    etc. etc.

- The whole point of this thread is to _design_ a formatting mini-language. If "plain rust code" was sufficient for this, people would write:

    let x = do fmt::with_sfmt_writer |f| {
        f.putstr("there are ");
        a.fmtD(f);
        f.putstr(" files in the folder");
    };

  rather than:

    let x = fmt!("there are %d files in the folder", a);

  Yet here we are discussing that format-string mini-language.

So all I'm saying is: given that we _are_ discussing the design of a held-in-a-format-string mini-language, why not make sure that design scales nicely to cases when a translator has to express the little bits of logic they often do, such as:

    "there {num_files, plural, one {is one file} other {are {num_files} files}} in the folder"

This is relatively easy to adopt as an extension to {}-based format strings, whereas it's trickier with %s-based ones.

I think this thread keeps going repeatedly off into discussion of problems we are not facing. We're not trying to eliminate format-string mini-languages from rust: we're trying to design one. We're not trying to solve all hypothetical Turing-complete translation tasks: we're trying to accommodate the level of translation-variability that normal translators (even people writing non-translated format strings in their home language) run into all the time when composing format strings.

-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
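For anyone reading this thread with a current toolchain, the contrast between hand-written output calls and a format string can be sketched roughly as below. This is only an illustration: it assumes today's `format!` and `write!` macros from the standard library rather than the 2013 `fmt!` and writer API quoted above, and the function names `by_hand` and `by_template` are made up.

```rust
use std::fmt::Write;

// Hand-written version: push each piece into the buffer in turn.
fn by_hand(a: u32) -> String {
    let mut s = String::new();
    s.push_str("there are ");
    // write! on a String goes through std::fmt::Write.
    write!(s, "{}", a).expect("writing to a String should not fail");
    s.push_str(" files in the folder");
    s
}

// Format-string version: the same output from a single template.
fn by_template(a: u32) -> String {
    format!("there are {} files in the folder", a)
}

fn main() {
    assert_eq!(by_hand(3), by_template(3));
    println!("{}", by_template(3));
}
```

Both produce the same string; the template version is the one a translator could plausibly be handed.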
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 9 May 2013 12:26, Matthieu Monrocq wrote:

> My point is, therefore, that even a seemingly innocent looking sentence
> like this one actually turns into a monster:
>
> "{0} {1, select, singular {{2, select, female {est allée} other {est
> allé}}}, other {{2, select, female {sont allées} other {sont allés {3,
> select, singular {{4, female {à la} other {au}}} other {aux}} {5}"
>
> (note: I apologize if the { and } are mismatched... I gave up)
>
> And, as mentioned, this is French and not Polish, because in Polish the
> plural form is declined with special cases depending on the remainder of
> the number modulo 10 quite similar to ordinals in English (st, nd, rd vs
> th).

In practice, that almost never happens. Most strings have quite specific context and require no conditionals. Rarely, there are conditions for things like "one" vs "1" and so on.

Some use cases require extreme flexibility, but at that point they're more likely to be split off into groups, which may differ greatly and have separate source-language (English) strings: in an RPG there might be a 'female' and a 'male' player-character version of each string, then perhaps another for each emotion that may be applicable in that case.

ICU is in fact one of the more programmable translation libraries. Anything more must be handled in cooperation between engineers and translators, so it is out of scope for a gettext-alike.
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
Hi,

2013/5/10 Tim Chevalier

> On Thu, May 9, 2013 at 10:49 PM, Mikhail Zabaluev wrote:
> > My favorite real world example is "%s has joined the chat room." The
> > gender may be unknown (they didn't say in their user profile), female,
> > male, and if you are really thorough and provide for non-human chat
> > participants, neutral.
>
> At the risk of being off-topic, many human beings affirm their gender
> as neutral or as another gender that isn't male or female. I'm not
> interested in starting a lengthy thread on this topic; mainly, I just
> want to make sure that a comment that potentially implies that some
> people who read this mailing list and/or participate in this project
> aren't human doesn't go by unremarked-on. Everyone is welcome to work
> on Rust, whether or not they identify within the gender binary.
> (Recommended reading:
> http://www.sarahmei.com/blog/2010/11/26/disalienation/ and
> http://genderqueerid.com/what-is-gq ).
>
> If anyone wants to discuss this point further, please *reply sender*
> and email me privately, rather than replying to the list.

Replying on-list as potentially guilty... Sorry if my comment has caused any offense.

In my example, the gender information is intended to be used for grammatical purposes, if provided. For people with more complicated gender than male/female, "neutral" would not be a proper option in this context, as the resulting phrase may sound degrading (like referring to people with the non-personal pronoun "it" in English). So for such cases I suppose it's down to the default "other/unknown", at the disadvantage of translated messages looking form-letterish.

Respect,
Mikhail
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On Thu, May 9, 2013 at 10:49 PM, Mikhail Zabaluev wrote:

> My favorite real world example is "%s has joined the chat room." The gender
> may be unknown (they didn't say in their user profile), female, male, and if
> you are really thorough and provide for non-human chat participants,
> neutral.

At the risk of being off-topic, many human beings affirm their gender as neutral or as another gender that isn't male or female. I'm not interested in starting a lengthy thread on this topic; mainly, I just want to make sure that a comment that potentially implies that some people who read this mailing list and/or participate in this project aren't human doesn't go by unremarked-on. Everyone is welcome to work on Rust, whether or not they identify within the gender binary. (Recommended reading: http://www.sarahmei.com/blog/2010/11/26/disalienation/ and http://genderqueerid.com/what-is-gq ).

If anyone wants to discuss this point further, please *reply sender* and email me privately, rather than replying to the list.

Cheers,
Tim

--
Tim Chevalier * http://catamorphism.org/ * Often in error, never in doubt
"Too much to carry, too much to let go
Time goes fast, learning goes slow." -- Bruce Cockburn
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
Hi,

2013/5/10 Graydon Hoare

> - Any expression of that conditional logic is going to be ugly,
>   but it is actually required for the translator to give an
>   accurate translation.

I agree. And if expressions are in Rust, you get the benefit of a Rust compiler validating them. A lambda must produce _some_ string to be valid; match clauses will be checked for correct type and coverage. Dynamically interpreted syntax engines do not usually give this benefit and may in fact let the translator unwittingly and quietly introduce runtime errors, which are less likely to be caught the farther off the beaten track the language is (I may be bitter at my EU-market Samsung TV, that has the Russian language option for the UI, but starts crashing randomly if you switch to it; no grudge against the Samsung folks on this list).

> - The odds are that not all those values will be runtime-variable;
>   the parts that aren't can be directly translated. The switching
>   is _just_ to defer a decision to runtime based on the provided
>   substitution value.
>
> - The important part: you can't ask a translator to express this
>   "as rust code" because the _locale_ is also a runtime setting;
>   that is, the translation string is evaluated at runtime
>   based on whatever-gettext()-returns. The programmer cannot
>   accommodate the translator's switch-logic because it is neither
>   static (locale varies at runtime) nor will be it be the same
>   between locales (logical structure varies with locale).

A translation catalog for a particular locale is supposed to be invoked in that locale, isn't it? But there is a thing, indeed: a translator can get "adventurous" and use more Rust than they are supposed to, up to tweaking the locale settings (which, if the Rust runtime is any good, should only affect the internal task invoking the translation lambda). This could be solved by compiling the translation catalogs without the prelude and warning on any unusual use statements (e.g. anything outside tr:: utilities, which provide all the selector utilities a translator might need), or auto-providing a "translation prelude" and banning use altogether.

> That assumes you're talking about a runtime-provided noun being slotted
> into a runtime-provided format string. It's of course possible this
> could happen, but it's a bit of a corner case within corner cases. The
> case I think the gender-selectors are designed for are those where
> you're presenting a runtime-variable _person_ in a message (eg. an email
> program or such). And you can pass their gender (assuming they want to
> use one of the gender-binary words for it) as a value directly to the
> formatter.
>
> A seemingly-good and short-ish slide deck on this is available here. I
> recommend reading it:
>
> https://docs.google.com/presentation/d/1ZyN8-0VXmod5hbHveq-M1AeQ61Ga3BmVuahZjbmbBxo/pub?start=false&loop=false&delayms=3000#slide=id.g1bc43a82_2_14
>
> Especially the "non-goals". There's a limit. They just want to hit the
> majority of cases. "Handle gender - at least for people".

My favorite real world example is "%s has joined the chat room." The gender may be unknown (they didn't say in their user profile), female, male, and if you are really thorough and provide for non-human chat participants, neutral.

> > It seems to me that given the extraordinary complexity that is lurking
> > here:
> >
> > - either you end up with a complicated micro-syntax that you'll have to
> > keep buffing up as you discover corner cases in various languages and
> > translators keep complaining they cannot do their job.
>
> I think you're overstating it. This is a problem people have been
> struggling with for a long time, but have worked their way towards a
> _reasonable_ solution that isn't impossibly complex. There's a
> simplified implementation of it here:
>
> https://github.com/SlexAxton/messageformat.js

This syntax does not appear to me to be more "translator-friendly" than a restricted and macro-assisted use of Rust.

> > - or you just decouple formatting from translation, and provide a
> > separate library for translation (outside of core, most probably)
>
> Layering it might work. I'm not opposed to that. I just thought it worth
> looking over the problem space and considering whether it's "too hard"
> to support localization from the get-go, and/or whether there'd be any
> advantage to combining the design of the two parts. It's pretty
> important. We're going to want to localize rustc, and most other things
> we write in rust.

I support a separate layer and a macro distinct from fmt!() to invoke it. Plain formatting is used for non-user-visible purposes such as logging or constructing protocol messages, and no translator should have to deal with those format strings being picked up by the extractor tool and cluttering the catalog. Also, for plain strings, a tr!("foo") looks more logical than a fmt! with no formatting parameters.

Best regards,
Mikhail
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 13-05-09 04:26 AM, Matthieu Monrocq wrote:

> However, I am not too sure about the idea of string -> string mapping.
> The example you give here is actually slightly more complicated because
> there are several orthogonal axes:

Hm. I think you're missing what I mean. I mean that the interface -- literally the localization-library-interface we're going to be talking to on a given OS -- takes a string and returns a string. And translations are stored in string->string maps. And edited on websites and with tools that store a translated string as another string.

I'm not a translation expert by any means, I'm just trying to reverse-engineer their requirements. And I think you're misunderstanding them. The "translation produces a single string" model is, I think, wired into all the tooling. And more importantly (see below...)

> My point is, therefore, that even a seemingly innocent looking sentence
> like this one actually turns into a monster:
>
> "{0} {1, select, singular {{2, select, female {est allée} other {est
> allé}}}, other {{2, select, female {sont allées} other {sont allés
> {3, select, singular {{4, female {à la} other {au}}} other {aux}} {5}"
>
> (note: I apologize if the { and } are mismatched... I gave up)

Ok, three things to note here:

- Any expression of that conditional logic is going to be ugly, but it is actually required for the translator to give an accurate translation.

- The odds are that not all those values will be runtime-variable; the parts that aren't can be directly translated. The switching is _just_ to defer a decision to runtime based on the provided substitution value.

- The important part: you can't ask a translator to express this "as rust code" because the _locale_ is also a runtime setting; that is, the translation string is evaluated at runtime based on whatever-gettext()-returns. The programmer cannot accommodate the translator's switch-logic because it is neither static (locale varies at runtime) nor will it be the same between locales (logical structure varies with locale).

I am not trying to be obtuse, just figure out why translators have come up with this system and what we need to preserve about it. As far as I can tell, the "balance" between runtime and compile-time variability is the key factor. So any example has to be very careful to reason about which things vary and which are constant.

> However, even that example is a bit... too simple. Gender is not
> universal, English people talk about "a table" (neutral) whilst French
> people talk about "une table" (feminine) and German talk about "der
> Tisch" (masculin)... so the programmer cannot indicate whether the word
> is feminine or not: it depends on the target language!

That assumes you're talking about a runtime-provided noun being slotted into a runtime-provided format string. It's of course possible this could happen, but it's a bit of a corner case within corner cases. The cases I think the gender-selectors are designed for are those where you're presenting a runtime-variable _person_ in a message (eg. an email program or such). And you can pass their gender (assuming they want to use one of the gender-binary words for it) as a value directly to the formatter.

A seemingly-good and short-ish slide deck on this is available here. I recommend reading it:

https://docs.google.com/presentation/d/1ZyN8-0VXmod5hbHveq-M1AeQ61Ga3BmVuahZjbmbBxo/pub?start=false&loop=false&delayms=3000#slide=id.g1bc43a82_2_14

Especially the "non-goals". There's a limit. They just want to hit the majority of cases. "Handle gender - at least for people".

> It seems to me that given the extraordinary complexity that is lurking here:
>
> - either you end up with a complicated micro-syntax that you'll have to
> keep buffing up as you discover corner cases in various languages and
> translators keep complaining they cannot do their job.

I think you're overstating it. This is a problem people have been struggling with for a long time, but have worked their way towards a _reasonable_ solution that isn't impossibly complex. There's a simplified implementation of it here:

https://github.com/SlexAxton/messageformat.js

> - or you just decouple formatting from translation, and provide a
> separate library for translation (outside of core, most probably)

Layering it might work. I'm not opposed to that. I just thought it worth looking over the problem space and considering whether it's "too hard" to support localization from the get-go, and/or whether there'd be any advantage to combining the design of the two parts. It's pretty important. We're going to want to localize rustc, and most other things we write in rust.

-Graydon
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
> As for that library, I heavily suggest letting translators manipulate
> Rust code directly. I see it as no more difficult than asking them to
> learn a special micro-language that keeps evolving and pattern-matching
> is really adapted for the task at hand.

I'd like to react to this.

If the translators have to write Rust code instead of interpretable strings, it most likely means that they now have to set up a full development environment. This alone can be blocking.

Ok, let's say that they don't have to do so, because the i18n system is smart enough to dynamically compile and load the Rust code written by the translators. There is still another, bigger problem: the code can be wrong. It can be invalid Rust code, or it can contain a bug that will blow up the whole application at runtime, or more likely it will be out of sync with the application.

With a good i18n system, an ill-formatted or "buggy" translation cannot break the program. The resulting string would be either the raw translation (without interpretation), or it will fall back to the English (original) string. Knowing that a translation cannot break the program is a really nice guarantee! And knowing that it cannot introduce security risks is great, too!

I said that the translation will likely be out of sync, because this is how translations work. The programmer and the translators must be able to work at their own pace. Let's say a program is translated into 10 languages, and a programmer updates a translated string in the code by adding a new parameter. Now, with a "translations using rust code" system, he has to update the 10 translations correctly, just for the application to compile happily. This is plain impossible. He can also add the parameter and intentionally break the build, and wait for all 10 translators to fix this update in their translations. There is clearly a problem with this solution...

The manual of Gettext explains quite well the "continuous" nature of the i18n process:
http://www.gnu.org/software/gettext/manual/gettext.html#Overview
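To make the "a buggy translation cannot break the program" guarantee concrete, here is a minimal sketch of one way it could work. Everything in it is an assumption for illustration: the `substitute` and `translate` helpers, the `{N}` positional placeholder syntax, and the sample catalog are made up and are not how gettext or any real library does it. The point is only that a translation which fails to evaluate is discarded in favour of the original string.

```rust
use std::collections::HashMap;

/// Substitute `{N}` placeholders with `args[N]`.
/// Returns None if a placeholder is malformed or out of range.
fn substitute(template: &str, args: &[&str]) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let close = open + rest[open..].find('}')?;
        let index: usize = rest[open + 1..close].trim().parse().ok()?;
        out.push_str(args.get(index)?);
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    Some(out)
}

/// Look up `msgid` in the catalog. If the translation is missing or does
/// not evaluate, fall back to the original (source-language) string.
fn translate(catalog: &HashMap<&str, &str>, msgid: &str, args: &[&str]) -> String {
    catalog
        .get(msgid)
        .and_then(|t| substitute(t, args))
        .or_else(|| substitute(msgid, args))
        .unwrap_or_else(|| msgid.to_string())
}

/// Hypothetical French catalog; the second entry is deliberately broken
/// (it refers to an argument the program never passes).
fn demo_catalog() -> HashMap<&'static str, &'static str> {
    let mut catalog = HashMap::new();
    catalog.insert("{0} has {1} files", "{0} a {1} fichiers");
    catalog.insert("{0} joined", "{5} est arrivé");
    catalog
}

fn main() {
    let catalog = demo_catalog();
    println!("{}", translate(&catalog, "{0} has {1} files", &["Anne", "3"]));
    println!("{}", translate(&catalog, "{0} joined", &["Anne"]));
}
```

With this shape, a translator's mistake costs one untranslated message at runtime instead of a broken build or a crash.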
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On Thu, May 9, 2013 at 12:11 AM, Graydon Hoare wrote:

> On 13-05-07 09:49 AM, Mikhail Zabaluev wrote:
>
> > What do you think of using Rust lambdas for context-sensitive
> > translations? That could easily accommodate any sort of variance, and
> > would not complicate the fmt! syntax (though it would require another
> > fmt-like macro to substitute, as well as mark, translated messages).
> > Creating message catalogs may be more challenging this way, but they
> > should be automatically collected and type-bracketed from the source by
> > a translation tool, and most of the messages would be plain strings or
> > have stock code patterns to fill.
>
> I think it's relatively important that translations (in message catalog
> technology) be just string -> string maps. The delayed / conditional
> evaluation part happens dynamically, at runtime.
>
> That is: the problem isn't one of "what gets expressed in the source",
> it's "what gets expressed in the message catalog". If you look at the
> examples here:
>
> http://userguide.icu-project.org/formatparse/messages#TOC-Complex-Argument-Types
>
> and here:
>
> https://ssl.icu-project.org/apiref/icu4c/classicu_1_1SelectFormat.html
>
> the purpose is to permit a programmer to write something (say, in
> English) like:
>
>   fmt!("{0} went to {2}")
>
> in their code, and have the _translator_ look at that and (say, if
> they're doing a French translation) decide it maps to a little miniature
> case-statement depending on the arguments:
>
>   "{0} est {1, select, female {allée} other {allé}} à {2}."
>
> That is a single translation-string. It gets interpreted on the fly by
> the formatter, given the current locale. As such, I think it's not
> correct to think of this as something done "in rust code".
>
> (Note: in that example, the condition is based on argument 1, which is
> _not even written_ into the target string. It's just used to convey
> additional context-information to the format-string evaluator, by
> agreement between programmers and translators.)
>
> -Graydon

I agree that there is a need for meta-information that may (or may not) end up being used. Number (for plural terms) and gender are probably the most common here; I've also seen it being used in Clang's diagnostics to fold several "similar" looking messages into a single one with a "variant".

However, I am not too sure about the idea of string -> string mapping. The example you give here is actually slightly more complicated because there are several orthogonal axes:

- singular/plural of the subject (we're lucky it's simpler in French than Polish)
- gender of the subject
- singular/plural of the destination: "to the supermarket" = "au supermarché", "to the halles" = "aux halles"
- gender of the destination: "to the sea" = "à la mer", "to the supermarket" = "au supermarché", "to the hairdresser" = "chez le coiffeur"/"chez la coiffeuse" [1] => I'll leave this last one aside

[1] We could actually express it "au salon de coiffure" but it feels awkward and is rarely used. Still, it fits here.

My point is, therefore, that even a seemingly innocent looking sentence like this one actually turns into a monster:

"{0} {1, select, singular {{2, select, female {est allée} other {est allé}}}, other {{2, select, female {sont allées} other {sont allés {3, select, singular {{4, female {à la} other {au}}} other {aux}} {5}"

(note: I apologize if the { and } are mismatched... I gave up)

And, as mentioned, this is French and not Polish, because in Polish the plural form is declined with special cases depending on the remainder of the number modulo 10 quite similar to ordinals in English (st, nd, rd vs th).

However, even that example is a bit... too simple. Gender is not universal: English people talk about "a table" (neutral) whilst French people talk about "une table" (feminine) and Germans talk about "der Tisch" (masculine)... so the programmer cannot indicate whether the word is feminine or not: it depends on the target language! Therefore, a more realistic example would imply that the select is done by looking up the English word in a dictionary for its equivalent in another language and from there adjusting the translation depending on the gender of the word in the target language!

And of course, the same issue occurs with singular/plural forms: the English "a piece of information" is in French "une information" (singular), whilst the English "information" (non-countable) is in French "les informations" (plural).

It seems to me that given the extraordinary complexity that is lurking here:

- either you end up with a complicated micro-syntax that you'll have to keep buffing up as you discover corner cases in various languages and translators keep complaining they cannot do their job.

- or you just decouple formatting from translation, and provide a
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 13-05-07 09:49 AM, Mikhail Zabaluev wrote:

> What do you think of using Rust lambdas for context-sensitive
> translations? That could easily accommodate any sort of variance, and
> would not complicate the fmt! syntax (though it would require another
> fmt-like macro to substitute, as well as mark, translated messages).
> Creating message catalogs may be more challenging this way, but they
> should be automatically collected and type-bracketed from the source by
> a translation tool, and most of the messages would be plain strings or
> have stock code patterns to fill.

I think it's relatively important that translations (in message catalog technology) be just string -> string maps. The delayed / conditional evaluation part happens dynamically, at runtime.

That is: the problem isn't one of "what gets expressed in the source", it's "what gets expressed in the message catalog". If you look at the examples here:

http://userguide.icu-project.org/formatparse/messages#TOC-Complex-Argument-Types

and here:

https://ssl.icu-project.org/apiref/icu4c/classicu_1_1SelectFormat.html

the purpose is to permit a programmer to write something (say, in English) like:

    fmt!("{0} went to {2}")

in their code, and have the _translator_ look at that and (say, if they're doing a French translation) decide it maps to a little miniature case-statement depending on the arguments:

    "{0} est {1, select, female {allée} other {allé}} à {2}."

That is a single translation-string. It gets interpreted on the fly by the formatter, given the current locale. As such, I think it's not correct to think of this as something done "in rust code".

(Note: in that example, the condition is based on argument 1, which is _not even written_ into the target string. It's just used to convey additional context-information to the format-string evaluator, by agreement between programmers and translators.)

-Graydon
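To show what "interpreted on the fly by the formatter" could mean in practice, here is a small sketch of an evaluator for a tiny subset of that syntax: plain text, `{N}`, and `{N, select, key {...} other {...}}` with nesting. It is written against current Rust, and it is not ICU's algorithm or API; the function names (`eval`, `eval_seq`, `eval_placeholder`) and the choice to return `None` on any malformed input are assumptions made for this example.

```rust
/// Evaluate `pattern` against positional `args`. Returns None on a syntax
/// error or an out-of-range index, so a caller can fall back to the
/// untranslated string.
fn eval(pattern: &str, args: &[&str]) -> Option<String> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut pos = 0;
    let out = eval_seq(&chars, &mut pos, args)?;
    // A stray '}' at top level stops eval_seq early: treat it as an error.
    if pos == chars.len() { Some(out) } else { None }
}

// Evaluate text and placeholders until end of input or an unmatched '}'.
fn eval_seq(c: &[char], pos: &mut usize, args: &[&str]) -> Option<String> {
    let mut out = String::new();
    while *pos < c.len() && c[*pos] != '}' {
        if c[*pos] == '{' {
            *pos += 1;
            out.push_str(&eval_placeholder(c, pos, args)?);
        } else {
            out.push(c[*pos]);
            *pos += 1;
        }
    }
    Some(out)
}

// Called just after '{'. Handles `N}` and `N, select, key {..} key {..}}`.
fn eval_placeholder(c: &[char], pos: &mut usize, args: &[&str]) -> Option<String> {
    skip_spaces(c, pos);
    let index: usize = read_word(c, pos).parse().ok()?;
    let value = *args.get(index)?;
    skip_spaces(c, pos);
    match *c.get(*pos)? {
        '}' => {
            *pos += 1;
            return Some(value.to_string());
        }
        ',' => *pos += 1,
        _ => return None,
    }
    skip_spaces(c, pos);
    if read_word(c, pos) != "select" {
        return None;
    }
    skip_spaces(c, pos);
    if *c.get(*pos)? != ',' {
        return None;
    }
    *pos += 1;
    let mut chosen = None;
    let mut fallback = None;
    loop {
        skip_spaces(c, pos);
        if *c.get(*pos)? == '}' {
            *pos += 1;
            break;
        }
        let key = read_word(c, pos);
        skip_spaces(c, pos);
        if key.is_empty() || *c.get(*pos)? != '{' {
            return None;
        }
        *pos += 1;
        let body = eval_seq(c, pos, args)?;
        if *c.get(*pos)? != '}' {
            return None;
        }
        *pos += 1;
        if key == value {
            chosen = Some(body);
        } else if key == "other" {
            fallback = Some(body);
        }
    }
    // Use the matching branch, else the "other" branch, else fail.
    chosen.or(fallback)
}

fn read_word(c: &[char], pos: &mut usize) -> String {
    let start = *pos;
    while *pos < c.len() && (c[*pos].is_alphanumeric() || c[*pos] == '_') {
        *pos += 1;
    }
    c[start..*pos].iter().collect()
}

fn skip_spaces(c: &[char], pos: &mut usize) {
    while *pos < c.len() && c[*pos] == ' ' {
        *pos += 1;
    }
}

fn main() {
    let fr = "{0} est {1, select, female {allée} other {allé}} à {2}.";
    println!("{}", eval(fr, &["Marie", "female", "Paris"]).unwrap());
    println!("{}", eval(fr, &["Jean", "male", "Paris"]).unwrap());
}
```

Note that argument 1 only steers the choice of branch and never appears in the output, which is the behaviour described in the note above.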
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
Hi Graydon,

2013/5/6 Graydon Hoare

> Yes, this is the sort of thing I was thinking of: that there are some
> pressures that a gettext() layer feed back to the selection of
> formatting strings that might be worth considering.
>
> Also that it might be nice to make fmt!() default-to, or very easily be
> adapted-to (without too much extra noise, say with ifmt!() or such)
> invoking the message-catalogue system. The _() macro is used in C I
> think due to trying to reduce the noise-effect i18n efforts have on
> code. We should keep that in mind.
>
> > There are other difficulties with localizing formatted messages that are
> > never systematically solved, for example, accounting for gender. In all,
> > it looks like an interesting area for library research, beyond the basic
> > "stick this value pretty-printed into a string" problem.
>
> There are a few of those, yes. They get quite complex.

What do you think of using Rust lambdas for context-sensitive translations? That could easily accommodate any sort of variance, and would not complicate the fmt! syntax (though it would require another fmt-like macro to substitute, as well as mark, translated messages). Creating message catalogs may be more challenging this way, but they should be automatically collected and type-bracketed from the source by a translation tool, and most of the messages would be plain strings or have stock code patterns to fill.

Best regards,
Mikhail
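As a rough picture of what one such translation entry might look like, here is a sketch in current Rust. The `Plural` enum, `plural_en`, and `files_in_folder_en` are hypothetical names invented for this example, and a real catalog would need more plural categories for other languages. What it illustrates is the property argued for above: the compiler checks that every case is covered and that each arm yields a string.

```rust
// Hypothetical plural categories; many locales need more than two.
enum Plural {
    One,
    Other,
}

// English plural rule: exactly 1 is singular, everything else is "other".
fn plural_en(n: u32) -> Plural {
    if n == 1 {
        Plural::One
    } else {
        Plural::Other
    }
}

// One catalog entry written as code. Leaving out a Plural variant, or
// returning something other than a String, is a compile-time error.
fn files_in_folder_en(n: u32) -> String {
    match plural_en(n) {
        Plural::One => String::from("there is one file in the folder"),
        Plural::Other => format!("there are {} files in the folder", n),
    }
}

fn main() {
    println!("{}", files_in_folder_en(1));
    println!("{}", files_in_folder_en(3));
}
```

The trade-off is that an entry like this has to be compiled and shipped with the program, rather than read from a catalog file at runtime.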
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 13-05-04 12:31 AM, Mikhail Zabaluev wrote:

> If you are talking about gettext-like functionality, usually this and
> format strings are thought of as independent processing layers: format
> strings are translated as such and then fed to the formatting function.
> This brings some ramifications, as the order of parameters in the
> translated template can change, so the format syntax has to support
> positional parameters. But this also allows to account for data-derived
> context such as numeral cases, without complicating the printf-like
> functions too much.

Yes, this is the sort of thing I was thinking of: that there are some pressures that a gettext() layer feeds back to the selection of formatting strings that might be worth considering.

Also that it might be nice to make fmt!() default-to, or very easily be adapted-to (without too much extra noise, say with ifmt!() or such) invoking the message-catalogue system. The _() macro is used in C I think due to trying to reduce the noise-effect i18n efforts have on code. We should keep that in mind.

> There are other difficulties with localizing formatted messages that are
> never systematically solved, for example, accounting for gender. In all,
> it looks like an interesting area for library research, beyond the basic
> "stick this value pretty-printed into a string" problem.

There are a few of those, yes. They get quite complex. Though there is some ... "reasonably lightweight" prior art in the ICU format library that I think might be worth pursuing:

http://userguide.icu-project.org/formatparse/messages
https://ssl.icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html

-Graydon
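The point about positional parameters can be illustrated with the `{0}`/`{1}` syntax of today's `format!` macro (used here as a stand-in; it is not the 2013 `fmt!` under discussion). The two function names and the second template's word order are made up for the example. Note that `format!` needs a literal template at compile time, so a catalog looked up at runtime would still need its own interpreter.

```rust
// Source-language template: arguments appear in call order.
fn subject_first(who: &str, place: &str) -> String {
    format!("{0} went to {1}", who, place)
}

// A template that needs the same arguments in the opposite order.
// The call site is unchanged; only the template differs.
fn place_first(who: &str, place: &str) -> String {
    format!("{1} is where {0} went", who, place)
}

fn main() {
    println!("{}", subject_first("Anna", "Oslo"));
    println!("{}", place_first("Anna", "Oslo"));
}
```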
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 2013-05-04 01:28:43, Huon Wilson wrote:

> Hi all,
>
> Aatch, Kimundi and I (and maybe some others... sorry if I've forgotten
> you) came up with a bit of proposal on IRC for handling fmt!. It's
> possibly been considered already, but whatever, we'd like some
> comments on it.
>
>
> There would one trait for each format specifier (probably excluding
> `?'), e.g. FormatC for %c, FormatD for %d/%i, FormatF for %f, and
> format would just require that the value for each format specifier
> implements the correct trait. (Presumably this check can be done
> "automatically" by attempting to call the appropriate method and
> using the type checker.)
>
> In code,
>
>
>   trait FormatC {
>       fn format_c(&self, w: &Writer, flags: Flags);
>   }
>
>   impl FormatC for char {
>       fn format_c(&self, w: &Writer, _: Flags) { w.write_char(*self) }
>   }
>
>   struct MyChar(char);
>   impl FormatC for MyChar {
>       fn format_c(&self, w: &Writer, _: Flags) { w.write_char(**self) }
>   }
>
>   fmt!("%c%c%c", 'a', MyChar('a'), ~"str")
>
>   // becomes
>
>   'a'.format_c(w, {});
>   MyChar('a').format_c(w, {});
>   ~"str".format_c(w, {});
>
>
> And the first two would resolve/type-check fine, but the last would
> not. (`Flags' would contain the width and precision specifiers and all
> that.)
>
>
> This could then be extended to have a dynamic formatter, which allows
> types to format for any specifier at runtime (i.e. get around compile
> time restrictions). Our thoughts were to add an extra flag to indicate
> this (e.g. !), so that it is entirely and explicitly opt-in. (Similar
> to Python's __format__ and Go's fmt (I think).)
>
>
>   trait DynamicFormat {
>       fn format_dynamic(&self, w: &Writer, spec: FormatSpec);
>   }
>
>   fmt!("%!s %!10.3f", a, b)
>
>   // becomes
>
>   a.format_dynamic(w, {flags: {}, type: 's'});
>   w.write_str(" ");
>   b.format_dynamic(w, {flags: {width: 10, prec: 3}, type: 'f'});
>
>
> (Presumably this could also have a lint mode, to give an error or
> warning if dynamic formatting is used.)
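For comparison, the one-trait-per-specifier idea can be tried out with the formatting traits in today's standard library, where `{}` requires `std::fmt::Display` and `{:x}` requires `std::fmt::LowerHex`. The sketch below uses those traits as a stand-in for the proposed FormatC/FormatD; it is not the API proposed in the quoted text, and the choice to print the character's code point for `{:x}` is just an assumption for the example.

```rust
use std::fmt;

struct MyChar(char);

// `{}` is accepted because MyChar implements Display.
impl fmt::Display for MyChar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// `{:x}` is accepted because MyChar implements LowerHex.
impl fmt::LowerHex for MyChar {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to u32's LowerHex so flags on `f` are passed through.
        fmt::LowerHex::fmt(&(self.0 as u32), f)
    }
}

fn main() {
    let c = MyChar('a');
    println!("{} {:x}", c, c);
    // format!("{:o}", c) would be rejected at compile time, because
    // MyChar does not implement fmt::Octal.
}
```

As in the proposal, a specifier the type does not support is caught by the type checker rather than at runtime.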
>
> There were also some other discussions about the fmt! syntax, e.g. it
> was suggested that the following could be equivalent to each other
>
> fmt!("%{2}[0].[1]f %{2}e", 10, 3, 1.01);
> fmt!("%10.3f %e", 1.01, 1.01);
>
> This is an explicit divergence from printf's slightly arcane */'n$'
> placeholder syntax. One could use `[*]` to just refer to the next
> argument, like * does by default. (Aatch has a format spec parser[1]
> in the works that supports this syntax.)
>
> Huon
>
> [1]: https://gist.github.com/Aatch/fb94960ab770c7df5718

Hi All,

Me and dbaupp have done some preliminary implementation[1] on the
formatting side of things. During discussion on IRC we have come up
with a few extra details that should probably be mentioned.

Using a writer for format strings is useful for efficiency, especially
when doing things like writing to the terminal or a file. So there are
3 syntax extensions that would be used in order to make this work and
be nice:

* fmt! which is essentially the same as now, returns a ~str
* printf! which writes straight to stdout (effectively replacing
  `io::print(fmt!(...))`)
* writef! which would take an io::Writer as its first argument

fmt! and printf! would simply be written in terms of writef! with
pre-supplied Writers.

The actual format string has, unsurprisingly, created a lot of
discussion, mostly around its relative power. The current placeholder
format is as follows:

    % position flags width precision numeric_arg conversion_specifier

With all except the '%' and conversion specifier being optional. The
specific format of the fields is detailed in the string parser.

Currently we have identified 4 conversion specifiers: 'd', 'f', 's'
and '?'. These are interpreted as "convert as" specifiers, so '%d'
means "convert this argument as a number" and the argument type itself
knows how to do this.

For flags, we have '0', '-', '=', ' ', '+' and '\'' which have the same
meaning as standard printf (where they exist in standard printf).
* '0' means zero-pad
* '-' means left-justified in the field
* '=' means center in the field
* ' ' means that a blank should always be before a signed number
* '+' means that a '-' or '+' should always be placed before a signed
  number

Width and precision fields are similar to the standard printf fields,
just with minor syntax changes in the case of using the next or a
specific argument.

The numeric arg field is formatted like this: `<[0-9]+>` and is used
for supplying a base to 'd' conversions, with the default being 10.
This means that '%x', '%o' and '%t' can all be replaced with this
format: '%<16>d', '%<8>d' and '%<2>d'. You could also specify any
other base up to 36.

I'm in favor of keeping printf-style strings. For one, they are what
we already have, so in that sense it's merely not changing that. Also,
I am struggling to see the objective advantages of
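The numeric-argument field described above can be read as "print this
integer in the given base". A minimal sketch of that reading follows,
written in present-day Rust rather than the 2013 dialect used in this
thread. The names `write_int_in_base` and `int_to_string_in_base` are
hypothetical, `std::fmt::Write` stands in for the proposed Writer
argument, and the choice of lowercase letters for digits above 9 is an
assumption (the message does not say which case is used).

```rust
use std::fmt::{self, Write};

/// Write `value` in `base` (2..=36) to `w`, roughly what `%<base>d` asks for.
/// Hypothetical helper; not part of any proposed or existing API.
fn write_int_in_base<W: Write>(w: &mut W, value: i64, base: u32) -> fmt::Result {
    if !(2..=36).contains(&base) {
        return Err(fmt::Error);
    }
    // Work on the magnitude so that i64::MIN does not overflow on negation.
    let mut magnitude = value.unsigned_abs();
    let mut digits = Vec::new();
    loop {
        let digit = (magnitude % base as u64) as u32;
        // from_digit yields '0'..'9' then 'a'..'z' (lowercase is an assumption here).
        digits.push(std::char::from_digit(digit, base).ok_or(fmt::Error)?);
        magnitude /= base as u64;
        if magnitude == 0 {
            break;
        }
    }
    if value < 0 {
        w.write_char('-')?;
    }
    // Digits were produced least-significant first, so emit them in reverse.
    for c in digits.iter().rev() {
        w.write_char(*c)?;
    }
    Ok(())
}

/// String-returning convenience layered on the writer version, in the same
/// way fmt! is described as being written in terms of writef!.
fn int_to_string_in_base(value: i64, base: u32) -> Option<String> {
    let mut s = String::new();
    write_int_in_base(&mut s, value, base).ok()?;
    Some(s)
}
```

Under these assumptions, `int_to_string_in_base(255, 16)` corresponds to
'%<16>d' applied to 255, and a base outside 2..=36 is rejected rather
than formatted.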
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
Hi,

2013/5/3 Graydon Hoare
>
> (Erm, it might also be worthwhile to consider message catalogues and
> locale-facets at this point; the two are closely related. We do not have a
> library page on that topic yet, but ought to. Or include it in the lib-fmt
> page.)

If you are talking about gettext-like functionality, usually this and
format strings are thought of as independent processing layers: format
strings are translated as such and then fed to the formatting
function. This has some ramifications: the order of parameters in the
translated template can change, so the format syntax has to support
positional parameters. But it also makes it possible to account for
data-derived context such as numeral cases, without complicating the
printf-like functions too much.

There are other difficulties with localizing formatted messages that
are never systematically solved, for example, accounting for gender.
In all, it looks like an interesting area for library research, beyond
the basic "stick this value pretty-printed into a string" problem.

Cheers,
Mikhail
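To illustrate why translated templates need positional parameters,
here is a small sketch in present-day Rust. The `{N}` placeholder
syntax and the function name `format_positional` are assumptions made
for this example only; they are not the syntax proposed in this thread.

```rust
/// Substitute `{N}` placeholders in `template` with `args[N]`.
/// Returns `None` if a placeholder is malformed or its index is out of range.
/// Hypothetical helper with a made-up placeholder syntax.
fn format_positional(template: &str, args: &[&str]) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        let close = after.find('}')?;
        // The text between the braces must be an argument index.
        let index: usize = after[..close].parse().ok()?;
        out.push_str(args.get(index)?);
        rest = &after[close + 1..];
    }
    // Copy whatever literal text follows the last placeholder.
    out.push_str(rest);
    Some(out)
}
```

With the same argument list `["Ana", "3"]`, the template
"{0} copied {1} files" and a reordered translation such as
"{1} files were copied by {0}" both format correctly, which is the
property a catalogue-driven layer relies on. A caller could fall back
to the untranslated template when the result is `None`.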
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 13-05-03 01:21 PM, Graydon Hoare wrote:
> On 13-05-03 01:12 PM, Brian Anderson wrote:
> > I agree with reconsidering the inconsistent, underspecified printf
> > syntax, but don't have any specific thoughts on this at this time.
>
> Note that I made a page collecting links to existing format libraries
> a little while back:

(Erm, it might also be worthwhile to consider message catalogues and
locale-facets at this point; the two are closely related. We do not
have a library page on that topic yet, but ought to. Or include it in
the lib-fmt page.)

-Graydon
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 13-05-03 01:12 PM, Brian Anderson wrote:
> I agree with reconsidering the inconsistent, underspecified printf
> syntax, but don't have any specific thoughts on this at this time.

Note that I made a page collecting links to existing format libraries
a little while back:

https://github.com/mozilla/rust/wiki/Lib-fmt

I'm similarly excited to see someone taking charge of this. Thanks!

-Graydon
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks
On 05/03/2013 08:28 AM, Huon Wilson wrote:
> Hi all,
>
> Aatch, Kimundi and I (and maybe some others... sorry if I've forgotten
> you) came up with a bit of a proposal on IRC for handling fmt!. It's
> possibly been considered already, but whatever, we'd like some
> comments on it.

I'm glad you are thinking about this. fmt! is in desperate need of an
overhaul, both in design and implementation.

> There would be one trait for each format specifier (probably excluding
> `?'), e.g. FormatC for %c, FormatD for %d/%i, FormatF for %f, and
> format would just require that the value for each format specifier
> implements the correct trait. (Presumably this check can be done
> "automatically" by attempting to call the appropriate method and
> using the type checker.)
>
> In code,
>
> trait FormatC {
>     fn format_c(&self, w: &Writer, flags: Flags);
> }
>
> impl FormatC for char {
>     fn format_c(&self, w: &Writer, _: Flags) { w.write_char(*self) }
> }
>
> struct MyChar(char);
> impl FormatC for MyChar {
>     fn format_c(&self, w: &Writer, _: Flags) { w.write_char(**self) }
> }

Good call using Writer here. This is one of the crucial changes that
must be made.

> fmt!("%c%c%c", 'a', MyChar('a'), ~"str")
>
> // becomes
>
> 'a'.format_c(w, {});
> MyChar('a').format_c(w, {});
> ~"str".format_c(w, {});
>
> And the first two would resolve/type-check fine, but the last would
> not. (`Flags' would contain the width and precision specifiers and all
> that.)

For these pre-existing format specifiers this would allow arbitrary
types to be formatted as e.g. characters. This may be overkill. What
we *definitely* need though is for all types that are e.g. signed
integers to implement `%i`. `FormatC` I would probably prefer to be
`FormatChar`, etc. for clarity.

> This could then be extended to have a dynamic formatter, which allows
> types to format for any specifier at runtime (i.e. get around compile
> time restrictions). Our thoughts were to add an extra flag to indicate
> this (e.g. !), so that it is entirely and explicitly opt-in.
> (Similar to Python's __format__ and Go's fmt (I think).)
>
> trait DynamicFormat {
>     fn format_dynamic(&self, w: &Writer, spec: FormatSpec);
> }
>
> fmt!("%!s %!10.3f", a, b)
>
> // becomes
>
> a.format_dynamic(w, {flags: {}, type: 's'});
> w.write_str(" ");
> b.format_dynamic(w, {flags: {width: 10, prec: 3}, type: 'f'});
>
> (Presumably this could also have a lint mode, to give an error or
> warning if dynamic formatting is used.)

I don't understand the use case for this. I would understand (and
want) a generic format specifier that defers to some trait, but
without specifically using type 's' or 'f'. Something like "%!" that
becomes `a.format(w, flags)`, but doesn't try to coerce arbitrary
types to print as other arbitrary types.

> There were also some other discussions about the fmt! syntax, e.g. it
> was suggested that the following could be equivalent to each other
>
> fmt!("%{2}[0].[1]f %{2}e", 10, 3, 1.01);
> fmt!("%10.3f %e", 1.01, 1.01);
>
> This is an explicit divergence from printf's slightly arcane */'n$'
> placeholder syntax. One could use `[*]` to just refer to the next
> argument, like * does by default. (Aatch has a format spec parser[1]
> in the works that supports this syntax.)

I agree with reconsidering the inconsistent, underspecified printf
syntax, but don't have any specific thoughts on this at this time.

Nice work. I look forward to seeing where this goes.
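For readers following along, here is a rough sketch of the two ideas
discussed in this message: one trait per specifier (using the
`FormatChar` spelling suggested above) and a generic "%!" specifier
that simply defers to a trait. It is written in present-day Rust, not
the 2013 dialect quoted in the thread. The `Flags` fields, the `Point`
type and the hand-expanded `expanded_example` function are all
hypothetical, and `std::fmt::Write` stands in for the proposed Writer.

```rust
use std::fmt::{self, Write};

/// Width and precision carried with a specifier (the proposal's `Flags`).
/// The field layout is an assumption made for this sketch.
#[allow(dead_code)]
#[derive(Default, Clone, Copy)]
struct Flags {
    width: Option<usize>,
    precision: Option<usize>,
}

/// One trait per specifier: a type usable with `%c` implements this.
trait FormatChar {
    fn format_char(&self, w: &mut dyn Write, flags: Flags) -> fmt::Result;
}

impl FormatChar for char {
    fn format_char(&self, w: &mut dyn Write, _: Flags) -> fmt::Result {
        w.write_char(*self)
    }
}

struct MyChar(char);

impl FormatChar for MyChar {
    fn format_char(&self, w: &mut dyn Write, _: Flags) -> fmt::Result {
        w.write_char(self.0)
    }
}

/// Generic "%!" idea: defer entirely to the type, with no specifier coercion.
trait Format {
    fn format(&self, w: &mut dyn Write, flags: Flags) -> fmt::Result;
}

struct Point {
    x: i32,
    y: i32,
}

impl Format for Point {
    fn format(&self, w: &mut dyn Write, _: Flags) -> fmt::Result {
        write!(w, "({}, {})", self.x, self.y)
    }
}

/// Hand-written stand-in for what `fmt!("%c%c", 'a', MyChar('b'))` might expand to.
fn expanded_example() -> Result<String, fmt::Error> {
    let mut out = String::new();
    'a'.format_char(&mut out, Flags::default())?;
    MyChar('b').format_char(&mut out, Flags::default())?;
    // "str".format_char(&mut out, Flags::default())?;
    // ^ would be rejected by the type checker: &str has no FormatChar impl.
    Ok(out)
}
```

The compile-time check falls out of ordinary method resolution: an
argument whose type lacks the trait for its specifier fails to
type-check, with no extra machinery in the macro.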