Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-10 Thread Graydon Hoare
On 13-05-09 10:49 PM, Mikhail Zabaluev wrote:

> I agree. And if expressions are in Rust, you get the benefit of a Rust
> compiler validating them. A lambda must produce _some_ string to be
> valid; match clauses will be checked for correct type and coverage.
> Dynamically interpreted syntax engines do not usually give this benefit
> and may in fact let the translator unwittingly and quietly introduce
> runtime errors, which are less likely to be caught the farther off the
> beaten track the language is (I may be bitter at my EU-market Samsung
> TV, that has the Russian language option for the UI, but starts crashing
> randomly if you switch to it; no grudge against the Samsung folks on
> this list).

The sublanguage is non-effectful, non-stateful, non-turing-complete, and
has no functions. Evaluation time is linear and harmless if it fails: it
can just use the default (non-translated -- wrong language) format if
interpretation goes wrong.

I will reiterate why I keep objecting to "using rust code" as both the
wrong answer and an answer to the wrong question:

  - Dynamic loading of translations currently happens in the deployment
environment. If you require rust code, you're requiring a dynamic
load of a .so or .dll rather than reading a .po file for a string.

  - There are extensive existing toolchains, processes and communities
who have no interest in learning to program in rust to (eg.)
translate a web browser or consumer product.

http://www.poedit.net/screenshots.php
https://en.wikipedia.org/wiki/Virtaal
http://mozilla.locamotion.org/
http://weblate.org/en/
http://sourceforge.net/projects/translate/

etc. etc.

  - The whole point of this thread is to _design_ a formatting
mini-language. If "plain rust code" was sufficient for this,
people would write:

    let x = do fmt::with_sfmt_writer |f| {
        f.putstr("there are ");
        a.fmtD(f);
        f.putstr(" files in the folder");
    };

rather than:

    let x = fmt!("there are %d files in the folder", a);

Yet here we are discussing that format-string mini-language.
So all I'm saying is: given that we _are_ discussing the design
of a held-in-a-format-string mini-language, why not make sure
that design scales nicely to cases when a translator has to
express the little bits of logic they often do, such as:

    "there {num_files, plural,
        one {is one file}
        other {are {num_files} files}} in the folder"

This is relatively easy to adopt as an extension to {}-based
format strings, whereas it's trickier with %s-based.
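
For illustration only, here is a minimal sketch in present-day Rust (not the 2013 dialect used above) of how an already-parsed message of that shape could be evaluated. The `Part` type, `render`, `files_message` and `describe` are hypothetical names invented for this sketch, and the plural rule is the English one (exactly 1 selects `one`); nothing here is ICU's actual API.

```rust
use std::collections::HashMap;

/// One piece of a parsed message (hypothetical AST for this sketch).
enum Part {
    Text(String),
    Arg(String),
    Plural { arg: String, one: Vec<Part>, other: Vec<Part> },
}

/// Walk the parts once, appending to `out`. No loops or state beyond the walk.
fn render(parts: &[Part], args: &HashMap<&str, u64>, out: &mut String) {
    for part in parts {
        match part {
            Part::Text(s) => out.push_str(s),
            Part::Arg(name) => match args.get(name.as_str()) {
                Some(n) => out.push_str(&n.to_string()),
                // Unknown argument: emit the placeholder unchanged.
                None => {
                    out.push('{');
                    out.push_str(name);
                    out.push('}');
                }
            },
            Part::Plural { arg, one, other } => {
                // English rule, assumed for this sketch: exactly 1 selects `one`.
                let branch = if args.get(arg.as_str()) == Some(&1) { one } else { other };
                render(branch, args, out);
            }
        }
    }
}

/// The "files in the folder" message from above, written out as an AST.
fn files_message() -> Vec<Part> {
    vec![
        Part::Text("there ".to_string()),
        Part::Plural {
            arg: "num_files".to_string(),
            one: vec![Part::Text("is one file".to_string())],
            other: vec![
                Part::Text("are ".to_string()),
                Part::Arg("num_files".to_string()),
                Part::Text(" files".to_string()),
            ],
        },
        Part::Text(" in the folder".to_string()),
    ]
}

fn describe(num_files: u64) -> String {
    let mut args = HashMap::new();
    args.insert("num_files", num_files);
    let mut out = String::new();
    render(&files_message(), &args, &mut out);
    out
}
```

Under those assumptions, `describe(1)` gives "there is one file in the folder" and `describe(3)` gives "there are 3 files in the folder".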

I think this thread keeps going repeatedly off into discussion of
problems we are not facing. We're not trying to eliminate format-string
mini-languages from rust: we're trying to design one. We're not trying
to solve all hypothetical turing-complete translation tasks: we're
trying to accommodate the level of translation-variability that normal
translators (even people writing non-translated format strings in their
home language) run into all the time when composing format strings.

-Graydon

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-10 Thread Lucian Branescu
On 9 May 2013 12:26, Matthieu Monrocq  wrote:

> My point is, therefore, that even a seemingly innocent looking sentence
> like this one actually turns into a monster:
>
>   "{0} {1, select, singular {{2, select, female {est allée} other {est
> allé}}}, other {{2, select, female {sont allées} other {sont allés {3,
> select, singular {{4, female {à la} other {au}}} other {aux}} {5}"
>
>   (note: I apologize if the { and } are mismatched... I gave up)
>
> And, as mentioned, this is French and not Polish, because in Polish the
> plural form is declined with special cases depending on the remainder of
> the number modulo 10 quite similar to ordinals in English (st, nd, rd vs
> th).
>

In practice, that almost never happens. Most strings have quite specific
context and require no conditionals. Rarely, there are conditions for
things like "one" vs "1" and so on.

Some use cases require extreme flexibility, but at that point they're more
likely to be split off into groups, which may differ greatly and have
separate source-language (English) strings: in an RPG game there might be a
player character 'female' and 'male' version for each string, then perhaps
another for each emotion that may be applicable in that case.

ICU is in fact one of the more programmable translation libraries. Anything
more must be handled in cooperation between engineers and translators, so
out of scope of a gettext-alike.


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-10 Thread Mikhail Zabaluev
Hi,

2013/5/10 Tim Chevalier 

> On Thu, May 9, 2013 at 10:49 PM, Mikhail Zabaluev
>  wrote:
> > My favorite real world example is "%s has joined the chat room." The
> gender
> > may be unknown (they didn't say in their user profile), female, male,
> and if
> > you are really thorough and provide for non-human chat participants,
> > neutral.
>
> At the risk of being off-topic, many human beings affirm their gender
> as neutral or as another gender that isn't male or female. I'm not
> interested in starting a lengthy thread on this topic; mainly, I just
> want to make sure that a comment that potentially implies that some
> people who read this mailing list and/or participate in this project
> aren't human doesn't go by unremarked-on. Everyone is welcome to work
> on Rust, whether or not they identify within the gender binary.
> (Recommended reading:
> http://www.sarahmei.com/blog/2010/11/26/disalienation/ and
> http://genderqueerid.com/what-is-gq ).
>
> If anyone wants to discuss this point further, please *reply sender*
> and email me privately, rather than replying to the list.
>

Replying on-list as potentially guilty... Sorry if my comment has caused
any offense.
In my example, the gender information is intended to be used for
grammatical purposes, if provided.
For people with more complicated gender than male/female, "neutral" would
not be a proper option in this context, as the resulting phrase may sound
degrading (like referring to people with the non-personal pronoun "it" in
English).
So for such cases I suppose it's down to the default "other/unknown", at
the disadvantage of translated messages looking form-letterish.

Respect,
  Mikhail


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-09 Thread Tim Chevalier
On Thu, May 9, 2013 at 10:49 PM, Mikhail Zabaluev
 wrote:
> My favorite real world example is "%s has joined the chat room." The gender
> may be unknown (they didn't say in their user profile), female, male, and if
> you are really thorough and provide for non-human chat participants,
> neutral.

At the risk of being off-topic, many human beings affirm their gender
as neutral or as another gender that isn't male or female. I'm not
interested in starting a lengthy thread on this topic; mainly, I just
want to make sure that a comment that potentially implies that some
people who read this mailing list and/or participate in this project
aren't human doesn't go by unremarked-on. Everyone is welcome to work
on Rust, whether or not they identify within the gender binary.
(Recommended reading:
http://www.sarahmei.com/blog/2010/11/26/disalienation/ and
http://genderqueerid.com/what-is-gq ).

If anyone wants to discuss this point further, please *reply sender*
and email me privately, rather than replying to the list.

Cheers,
Tim

-- 
Tim Chevalier * http://catamorphism.org/ * Often in error, never in doubt
"Too much to carry, too much to let go
Time goes fast, learning goes slow." -- Bruce Cockburn


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-09 Thread Mikhail Zabaluev
Hi,

2013/5/10 Graydon Hoare 

>
>   - Any expression of that conditional logic is going to be ugly,
> but it is actually required for the translator to give an
> accurate translation.
>

I agree. And if expressions are in Rust, you get the benefit of a Rust
compiler validating them. A lambda must produce _some_ string to be valid;
match clauses will be checked for correct type and coverage. Dynamically
interpreted syntax engines do not usually give this benefit and may in fact
let the translator unwittingly and quietly introduce runtime errors, which
are less likely to be caught the farther off the beaten track the language
is (I may be bitter at my EU-market Samsung TV, that has the Russian
language option for the UI, but starts crashing randomly if you switch to
it; no grudge against the Samsung folks on this list).

>   - The odds are that not all those values will be runtime-variable;
> the parts that aren't can be directly translated. The switching
> is _just_ to defer a decision to runtime based on the provided
> substitution value.
>
>   - The important part: you can't ask a translator to express this
> "as rust code" because the _locale_ is also a runtime setting;
> that is, the translation string is evaluated at runtime
> based on whatever-gettext()-returns. The programmer cannot
> accommodate the translator's switch-logic because it is neither
> static (locale varies at runtime) nor will it be the same
> between locales (logical structure varies with locale).
>

A translation catalog for a particular locale is supposed to be invoked in
that locale, isn't it?
But there is a thing, indeed: a translator can get "adventurous" and use
more Rust than they are supposed to, up to tweaking with the locale
settings (which, if the Rust runtime is any good, should only affect the
internal task invoking the translation lambda). This could be solved by
compiling the translation catalogs without the prelude and warning on any
unusual use statements (e.g. anything outside tr:: utilities, which provide
all the selector utilities a translator might need), or auto-providing a
"translation prelude" and banning use altogether.

> That assumes you're talking about a runtime-provided noun being slotted
> into a runtime-provided format string. It's of course possible this
> could happen, but it's a bit of a corner case within corner cases. The
> case I think the gender-selectors are designed for are those where
> you're presenting a runtime-variable _person_ in a message (eg. an email
> program or such). And you can pass their gender (assuming they want to
> use one of the gender-binary words for it) as a value directly to the
> formatter.
>
> A seemingly-good and short-ish slide deck on this is available here. I
> recommend reading it:
>
>
> https://docs.google.com/presentation/d/1ZyN8-0VXmod5hbHveq-M1AeQ61Ga3BmVuahZjbmbBxo/pub?start=false&loop=false&delayms=3000#slide=id.g1bc43a82_2_14
>
> Especially the "non-goals". There's a limit. They just want to hit the
> majority of cases. "Handle gender - at least for people".
>

My favorite real world example is "%s has joined the chat room." The gender
may be unknown (they didn't say in their user profile), female, male, and
if you are really thorough and provide for non-human chat participants,
neutral.

> > It seems to me that given the extraordinary complexity that is lurking
> here:
> >
> >  - either you end up with a complicated micro-syntax that you'll have to
> > keep buffing up as you discover corner cases in various languages and
> > translators keep complaining they cannot do their job.
>
> I think you're overstating it. This is a problem people have been
> struggling with for a long time, but have worked their way towards a
> _reasonable_ solution that isn't impossibly complex. There's a
> simplified implementation of it here:
>
> https://github.com/SlexAxton/messageformat.js
>

This syntax does not appear to me any more "translator-friendly" than a
restricted and macro-assisted use of Rust.


> >  - or you just decouple formatting from translation, and provide a
> > separate library for translation (outside of core, most probably)
>
> Layering it might work. I'm not opposed to that. I just thought it worth
> looking over the problem space and considering whether it's "too hard"
> to support localization from the get-go, and/or whether there'd be any
> advantage to combining the design of the two parts. It's pretty
> important. We're going to want to localize rustc, and most other things
> we write in rust.
>

I support a separate layer and a macro distinct from fmt!() to invoke it.
Plain formatting is used for non-user-visible purposes such as logging or
constructing protocol messages, and no translator should have to deal with
those format strings picked up by the extractor tool to clutter the catalog.
Also, for plain strings, a tr!("foo") looks more logical than a fmt! with
no formatting parameters.

Best regards,
  Mikhail

Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-09 Thread Graydon Hoare
On 13-05-09 04:26 AM, Matthieu Monrocq wrote:

> However, I am not too sure about the idea of string -> string mapping.
> The example you give here is actually slightly more complicated because
> there are several orthogonal axes:

Hm. I think you're missing what I mean. I mean that the interface --
literally the localization-library-interface we're going to be talking
to on a given OS -- takes a string and returns a string. And
translations are stored in string->string maps. And edited on websites
and with tools that store a translated string as another string.

I'm not a translation expert by any means, I'm just trying to
reverse-engineer their requirements. And I think you're misunderstanding
them. The "translation produces a single string" model is, I think,
wired into all the tooling.
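
To make the string->string model concrete, a minimal sketch in present-day Rust follows. `Catalog`, `from_pairs` and `translate` are names made up for this illustration (they are not gettext's or any real library's API); the only behaviour assumed is the usual gettext-style one of returning the source string when no translation is present.

```rust
use std::collections::HashMap;

/// A message catalog for one locale: source string -> translated string.
struct Catalog {
    messages: HashMap<String, String>,
}

impl Catalog {
    /// Build a catalog from (source, translation) pairs.
    fn from_pairs(pairs: &[(&str, &str)]) -> Catalog {
        let messages = pairs
            .iter()
            .map(|(source, translated)| (source.to_string(), translated.to_string()))
            .collect();
        Catalog { messages }
    }

    /// Look up a translation; fall back to the source string when it is missing.
    fn translate<'a>(&'a self, source: &'a str) -> &'a str {
        self.messages
            .get(source)
            .map(|s| s.as_str())
            .unwrap_or(source)
    }
}
```

The point of the sketch is that both sides of the map are plain strings, so any conditional logic has to live inside the translated string and be interpreted after lookup.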

And more importantly (see below...)

> My point is, therefore, that even a seemingly innocent looking sentence
> like this one actually turns into a monster:
> 
>   "{0} {1, select, singular {{2, select, female {est allée} other {est
> allé}}}, other {{2, select, female {sont allées} other {sont allés
> {3, select, singular {{4, female {à la} other {au}}} other {aux}} {5}"
> 
>   (note: I apologize if the { and } are mismatched... I gave up)

Ok, three things to note here:

  - Any expression of that conditional logic is going to be ugly,
but it is actually required for the translator to give an
accurate translation.

  - The odds are that not all those values will be runtime-variable;
the parts that aren't can be directly translated. The switching
is _just_ to defer a decision to runtime based on the provided
substitution value.

  - The important part: you can't ask a translator to express this
"as rust code" because the _locale_ is also a runtime setting;
that is, the translation string is evaluated at runtime
based on whatever-gettext()-returns. The programmer cannot
accommodate the translator's switch-logic because it is neither
static (locale varies at runtime) nor will it be the same
between locales (logical structure varies with locale).

I am not trying to be obtuse, just figure out why translators have come
up with this system and what we need to preserve about it. As far as I
can tell, the "balance" between runtime and compile-time variability is
the key factor. So any example has to be very careful to reason about
which things vary and which are constant.

> However, even that example is a bit... too simple. Gender is not
> universal, English people talk about "a table" (neutral) whilst French
> people talk about "une table" (feminine) and German talk about "der
> Tisch" (masculine)... so the programmer cannot indicate whether the word
> is feminine or not: it depends on the target language!

That assumes you're talking about a runtime-provided noun being slotted
into a runtime-provided format string. It's of course possible this
could happen, but it's a bit of a corner case within corner cases. The
case I think the gender-selectors are designed for are those where
you're presenting a runtime-variable _person_ in a message (eg. an email
program or such). And you can pass their gender (assuming they want to
use one of the gender-binary words for it) as a value directly to the
formatter.

A seemingly-good and short-ish slide deck on this is available here. I
recommend reading it:

https://docs.google.com/presentation/d/1ZyN8-0VXmod5hbHveq-M1AeQ61Ga3BmVuahZjbmbBxo/pub?start=false&loop=false&delayms=3000#slide=id.g1bc43a82_2_14

Especially the "non-goals". There's a limit. They just want to hit the
majority of cases. "Handle gender - at least for people".

> It seems to me that given the extraordinary complexity that is lurking here:
> 
>  - either you end up with a complicated micro-syntax that you'll have to
> keep buffing up as you discover corner cases in various languages and
> translators keep complaining they cannot do their job.

I think you're overstating it. This is a problem people have been
struggling with for a long time, but have worked their way towards a
_reasonable_ solution that isn't impossibly complex. There's a
simplified implementation of it here:

https://github.com/SlexAxton/messageformat.js

>  - or you just decouple formatting from translation, and provide a
> separate library for translation (outside of core, most probably)

Layering it might work. I'm not opposed to that. I just thought it worth
looking over the problem space and considering whether it's "too hard"
to support localization from the get-go, and/or whether there'd be any
advantage to combining the design of the two parts. It's pretty
important. We're going to want to localize rustc, and most other things
we write in rust.

-Graydon



Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-09 Thread Olivier Renaud
> As for that library, I heavily suggest letting translators manipulate Rust
> code directly. I see it as no more difficult than asking them to learn a
> special micro-language that keeps evolving and pattern-matching is really
> adapted for the task at hand.

I'd like to react to this.

If the translators have to write Rust code instead of interpretable strings, it
most likely means that they now have to set up a full development environment.
This alone can be blocking.

Ok, let's say that they don't have to do so, because the i18n system is smart
enough to dynamically compile and load the Rust code written by the
translators. There is still another, bigger problem: the code can be wrong. It
can be invalid Rust code, or it can contain a bug that will blow up the whole
application at runtime, or more likely it will be out of sync with the
application.

With a good i18n system, an ill-formatted or "buggy" translation cannot break
the program. The resulting string would be either the raw translation (without
interpretation), or it will fall back to the English (original) string. Knowing
that a translation cannot break the program is a really nice guarantee! And
knowing that it cannot introduce security risks is great, too!

I said that the translation will likely be out of sync, because this is how
translations work. The programmer and the translators must be able to work at
their own pace. Let's say a program is translated into 10 languages, and a
programmer updates a translated string in the code by adding a new parameter.
Now, with a "translations using Rust code" system, he has to update the 10
translations correctly, just for the application to compile happily. This is
plain impossible. He can also add the parameter and intentionally break the
build, and wait for all 10 translators to fix this update in their
translations. There is clearly a problem with this solution...

The manual of Gettext explains quite well the "continuous" nature of the i18n
process: http://www.gnu.org/software/gettext/manual/gettext.html#Overview
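
As a rough sketch of the "cannot break the program" guarantee, in present-day Rust: before using a translated template, check that it only refers to arguments the program actually supplies, and otherwise keep the original string. `max_index` and `choose_template` are hypothetical helpers written for this example, and `{N}` placeholders are assumed; real i18n libraries may validate differently.

```rust
/// Largest `{N}` index referenced in a template, if it references any.
fn max_index(template: &str) -> Option<usize> {
    let mut max: Option<usize> = None;
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        rest = &rest[open + 1..];
        // Collect the digits that directly follow the opening brace.
        let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
        if let Ok(i) = digits.parse::<usize>() {
            max = Some(max.map_or(i, |m| m.max(i)));
        }
    }
    max
}

/// Use the translation only if every argument it mentions exists;
/// otherwise fall back to the original (source-language) template.
fn choose_template<'a>(
    original: &'a str,
    translated: Option<&'a str>,
    arg_count: usize,
) -> &'a str {
    match translated {
        Some(t) if max_index(t).map_or(true, |m| m < arg_count) => t,
        _ => original,
    }
}
```

A stale translation that mentions fewer arguments than the program now passes is still accepted here, which matches the idea that programmers and translators work at their own pace.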


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-09 Thread Matthieu Monrocq
On Thu, May 9, 2013 at 12:11 AM, Graydon Hoare  wrote:

> On 13-05-07 09:49 AM, Mikhail Zabaluev wrote:
>
> > What do you think of using Rust lambdas for context-sensitive
> > translations? That could easily accommodate any sort of variance, and
> > would not complicate the fmt! syntax (though it would require another
> > fmt-like macro to substitute, as well as mark, translated messages).
> > Creating message catalogs may be more challenging this way, but they
> > should be automatically collected and type-bracketed from the source by
> > a translation tool, and most of the messages would be plain strings or
> > have stock code patterns to fill.
>
> I think it's relatively important that translations (in message catalog
> technology) be just string -> string maps. The delayed / conditional
> evaluation part happens dynamically, at runtime.
>
> That is: the problem isn't one of "what gets expressed in the source",
> it's "what gets expressed in the message catalog". If you look at the
> examples here:
>
>
> http://userguide.icu-project.org/formatparse/messages#TOC-Complex-Argument-Types
>
> and here:
>
> https://ssl.icu-project.org/apiref/icu4c/classicu_1_1SelectFormat.html
>
> the purpose is to permit a programmer to write something (say, in
> English) like:
>
>   fmt!("{0} went to {2}")
>
> in their code, and have the _translator_ look at that and (say, if
> they're doing a French translation) decide it maps to a little miniature
> case-statement depending on the arguments:
>
>   "{0} est {1, select, female {allée} other {allé}} à {2}."
>
> That is a single translation-string. It gets interpreted on the fly by
> the formatter, given the current locale. As such, I think it's not
> correct to think of this as something done "in rust code".
>
> (Note: in that example, the condition is based on argument 1, which is
> _not even written_ into the target string. It's just used to convey
> additional context-information to the format-string evaluator, by
> agreement between programmers and translators.)
>
> -Graydon
>

I agree that there is a need for meta-information that may (or may not) end
up being used. Number (for plural terms) and gender probably being the most
common here, I've also seen it being used in Clang's diagnostics to fold
several "similar" looking messages into a single one with a "variant".


However, I am not too sure about the idea of string -> string mapping. The
example you give here is actually slightly more complicated because there
are several orthogonal axes:

 - singular/plural of the subject (we're lucky it's simpler in French than
Polish)
 - gender of the subject
 - singular/plural of the destination: "to the supermarket" = "au
supermarché", "to the halles" = "aux halles"
 - gender of the destination: "to the sea" = "à la mer", "to the
supermarket" = "au supermarché", "to the hairdresser" = "chez le
coiffeur"/"chez la coiffeuse" [1] => I'll leave this last one aside

[1] We could actually express it "au salon de coiffure" but it feels
awkward and is rarely used. Still, it fits here.


My point is, therefore, that even a seemingly innocent looking sentence
like this one actually turns into a monster:

  "{0} {1, select, singular {{2, select, female {est allée} other {est
allé}}}, other {{2, select, female {sont allées} other {sont allés {3,
select, singular {{4, female {à la} other {au}}} other {aux}} {5}"

  (note: I apologize if the { and } are mismatched... I gave up)

And, as mentioned, this is French and not Polish, because in Polish the
plural form is declined with special cases depending on the remainder of
the number modulo 10 quite similar to ordinals in English (st, nd, rd vs
th).
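
To give an idea of what the Polish rule looks like in practice, here is a small sketch in present-day Rust for whole numbers. The category names follow CLDR usage; `PluralCategory`, `polish_category` and `files_pl` are names chosen for this example only, and fractional numbers (which need a further category) are left out.

```rust
/// Plural categories Polish needs for whole numbers (names follow CLDR usage).
#[derive(Debug, PartialEq)]
enum PluralCategory {
    One,
    Few,
    Many,
}

/// Pick the category from the number itself and its remainders mod 10 and mod 100.
fn polish_category(n: u64) -> PluralCategory {
    let last = n % 10;
    let last_two = n % 100;
    if n == 1 {
        PluralCategory::One
    } else if (2..=4).contains(&last) && !(12..=14).contains(&last_two) {
        // 2-4, 22-24, 32-34, ... but not 12-14
        PluralCategory::Few
    } else {
        PluralCategory::Many
    }
}

/// "N files" in Polish, choosing the noun form by category.
fn files_pl(n: u64) -> String {
    let noun = match polish_category(n) {
        PluralCategory::One => "plik",
        PluralCategory::Few => "pliki",
        PluralCategory::Many => "plików",
    };
    format!("{} {}", n, noun)
}
```

So 2, 3, 4, 22 and 23 take one form while 5, 12, 13 and 112 take another, which is the modulo-based special-casing referred to above.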


However, even that example is a bit... too simple. Gender is not universal,
English people talk about "a table" (neutral) whilst French people talk
about "une table" (feminine) and German talk about "der Tisch"
(masculine)... so the programmer cannot indicate whether the word is
feminine or not: it depends on the target language! Therefore, a more
realistic example would imply that the select is done by looking up the
English word in a dictionary for its equivalent in another language and
from there adjust the translation depending on the gender of the word in
the target language! And of course, the same issue occurs with
singular/plural forms, the English "a piece of information" is in French
"une information" (singular), whilst the English "information"
(non-countable) is in French "les informations" (plural).


It seems to me that given the extraordinary complexity that is lurking here:

 - either you end up with a complicated micro-syntax that you'll have to
keep buffing up as you discover corner cases in various languages and
translators keep complaining they cannot do their job.

 - or you just decouple formatting from translation, and provide a 

Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-08 Thread Graydon Hoare
On 13-05-07 09:49 AM, Mikhail Zabaluev wrote:

> What do you think of using Rust lambdas for context-sensitive
> translations? That could easily accommodate any sort of variance, and
> would not complicate the fmt! syntax (though it would require another
> fmt-like macro to substitute, as well as mark, translated messages).
> Creating message catalogs may be more challenging this way, but they
> should be automatically collected and type-bracketed from the source by
> a translation tool, and most of the messages would be plain strings or
> have stock code patterns to fill.

I think it's relatively important that translations (in message catalog
technology) be just string -> string maps. The delayed / conditional
evaluation part happens dynamically, at runtime.

That is: the problem isn't one of "what gets expressed in the source",
it's "what gets expressed in the message catalog". If you look at the
examples here:

http://userguide.icu-project.org/formatparse/messages#TOC-Complex-Argument-Types

and here:

https://ssl.icu-project.org/apiref/icu4c/classicu_1_1SelectFormat.html

the purpose is to permit a programmer to write something (say, in
English) like:

  fmt!("{0} went to {2}")

in their code, and have the _translator_ look at that and (say, if
they're doing a French translation) decide it maps to a little miniature
case-statement depending on the arguments:

  "{0} est {1, select, female {allée} other {allé}} à {2}."

That is a single translation-string. It gets interpreted on the fly by
the formatter, given the current locale. As such, I think it's not
correct to think of this as something done "in rust code".

(Note: in that example, the condition is based on argument 1, which is
_not even written_ into the target string. It's just used to convey
additional context-information to the format-string evaluator, by
agreement between programmers and translators.)
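
A minimal sketch, in present-day Rust, of what "interpreted on the fly by the formatter" could look like for the French string above. It handles only `{N}` and `{N, select, key {text} ... other {text}}`, with no escaping and plain spaces as the only whitespace, so it is a simplification and not ICU's grammar. `Parser` and `format_message` are hypothetical names. A `None` result means the string could not be interpreted, at which point a caller could fall back to the untranslated format.

```rust
struct Parser<'a> {
    chars: Vec<char>,
    pos: usize,
    args: &'a [&'a str],
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Consume `c` or fail.
    fn expect(&mut self, c: char) -> Option<()> {
        if self.peek()? == c {
            self.pos += 1;
            Some(())
        } else {
            None
        }
    }

    fn skip_spaces(&mut self) {
        while self.peek() == Some(' ') {
            self.pos += 1;
        }
    }

    /// Read a run of alphanumeric characters (possibly empty).
    fn word(&mut self) -> String {
        let mut w = String::new();
        while let Some(c) = self.peek() {
            if !c.is_alphanumeric() {
                break;
            }
            w.push(c);
            self.pos += 1;
        }
        w
    }

    /// Literal text and nested arguments, up to an unmatched '}' or end of input.
    fn text(&mut self) -> Option<String> {
        let mut out = String::new();
        while let Some(c) = self.peek() {
            match c {
                '}' => break,
                '{' => {
                    self.pos += 1;
                    out.push_str(&self.argument()?);
                }
                _ => {
                    out.push(c);
                    self.pos += 1;
                }
            }
        }
        Some(out)
    }

    /// What follows '{': either `N}` or `N, select, key {text} ... }`.
    fn argument(&mut self) -> Option<String> {
        self.skip_spaces();
        let index: usize = self.word().parse().ok()?;
        let value = *self.args.get(index)?;
        self.skip_spaces();
        if self.peek()? == '}' {
            self.pos += 1;
            return Some(value.to_string());
        }
        self.expect(',')?;
        self.skip_spaces();
        if self.word() != "select" {
            return None;
        }
        self.skip_spaces();
        self.expect(',')?;
        let (mut chosen, mut fallback) = (None, None);
        loop {
            self.skip_spaces();
            if self.peek()? == '}' {
                self.pos += 1;
                break;
            }
            let key = self.word();
            if key.is_empty() {
                return None;
            }
            self.skip_spaces();
            self.expect('{')?;
            let body = self.text()?; // every branch is parsed, selected or not
            self.expect('}')?;
            if key == value {
                chosen = Some(body);
            } else if key == "other" {
                fallback = Some(body);
            }
        }
        chosen.or(fallback)
    }
}

/// Interpret `pattern` against positional `args`; `None` if it is malformed.
fn format_message(pattern: &str, args: &[&str]) -> Option<String> {
    let mut p = Parser { chars: pattern.chars().collect(), pos: 0, args };
    let out = p.text()?;
    if p.pos == p.chars.len() { Some(out) } else { None }
}
```

With arguments `["Marie", "female", "Paris"]` the French string yields "Marie est allée à Paris."; argument 1 steers the choice but never appears in the output.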

-Graydon


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-07 Thread Mikhail Zabaluev
Hi Graydon,

2013/5/6 Graydon Hoare 

>
> Yes, this is the sort of thing I was thinking of: that there are some
> pressures that a gettext() layer feeds back to the selection of
> formatting strings that might be worth considering.
>
> Also that it might be nice to make fmt!() default-to, or very easily be
> adapted-to (without too much extra noise, say with ifmt!() or such)
> invoking the message-catalogue system. The _() macro is used in C I
> think due to trying to reduce the noise-effect i18n efforts have on
> code. We should keep that in mind.
>
> > There are other difficulties with localizing formatted messages that are
> > never systematically solved, for example, accounting for gender. In all,
> > it looks like an interesting area for library research, beyond the basic
> > "stick this value pretty-printed into a string" problem.
>
> There are a few of those, yes. They get quite complex.


What do you think of using Rust lambdas for context-sensitive translations?
That could easily accommodate any sort of variance, and would not
complicate the fmt! syntax (though it would require another fmt-like macro
to substitute, as well as mark, translated messages). Creating message
catalogs may be more challenging this way, but they should be automatically
collected and type-bracketed from the source by a translation tool, and
most of the messages would be plain strings or have stock code patterns to
fill.

Best regards,
  Mikhail


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-06 Thread Graydon Hoare
On 13-05-04 12:31 AM, Mikhail Zabaluev wrote:

> If you are talking about gettext-like functionality, usually this and
> format strings are thought of as independent processing layers: format
> strings are translated as such and then fed to the formatting function.
> This brings some ramifications, as the order of parameters in the
> translated template can change, so the format syntax has to support
> positional parameters. But this also allows to account for data-derived
> context such as numeral cases, without complicating the printf-like
> functions too much.

Yes, this is the sort of thing I was thinking of: that there are some
pressures that a gettext() layer feeds back to the selection of
formatting strings that might be worth considering.
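
The positional-parameter point in the quoted paragraph can be sketched as follows, in present-day Rust. `substitute` is a hypothetical helper using `{N}` placeholders purely for illustration; it is not the fmt! syntax under discussion. Placeholders it cannot resolve are copied through unchanged.

```rust
/// Replace each `{N}` with `args[N]`; leave anything unresolvable as written.
fn substitute(template: &str, args: &[&str]) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        out.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find('}') {
            Some(close) => {
                let inner = &after[..close];
                match inner.parse::<usize>().ok().and_then(|i| args.get(i)) {
                    Some(arg) => out.push_str(arg),
                    None => {
                        // Not a valid index: keep the placeholder text.
                        out.push('{');
                        out.push_str(inner);
                        out.push('}');
                    }
                }
                rest = &after[close + 1..];
            }
            None => {
                // Unterminated brace: copy the remainder verbatim.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because arguments are referred to by index, a translated template is free to mention them in a different order than the source template does, for example `"{1} received a message from {0}"` against the source `"{0} sent a message to {1}"`.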

Also that it might be nice to make fmt!() default-to, or very easily be
adapted-to (without too much extra noise, say with ifmt!() or such)
invoking the message-catalogue system. The _() macro is used in C I
think due to trying to reduce the noise-effect i18n efforts have on
code. We should keep that in mind.

> There are other difficulties with localizing formatted messages that are
> never systematically solved, for example, accounting for gender. In all,
> it looks like an interesting area for library research, beyond the basic
> "stick this value pretty-printed into a string" problem.

There are a few of those, yes. They get quite complex. Though there is
some ... "reasonably lightweight" prior art in the ICU format library
that I think might be worth pursuing:

http://userguide.icu-project.org/formatparse/messages
https://ssl.icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html

-Graydon



Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-04 Thread James Miller
On 2013-05-04 01:28:43, Huon Wilson wrote:
> Hi all,
> 
> Aatch, Kimundi and I (and maybe some others... sorry if I've forgotten
> you) came up with a bit of proposal on IRC for handling fmt!. It's
> possibly been considered already, but whatever, we'd like some
> comments on it.
> 
> 
> There would be one trait for each format specifier (probably excluding
> `?'), e.g. FormatC for %c, FormatD for %d/%i, FormatF for %f, and
> format would just require that the value for each format specifier
> implements the correct trait. (Presumably this check can be done
> "automatically" by attempting to call the appropriate method and
> using the type checker.)
> 
> In code,
> 
> 
> trait FormatC {
>   fn format_c(&self, w: &Writer, flags: Flags);
> }
> 
> impl FormatC for char {
>   fn format_c(&self, w: &Writer, _: Flags) { w.write_char(*self) }
> }
> 
> struct MyChar(char);
> impl FormatC for MyChar {
>   fn format_c(&self, w: &Writer, _: Flags) { w.write_char(**self) }
> }
> 
> fmt!("%c%c%c", 'a', MyChar('a'), ~"str")
> 
> // becomes
> 
> 'a'.format_c(w, {});
> MyChar('a').format_c(w, {});
> ~"str".format_c(w, {});
> 
> 
> And the first two would resolve/type-check fine, but the last would
> not. (`Flags' would contain the width and precision specifiers and all
> that.)
> 
> 
> This could then be extended to have a dynamic formatter, which allows
> types to format for any specifier at runtime (i.e. get around compile
> time restrictions). Our thoughts were to add an extra flag to indicate
> this (e.g. !), so that it is entirely and explicitly opt-in. (Similar
> to Python's __format__ and Go's fmt (I think).)
> 
> 
> trait DynamicFormat {
>   fn format_dynamic(&self, w: &Writer, spec: FormatSpec);
> }
> 
> fmt!("%!s %!10.3f", a, b)
> 
> // becomes
> 
> a.format_dynamic(w, {flags: {}, type: 's'});
> w.write_str(" ");
> b.format_dynamic(w, {flags: {width: 10, prec: 3}, type: 'f'});
> 
> 
> (Presumably this could also have a lint mode, to give an error or
> warning if dynamic formatting is used.)
> 
> 
> There were also some other discussions about the fmt! syntax, e.g. it
> was suggested that the following could be equivalent to each other
> 
> fmt!("%{2}[0].[1]f %{2}e", 10, 3, 1.01);
> fmt!("%10.3f %e", 1.01, 1.01);
> 
> This is an explicit divergence from printf's slightly arcane */'n$'
> placeholder syntax. One could use `[*]` to just refer to the next
> argument, like * does by default. (Aatch has a format spec parser[1]
> in the works that supports this syntax.)
> 
> 
> Huon
> 
> [1]: https://gist.github.com/Aatch/fb94960ab770c7df5718

Hi All,

dbaupp and I have done some preliminary implementation[1] on the formatting
side of things. During discussion on IRC we came up with a few extra details
that should probably be mentioned.

Using a writer for format strings is useful for efficiency, especially when
doing things like writing to the terminal or a file. So there are 3 syntax
extensions that would be used in order to make this work and be nice:

* fmt! which is essentially the same as now, returns a ~str
* printf! which writes straight to stdout (effectively replacing
  `io::print(fmt!(...))`)
* writef! which would take an io::Writer as its first argument

fmt! and printf! would simply be written in terms of writef! with
pre-supplied Writers.
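
The layering described above can be sketched in present-day Rust. This is only an illustration, not the proposed implementation: the function names `writef`, `fmt` and `printf` are hypothetical stand-ins for the macros, and a slice of pre-formatted pieces stands in for a parsed format string.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for writef!: all output goes through a Writer.
fn writef(w: &mut dyn Write, parts: &[&str]) -> io::Result<()> {
    for part in parts {
        w.write_all(part.as_bytes())?;
    }
    Ok(())
}

// Stand-in for fmt!: writef into an in-memory buffer, returned as a string.
fn fmt(parts: &[&str]) -> String {
    let mut buf: Vec<u8> = Vec::new();
    writef(&mut buf, parts).expect("writing to a Vec cannot fail");
    String::from_utf8(buf).expect("all pieces were valid UTF-8")
}

// Stand-in for printf!: writef straight to stdout, with no intermediate string.
fn printf(parts: &[&str]) -> io::Result<()> {
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    writef(&mut handle, parts)
}
```

With this shape, only the string-returning variant allocates; writing to stdout or to a file goes directly through the supplied writer.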

The actual format string has, unsurprisingly, created a lot of discussion,
mostly around its relative power. The current placeholder format is as
follows:

% position flags width precision numeric_arg conversion_specifier

With all except the '%' and conversion specifier being optional. The specific
format of the fields is detailed in the string parser.

Currently we have identified 4 conversion specifiers: 'd', 'f', 's' and '?'.
These are interpreted as "convert as" specifiers so '%d' means "convert this
argument as a number" and the argument type itself knows how to do this.

For flags, we have '0', '-', '=', ' ', '+' and '\'' which have the same
meaning as standard printf (where they exist in standard printf).

* '0' means zero-pad
* '-' means left-justified in the field
* '=' means center in the field
* ' ' means that a blank should always be before a signed number
* '+' means that a '-' or '+' should always be placed before a signed number

Width and precision fields are similar to the standard printf fields, just
with minor syntax changes in the case of using the next or a specific
argument.
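
As a rough illustration of how the justification and sign flags might act on an already-converted value, here is a small sketch in present-day Rust. The names `Align`, `pad_field` and `signed` are hypothetical, and placing the odd leftover space on the right when centering is an assumption, not something the proposal specifies.

```rust
// Hypothetical alignment choices: default (right), '-' (left), '=' (center).
#[derive(Clone, Copy)]
enum Align {
    Right,
    Left,
    Center,
}

// Pad `text` with spaces up to `width` characters; longer text is left as is.
fn pad_field(text: &str, width: usize, align: Align) -> String {
    let len = text.chars().count();
    if len >= width {
        return text.to_string();
    }
    let total = width - len;
    let (left, right) = match align {
        Align::Right => (total, 0),
        Align::Left => (0, total),
        // Assumption: an odd leftover space goes on the right.
        Align::Center => (total / 2, total - total / 2),
    };
    format!("{}{}{}", " ".repeat(left), text, " ".repeat(right))
}

// Apply the '+' and ' ' flags to a signed number; '+' wins if both are set,
// as in standard printf.
fn signed(n: i64, plus: bool, space: bool) -> String {
    if n < 0 {
        n.to_string()
    } else if plus {
        format!("+{}", n)
    } else if space {
        format!(" {}", n)
    } else {
        n.to_string()
    }
}
```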

The numeric arg field is formatted like this: `<[0-9]+>` and is used for
supplying a base to 'd' conversions, with the default being 10. This means
that '%x', '%o' and '%t' can all be replaced with this format: '%<16>d',
'%<8>d' and '%<2>d'. You could also specify any other base, up to 36.
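
To make the base argument concrete, a '%<base>d' conversion could bottom out in something like the following present-day Rust sketch. The helper name `format_radix` is hypothetical, and lowercase digits and a leading '-' for negative values are assumptions.

```rust
// Hypothetical helper: render `n` in `base`, as '%<base>d' might.
// Returns None when the base is outside 2..=36.
fn format_radix(n: i64, base: u32) -> Option<String> {
    if !(2..=36).contains(&base) {
        return None;
    }
    if n == 0 {
        return Some("0".to_string());
    }
    let mut magnitude = n.unsigned_abs();
    let mut digits = Vec::new();
    // Collect digits from least to most significant.
    while magnitude > 0 {
        let d = (magnitude % base as u64) as u32;
        digits.push(std::char::from_digit(d, base).expect("digit is below base"));
        magnitude /= base as u64;
    }
    if n < 0 {
        digits.push('-');
    }
    // The digits were collected in reverse order.
    Some(digits.iter().rev().collect())
}
```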

I'm in favor of keeping printf-style strings. For one, they are what we
already have, so in that sense it's merely not changing that. Also, I am
struggling to see the objective advantages of
Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-04 Thread Mikhail Zabaluev
Hi,

2013/5/3 Graydon Hoare 

>
> (Erm, it might also be worthwhile to consider message catalogues and
> locale-facets at this point; the two are closely related. We do not have a
> library page on that topic yet, but ought to. Or include it in the lib-fmt
> page.)


If you are talking about gettext-like functionality, usually this and
format strings are thought of as independent processing layers: format
strings are translated as such and then fed to the formatting function.
This brings some ramifications, as the order of parameters in the
translated template can change, so the format syntax has to support
positional parameters. But this also allows accounting for data-derived
context such as numeral cases, without complicating the printf-like
functions too much.
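
A minimal sketch of why positional parameters matter, in present-day Rust: the translated template is looked up first and then fed to the formatter, and it may reorder its arguments. The function `render` is hypothetical, it borrows the '%{N}' position syntax floated elsewhere in this thread, and it takes pre-formatted string arguments to keep the example short.

```rust
// Hypothetical runtime substitution of '%{N}' placeholders (0-based).
// Returns None on a malformed placeholder or an out-of-range index, so a
// caller could fall back to the untranslated template.
fn render(template: &str, args: &[&str]) -> Option<String> {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("%{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = after.find('}')?;
        let index: usize = after[..end].parse().ok()?;
        out.push_str(args.get(index)?);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Some(out)
}
```

For example, an original template "%{0} deleted %{1} files" and a reordered one such as "%{1} files were deleted by %{0}" can both be rendered from the same argument list.
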
There are other difficulties with localizing formatted messages that are
never systematically solved, for example, accounting for gender. In all, it
looks like an interesting area for library research, beyond the basic
"stick this value pretty-printed into a string" problem.

Cheers,
  Mikhail


Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-03 Thread Graydon Hoare

On 13-05-03 01:21 PM, Graydon Hoare wrote:

> On 13-05-03 01:12 PM, Brian Anderson wrote:
>
>> I agree with reconsidering the inconsistent, underspecified printf
>> syntax, but don't have any specific thoughts on this at this time.
>
> Note that I made a page collecting links to existing format libraries a
> little while back:

(Erm, it might also be worthwhile to consider message catalogues and 
locale-facets at this point; the two are closely related. We do not have 
a library page on that topic yet, but ought to. Or include it in the 
lib-fmt page.)


-Graydon



Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-03 Thread Graydon Hoare

On 13-05-03 01:12 PM, Brian Anderson wrote:


> I agree with reconsidering the inconsistent, underspecified printf
> syntax, but don't have any specific thoughts on this at this time.


Note that I made a page collecting links to existing format libraries a 
little while back:


https://github.com/mozilla/rust/wiki/Lib-fmt

I'm similarly excited to see someone taking charge of this. Thanks!

-Graydon



Re: [rust-dev] RFC: User-implementable format specifiers w/ compile-time checks

2013-05-03 Thread Brian Anderson

On 05/03/2013 08:28 AM, Huon Wilson wrote:

> Hi all,
>
> Aatch, Kimundi and I (and maybe some others... sorry if I've forgotten
> you) came up with a bit of proposal on IRC for handling fmt!. It's
> possibly been considered already, but whatever, we'd like some
> comments on it.


I'm glad you are thinking about this. fmt! is in desperate need of an 
overhaul, both in design and implementation.





> There would be one trait for each format specifier (probably excluding
> `?'), e.g. FormatC for %c, FormatD for %d/%i, FormatF for %f, and
> format would just require that the value for each format specifier
> implements the correct trait. (Presumably this check can be done
> "automatically" by attempting to call the appropriate method and
> using the type checker.)
>
> In code,
>
>
> trait FormatC {
>   fn format_c(&self, w: &Writer, flags: Flags);
> }
>
> impl FormatC for char {
>   fn format_c(&self, w: &Writer, _: Flags) { w.write_char(*self) }
> }
>
> struct MyChar(char);
> impl FormatC for MyChar {
>   fn format_c(&self, w: &Writer, _: Flags) { w.write_char(**self) }
> }


Good call using Writer here. This is one of the crucial changes that 
must be made.




> fmt!("%c%c%c", 'a', MyChar('a'), ~"str")
>
> // becomes
>
> 'a'.format_c(w, {});
> MyChar('a').format_c(w, {});
> ~"str".format_c(w, {});
>
>
> And the first two would resolve/type-check fine, but the last would
> not. (`Flags' would contain the width and precision specifiers and all
> that.)


For these pre-existing format specifiers this would allow arbitrary
types to be formatted as e.g. characters. This may be overkill. What we
*definitely* need though is for all types that are e.g. signed integers
to implement `%i`.


`FormatC` I would probably prefer to be `FormatChar`, etc. for clarity.
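
For readers following along with a current compiler, the trait-per-specifier idea can be sketched as below, using the longer `FormatChar` name. This is an approximation only: `std::fmt::Write` stands in for the 2013 `Writer`, the `Flags` fields are assumptions, and `expand_example` is a hypothetical hand-written version of what `fmt!("%c%c", ...)` might expand to.

```rust
use std::fmt::Write;

// Hypothetical flags record; the proposal only says it would carry
// width, precision "and all that".
#[derive(Default, Clone, Copy)]
struct Flags {
    width: Option<usize>,
    precision: Option<usize>,
}

// One trait per conversion specifier; this one backs '%c'.
trait FormatChar {
    fn format_char(&self, w: &mut dyn Write, flags: Flags) -> std::fmt::Result;
}

impl FormatChar for char {
    fn format_char(&self, w: &mut dyn Write, _flags: Flags) -> std::fmt::Result {
        w.write_char(*self)
    }
}

// A user type opting in to '%c'.
struct MyChar(char);

impl FormatChar for MyChar {
    fn format_char(&self, w: &mut dyn Write, _flags: Flags) -> std::fmt::Result {
        w.write_char(self.0)
    }
}

// Hand-expanded equivalent of formatting 'a' and MyChar('b') with "%c%c".
fn expand_example() -> String {
    let mut out = String::new();
    'a'.format_char(&mut out, Flags::default()).expect("String writes cannot fail");
    MyChar('b').format_char(&mut out, Flags::default()).expect("String writes cannot fail");
    // "str".format_char(&mut out, Flags::default());
    // The line above is rejected at compile time: &str does not implement FormatChar.
    out
}
```

The compile-time check falls out of ordinary method resolution: an argument whose type lacks the trait for its specifier simply fails to type-check.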




> This could then be extended to have a dynamic formatter, which allows
> types to format for any specifier at runtime (i.e. get around compile
> time restrictions). Our thoughts were to add an extra flag to indicate
> this (e.g. !), so that it is entirely and explicitly opt-in. (Similar
> to Python's __format__ and Go's fmt (I think).)
>
>
> trait DynamicFormat {
>   fn format_dynamic(&self, w: &Writer, spec: FormatSpec);
> }
>
> fmt!("%!s %!10.3f", a, b)
>
> // becomes
>
> a.format_dynamic(w, {flags: {}, type: 's'});
> w.write_str(" ");
> b.format_dynamic(w, {flags: {width: 10, prec: 3}, type: 'f'});
>
>
> (Presumably this could also have a lint mode, to give an error or
> warning if dynamic formatting is used.)


I don't understand the use case for this. I would understand (and want) 
a generic format specifier that defers to some trait, but without 
specifically using type 's' or 'f'. Something like "%!" that becomes 
`a.format(w, flags)`, but doesn't try to coerce arbitrary types to print 
as other arbitrary types.
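
One possible reading of that generic specifier, sketched in present-day Rust: a single trait whose method receives only the flags, so the type picks its own rendering and no 's' or 'f' conversion is implied. The names `Format`, `Flags`, `Point` and `show` are hypothetical, and honoring width by right-justifying is an assumption.

```rust
use std::fmt::Write;

// Hypothetical flags record; only `width` is used in this sketch.
#[derive(Default, Clone, Copy)]
struct Flags {
    width: Option<usize>,
}

// A single generic trait: what "%!" might defer to.
trait Format {
    fn format(&self, w: &mut dyn Write, flags: Flags) -> std::fmt::Result;
}

struct Point {
    x: i32,
    y: i32,
}

impl Format for Point {
    fn format(&self, w: &mut dyn Write, flags: Flags) -> std::fmt::Result {
        let text = format!("({}, {})", self.x, self.y);
        // Right-justify within the requested width, if one was given.
        let width = flags.width.unwrap_or(0);
        write!(w, "{:>width$}", text, width = width)
    }
}

// Hand-expanded equivalent of formatting one value with "%!".
fn show(value: &dyn Format, flags: Flags) -> String {
    let mut out = String::new();
    value.format(&mut out, flags).expect("String writes cannot fail");
    out
}
```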





> There were also some other discussions about the fmt! syntax, e.g. it
> was suggested that the following could be equivalent to each other
>
> fmt!("%{2}[0].[1]f %{2}e", 10, 3, 1.01);
> fmt!("%10.3f %e", 1.01, 1.01);
>
> This is an explicit divergence from printf's slightly arcane */'n$'
> placeholder syntax. One could use `[*]` to just refer to the next
> argument, like * does by default. (Aatch has a format spec parser[1]
> in the works that supports this syntax.)


I agree with reconsidering the inconsistent, underspecified printf 
syntax, but don't have any specific thoughts on this at this time.


Nice work. I look forward to seeing where this goes.