On Wed, Mar 06, 2002 at 09:45:18AM -0800, Rich Morin wrote:
> Going a bit further afield, I also started thinking about the general
> nature of printf. I've been using this basic syntax since 1970 (in
> the form of Fortran's FORMAT statements :-), so I'm pretty comfortable
> with it. OTOH, I don't like the fact that the format specifications
> can become widely separated from the variables they reference.
I agree that this is a horribly "illiterate" aspect of printf....
> With all of Larry's talk about making "x" mode the standard in REs and
> having more "pair-based" syntax here and there, I started thinking
> about a replacement for printf, as:
>
> printx(
> 'The value of $foo is %f7.3; ', $foo,
> 'the value of $bar is %f7.3.%n', $bar
> );
.... But why propose such an off-the-wall solution? Wouldn't it make
more sense to make it more like interpolation? Eg,
printx 'The value of $foo is %f7.3{foo}', { foo => $foo };
                                   ^^^ key of the following hash
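
To make the idea concrete, here is a minimal sketch of how such a printx could work. Everything in it is an assumption for illustration: formatx and _expand are made-up helper names, the %f7.3{key} syntax is simply read as "conversion letter, then width.precision, then hash key", and only a small whitelist of conversions is accepted so that nothing like %n can sneak in.

```perl
use strict;
use warnings;

# Hypothetical helper: format one value looked up by key in the argument hash.
sub _expand {
    my ($args, $conv, $size, $key) = @_;
    die "formatx: no value for '$key'\n" unless exists $args->{$key};
    # Rebuild an ordinary sprintf directive, e.g. 'f' and '7.3' become '%7.3f'.
    return sprintf("%${size}${conv}", $args->{$key});
}

# Hypothetical formatx: expands %<conv><width.precision>{<key>} placeholders.
# Only the conversions s, d, f, e and g are recognized; anything else is
# left in the string untouched.
sub formatx {
    my ($format, $args) = @_;
    $format =~ s/%([sdfeg])(\d*(?:\.\d+)?)\{(\w+)\}/_expand($args, $1, $2, $3)/ge;
    return $format;
}

sub printx { print formatx(@_) }

# Single quotes keep the literal text '$foo' in the message.
printx('The value of $foo is %f7.3{foo}' . "\n", { foo => 3.14159 });
# prints: The value of $foo is   3.142
```

Because the values are looked up by name, the order of the hash entries has no connection to the order of the placeholders in the string.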
Tangent: One crucial but often overlooked aspect of designing
"format string" schemes is that they can, with some care, facilitate
internationalization. C format strings are actually pretty good for
this, due to the following characteristics:
- The translator doesn't have to touch code, only format
strings. This is obviously desirable.
- C format strings are fairly "safe", in that a format string
isn't likely to break a program. This is far from strictly
true, however: %n writes to memory, and a format that reads
more arguments than were passed is undefined behavior in C.
Either might be exploitable if your translator is
malicious!
- C format strings give the translator reasonable flexibility:
Eg, they can reorder, repeat, or omit placeholders with the
%m$ syntax.
- Some localization is "automagic", eg number formatting
punctuation.
However, if you don't keep internationalization in mind, it is easy
to lose these characteristics. For example, your proposal seems to
encourage
printx
'Your little dog %s ', $dog,
'attacks the evil %s.', $monster;
In this example, the translator is unable to change the sentence
structure (unless he can change the code).
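
By contrast, a single format string with explicit argument indexes leaves the sentence structure up to the translator. The sketch below assumes a Perl whose sprintf understands the %m$ syntax (5.8 and later, as far as I know). The %catalog hash and attack_message are hypothetical, and the "reordered" entry is just an English rewording standing in for a language with a different word order.

```perl
use strict;
use warnings;

# Hypothetical message catalog. Only these strings would go to the translator.
# Single quotes matter: in double quotes Perl would try to interpolate $s.
my %catalog = (
    en        => 'Your little dog %1$s attacks the evil %2$s.',
    reordered => 'The evil %2$s is attacked by your little dog %1$s.',
);

# The code always passes the arguments in the same order;
# the format string decides where each one ends up.
sub attack_message {
    my ($lang, $dog, $monster) = @_;
    return sprintf($catalog{$lang}, $dog, $monster);
}

print attack_message('en',        'Toto', 'witch'), "\n";
print attack_message('reordered', 'Toto', 'witch'), "\n";
```

The call site is identical for both entries, which is the property the split-up printx call loses.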
So I would humbly advise anyone thinking about this to
- think safe. Don't add features like the ability to execute
arbitrary Perl expressions! (Or at least, offer a version
without unsafe features, and recommend that programmers use it
in most cases.)
- think flexibility for the translator. This can be hard if you
don't have linguistic or localization experience, but you can
use your imagination. Desirable features might include
locale-sensitive formatting of dates, currencies, etc;
handling of plurals (gettext has a neat solution, though it
requires multiple format strings); an "internationalized
string" type wrapping up format string plus placeholders[1].
The Java and C# string formatting libraries are worth looking
at (don't take this as high praise though).
- make sure that a translator doesn't have to change anything
except the format string.
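
As a rough illustration of the last two points, here is one way an "internationalized string" wrapper might look. This is only a guess at the shape of the idea, not a worked-out design: the I18nString package, its render method, and the plain-hash catalog are all hypothetical. It also does no validation of the translated format, so it is not "safe" in the sense of the first point.

```perl
use strict;
use warnings;

# Hypothetical "internationalized string": it remembers the format (which
# doubles as the catalog key) and the arguments, and only becomes text when
# render is called.
package I18nString;

sub new {
    my ($class, $format, @args) = @_;
    return bless { format => $format, args => [@args] }, $class;
}

# Look the format up in a translator-supplied catalog (a plain hash ref here)
# and fall back to the original format when there is no entry for it.
sub render {
    my ($self, $catalog) = @_;
    my $key    = $self->{format};
    my $format = exists $catalog->{$key} ? $catalog->{$key} : $key;
    return sprintf($format, @{ $self->{args} });
}

package main;

my $msg = I18nString->new('%1$s has %2$d new messages.', 'Alice', 3);

# A stand-in "translation" that changes the word order.
my %reordered = (
    '%1$s has %2$d new messages.' => '%2$d new messages are waiting for %1$s.',
);

print $msg->render({}), "\n";           # no translation: original format
print $msg->render(\%reordered), "\n";  # translated format, same arguments
```

The point of deferring the sprintf is that code can pass the message around without knowing the locale, and switching catalogs changes nothing except the format string.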
Andrew
[1] This is my pet idea. Tell me if you see it somewhere!