php-i18n Digest 17 Jan 2007 05:13:05 -0000 Issue 347
Topics (messages 1048 through 1048):
Re: ICU ResourceBundles for ext/unicode
1048 by: Andi Gutmans
----------------------------------------------------------------------
--- Begin Message ---
Hi Norbert,
We are implementing something similar in the Zend Framework
(http://framework.zend.com/wiki/display/ZFPROP/Zend_Translate+-+Thomas+Weidner).
Is this in the direction of what you mean?
Andi
> -----Original Message-----
> From: Norbert Lindenberg [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 21, 2006 9:18 AM
> To: [email protected]
> Cc: Norbert Lindenberg
> Subject: [PHP-I18N] Re: ICU ResourceBundles for ext/unicode
>
> Hi all,
>
> I'm new to this list, so let me introduce myself: I'm one of
> the internationalization architects at Yahoo, and have
> recently started looking into PHP internationalization. I
> previously worked on Java internationalization at Sun for a few years.
>
> Resource bundles have helped make internationalization of
> Java applications easy and popular, so I'd like to see a
> similar capability in PHP. I know gettext is available, but
> it seems a bit difficult to understand and uses
> locale-specific encodings instead of Unicode.
>
> The ICU style of resource bundles is Unicode-based, but also
> seems more complicated than desirable for PHP. It's designed
> for a statically typed environment and requires compilation,
> neither of which fits in well with PHP.
>
> I'd rather start with Java properties files, the simplest and
> most widely used form of Java resource files. I'd adapt them
> to PHP 6 by switching their encoding to UTF-8, adopting
> heredocs, and simplifying their syntax. I'd drop the
> secondary fallback mechanism, in which resource bundles can
> inherit individual resources from other bundles.
> This is an optimization to reduce the size of bundles at the
> expense of runtime overhead and additional work in creating
> the bundles. The additional step of finding common resources
> and moving them to shared bundles is rarely made in normal
> localization processes, and the space savings don't matter
> much for PHP, where bundles remain on the server. Dropping
> the secondary fallback also means that we can simplify the
> resource bundle to just an array.
>
> Below is my draft specification for a resource bundle
> mechanism for PHP. For comparison, the specs for the
> corresponding functionality in Java and ICU4C are:
> - ICU resource bundle specification:
>   http://icu.sourceforge.net/apiref/icu4c/ures_8h.html
> - ICU resource file specification:
>   http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icuhtml/design/bnf_rb.txt?view=co
> - Java resource bundle specification:
>   http://java.sun.com/javase/6/docs/api/java/util/ResourceBundle.html
> - Java properties file specification:
>   http://java.sun.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
>
>
> API:
>
> array intl_get_resources(string base_name)
>
> Returns an array containing the key-value pairs obtained from
> a resource file. The resource file is looked up for the
> current locale using the lookup algorithm of section 3.4 of
> RFC 4647, at each step generating the file name by
> concatenating the given base name, the string "-", the
> language tag of the current step, and the string ".pres". The
> default file name used if no previous step was successful is
> the concatenation of base name and ".pres". Once a file is
> found, its contents are interpreted according to the resource
> file format specified below, and an array is filled with its
> key-value pairs. An entry with "#locale#" as its key and the
> actual locale tag of the file found as its value is added to
> the array, and the array is returned. The function may cache
> its results, but must check at least once every 60 minutes
> that the underlying resource files haven't changed.
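
The file lookup described above can be sketched in a few lines of PHP. This is only an illustration of the RFC 4647 section 3.4 truncation steps as I read them: pres_candidate_files() is a hypothetical helper name, not part of the draft API, and the sketch assumes a current PHP version with scalar type declarations. It only lists the file names in the order they would be tried; checking which one exists and caching are left out.

```php
<?php
// Hypothetical helper (not part of the draft API): returns the file names
// that would be tried, in order, for a base name and a locale tag.
function pres_candidate_files(string $baseName, string $localeTag): array
{
    $candidates = [];
    $subtags = $localeTag === '' ? [] : explode('-', $localeTag);

    while (count($subtags) > 0) {
        $candidates[] = $baseName . '-' . implode('-', $subtags) . '.pres';

        // Truncate from the end, one subtag per step.
        array_pop($subtags);

        // RFC 4647 lookup also removes a trailing single-character subtag
        // (such as "x") that the truncation leaves behind.
        while (count($subtags) > 0 && strlen(end($subtags)) === 1) {
            array_pop($subtags);
        }
    }

    // Default file name, used if no step above found a file.
    $candidates[] = $baseName . '.pres';
    return $candidates;
}

print_r(pres_candidate_files('strings', 'ja-JP'));
```

For the locale "ja-JP" this yields strings-ja-JP.pres, then strings-ja.pres, then strings.pres.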
>
>
> Resource File Format
>
> - Files are encoded in UTF-8. The first line may be prefixed
>   with a BOM.
> - Lines whose first non-whitespace character is "#" are
>   comment lines and are ignored.
> - Lines that contain only whitespace characters and are not
>   part of a heredoc string are ignored.
> - Key-value definitions come in two forms:
>   o The simple form has a key string, followed by "=",
>     followed by the value, all on one line. The tokens may or
>     may not be surrounded by whitespace characters. Leading
>     and trailing whitespace is trimmed from both key and
>     value. The value cannot start with "<<<"; for values
>     starting with this character sequence, use the heredoc
>     form.
>   o The heredoc form starts with a key string, followed by
>     "=", followed by "<<<", followed by an identifier, all on
>     one line. The tokens may or may not be surrounded by
>     whitespace characters. Leading and trailing whitespace is
>     trimmed from both key and value. The heredoc form ends
>     with a termination line that contains only the
>     identifier, possibly followed by a semicolon. The lines
>     between these two lines, except comment lines, form the
>     heredoc string. The line break before the termination
>     line is removed; all other line breaks are preserved.
> - Lines that are not comment lines, whitespace lines, or part
>   of a key-value definition are illegal.
> - The following escape sequences are recognized in values:
>   o "\\" stands for "\".
>   o "\n" stands for the newline character, U+000A.
>   o "\t" stands for the horizontal tab character, U+0009.
>   o "\ " stands for the space character, U+0020. This is only
>     needed if the value of a key-value pair starts or ends
>     with a space character.
>   o "\#" stands for the number sign character, U+0023. This
>     is only needed if a line within a heredoc string starts
>     with this character.
> - A sequence of "\" followed by a character not listed above
>   is illegal. A "\" immediately preceding the end of the file
>   is illegal.
> - Only the characters horizontal tab, U+0009, and space,
>   U+0020, are considered whitespace.
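
To make the format rules above concrete, here is a rough parser sketch in PHP. The function names (pres_parse, pres_trim_value, pres_unescape) are hypothetical, and a few points the draft leaves open are filled in by assumption: lines are split on LF or CRLF, heredoc body lines are kept without trimming, the termination line must match the identifier exactly, and empty keys or identifiers are not rejected. It is meant as a reading aid, not a reference implementation.

```php
<?php
// Sketch only; all function names here are hypothetical.

// Expands the five escape sequences of the draft; anything else is illegal.
function pres_unescape(string $raw): string
{
    $map = ['\\' => '\\', 'n' => "\n", 't' => "\t", ' ' => ' ', '#' => '#'];
    $out = '';
    $len = strlen($raw);
    for ($i = 0; $i < $len; $i++) {
        if ($raw[$i] !== '\\') {
            $out .= $raw[$i];
            continue;
        }
        $next = $i + 1 < $len ? $raw[$i + 1] : '';
        if (!isset($map[$next])) {
            throw new InvalidArgumentException('Illegal escape sequence');
        }
        $out .= $map[$next];
        $i++; // skip the escaped character
    }
    return $out;
}

// Trims tabs and spaces, but keeps a trailing space that belongs to "\ ".
function pres_trim_value(string $raw): string
{
    $raw = ltrim($raw, " \t");
    $trimmed = rtrim($raw, " \t");
    $slashes = strlen($trimmed) - strlen(rtrim($trimmed, '\\'));
    if ($slashes % 2 === 1 && strlen($trimmed) < strlen($raw)) {
        $trimmed .= $raw[strlen($trimmed)];
    }
    return $trimmed;
}

function pres_parse(string $text): array
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3); // drop the optional UTF-8 BOM
    }
    $lines = preg_split('/\r\n|\n/', $text); // assumption: LF or CRLF
    $count = count($lines);
    $result = [];
    for ($i = 0; $i < $count; $i++) {
        $line = ltrim($lines[$i], " \t");
        if ($line === '' || $line[0] === '#') {
            continue; // whitespace-only line or comment line
        }
        $eq = strpos($line, '=');
        if ($eq === false) {
            throw new InvalidArgumentException('Illegal line ' . ($i + 1));
        }
        $key = rtrim(substr($line, 0, $eq), " \t");
        $value = pres_trim_value(substr($line, $eq + 1));
        if (strncmp($value, '<<<', 3) === 0) {
            $id = trim(substr($value, 3), " \t");
            $body = [];
            for ($i++; ; $i++) {
                if ($i >= $count) {
                    throw new InvalidArgumentException("Unterminated heredoc for $key");
                }
                if ($lines[$i] === $id || $lines[$i] === $id . ';') {
                    break; // termination line
                }
                if (strncmp(ltrim($lines[$i], " \t"), '#', 1) !== 0) {
                    $body[] = $lines[$i]; // comment lines are dropped
                }
            }
            $value = implode("\n", $body); // no line break before terminator
        }
        $result[$key] = pres_unescape($value);
    }
    return $result;
}

$strings = pres_parse("# greeting\nhello = Hello, world!\n");
echo $strings['hello'], "\n";
```

Trimming happens before unescaping so that a value written as "\ x\ " keeps its leading and trailing space, which is the case the "\ " escape exists for.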
>
>
> With that, hello world becomes:
>
> <?php
> $strings = intl_get_resources("strings");
> echo "$strings[hello]";
> ?>
>
> The strings.pres file contains:
> hello = Hello, world!
> and strings-ja.pres contains:
> hello = こんにちは、皆さん。
>
> What do you think?
>
> Regards,
> Norbert
>
--- End Message ---