php-i18n Digest 17 Jan 2007 05:13:05 -0000 Issue 347

php-i18n-digest-help Tue, 16 Jan 2007 21:13:26 -0800

Topics (messages 1048 through 1048):


Re: ICU ResourceBundles for ext/unicode
        1048 by: Andi Gutmans


----------------------------------------------------------------------

--- Begin Message ---

Hi Norbert,

We are implementing something similar in the Zend Framework
(http://framework.zend.com/wiki/display/ZFPROP/Zend_Translate+-+Thomas+Weidner).

Is this along the lines of what you have in mind?

Andi 

> -----Original Message-----
> From: Norbert Lindenberg [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, December 21, 2006 9:18 AM
> To: [email protected]
> Cc: Norbert Lindenberg
> Subject: [PHP-I18N] Re: ICU ResourceBundles for ext/unicode
> 
> Hi all,
> 
> I'm new to this list, so let me introduce myself: I'm one of 
> the internationalization architects at Yahoo, and have 
> recently started looking into PHP internationalization. I 
> previously worked on Java internationalization at Sun for a few years.
> 
> Resource bundles have helped make internationalization of 
> Java applications easy and popular, so I'd like to see a 
> similar capability in PHP. I know gettext is available, but 
> it seems a bit difficult to understand and uses 
> locale-specific encodings instead of Unicode.
> 
> The ICU style of resource bundles is Unicode-based, but also 
> seems more complicated than desirable for PHP. It's designed 
> for a statically typed environment and requires compilation, 
> neither of which fits well with PHP.
> 
> I'd rather start with Java properties files, the simplest and 
> most widely used form of Java resource files. I'd adapt them 
> to PHP 6 by switching their encoding to UTF-8, adopting 
> heredocs, and simplifying their syntax. I'd drop the 
> secondary fallback mechanism, in which resource bundles can 
> inherit individual resources from other bundles.  
> This is an optimization to reduce the size of bundles at the 
> expense of runtime overhead and additional work in creating 
> the bundles. The additional step of finding common resources 
> and moving them to shared bundles is rarely made in normal 
> localization processes, and the space savings don't matter 
> much for PHP, where bundles remain on the server. Dropping 
> the secondary fallback also means that we can simplify the 
> resource bundle to just an array.
> 
> Below is my draft specification for a resource bundle 
> mechanism for PHP. For comparison, the specs for the 
> corresponding functionality in Java and ICU4C are:
> - ICU resource bundle specification:
>      http://icu.sourceforge.net/apiref/icu4c/ures_8h.html
> - ICU resource file specification:
>      http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icuhtml/design/bnf_rb.txt?view=co
> - Java resource bundle specification:
>      http://java.sun.com/javase/6/docs/api/java/util/ResourceBundle.html
> - Java properties file specification:
>      http://java.sun.com/javase/6/docs/api/java/util/Properties.html#load(java.io.Reader)
> 
> 
> API:
> 
> array intl_get_resources(string base_name)
> 
> Returns an array containing the key-value pairs obtained from 
> a resource file. The resource file is looked up for the 
> current locale using the lookup algorithm of section 3.4 of 
> RFC 4647, at each step generating the file name by 
> concatenating the given base name, the string "-", the 
> language tag of the current step, and the string ".pres". The 
> default file name used if no previous step was successful is 
> the concatenation of base name and ".pres". Once a file is 
> found, its contents are interpreted according to the resource 
> file format specified below, and an array is filled with its 
> key-value pairs. An entry with "#locale#" as its key and the 
> actual locale tag of the file found as its value is added to 
> the array, and the array is returned. The function may cache 
> its results, but must check at least once every 60 minutes 
> that the underlying resource files haven't changed.
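
The file lookup described above can be sketched in PHP. This is only an illustration under stated assumptions, not part of the draft: pres_candidate_files() is a hypothetical helper name, the locale tag is passed in explicitly because the draft does not say how the current locale is obtained, and the syntax targets current PHP (7 or later) rather than PHP 6. The helper lists the file names that would be tried, most specific first, following the truncation rule of RFC 4647 section 3.4.

```php
<?php
// Hypothetical helper (not part of the draft): list the file names that
// intl_get_resources() would try for a locale tag, most specific first.
function pres_candidate_files(string $baseName, string $localeTag): array
{
    $candidates = [];
    $subtags = ($localeTag === '') ? [] : explode('-', $localeTag);

    while (count($subtags) > 0) {
        $candidates[] = $baseName . '-' . implode('-', $subtags) . '.pres';

        // Truncate the tag by removing its last subtag.
        array_pop($subtags);

        // RFC 4647 section 3.4: if a single-character subtag (such as "x")
        // is now last, remove it as well.
        while (count($subtags) > 0
            && strlen($subtags[count($subtags) - 1]) === 1) {
            array_pop($subtags);
        }
    }

    // Default file, used if no previous step was successful.
    $candidates[] = $baseName . '.pres';

    return $candidates;
}
```

For the base name "strings" and the locale tag "ja-JP", the helper returns strings-ja-JP.pres, strings-ja.pres, and strings.pres, in that order. An implementation of intl_get_resources() would load the first of these that exists.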
> 
> 
> Resource File Format
> 
> - Files are encoded in UTF-8. The first line may be prefixed 
> with a BOM.
> - Lines whose first non-whitespace character is "#" are 
> comment lines and are ignored.
> - Lines that contain only whitespace characters and are not 
> part of a heredoc string are ignored.
> - Key-value definitions come in two forms:
>            o The simple form has a key string, followed by 
> "=", followed by the value, all on one line. The tokens may 
> or may not be surrounded by whitespace characters. Leading 
> and trailing whitespace is trimmed from both key and value. 
> The value cannot start with "<<<"; for values starting with 
> this character sequence, use the heredoc form.
>            o The heredoc form starts with a key string, 
> followed by "=", followed by "<<<", followed by an 
> identifier, all on one line. The tokens may or may not be 
> surrounded by whitespace characters. Leading and trailing 
> whitespace is trimmed from both key and value. The heredoc 
> form ends with a termination line that contains only the 
> identifier, possibly followed by a semicolon. The lines 
> between these two lines, except comment lines, form the 
> heredoc string. The line break before the termination line 
> is removed; all other line breaks are preserved.
> - Lines that are not comment lines, whitespace lines, or part 
> of a key-value definition are illegal.
> - The following escape sequences are recognized in values:
>            o "\\" stands for "\"
>            o "\n" stands for the newline character, U+000A.
>            o "\t" stands for the horizontal tab character, U+0009.
>            o "\ " stands for the space character, U+0020. 
> This is only needed if the value of a key-value pair starts 
> or ends with a space character.
>            o "\#" stands for the number sign character, 
> U+0023. This is only needed if a line within a heredoc string 
> starts with this character.
> - A sequence of "\" followed by a character not listed above 
> is illegal. A "\" immediately preceding the end of the file 
> is illegal.
> - Only the characters horizontal tab, U+0009, and space, 
> U+0020, are considered whitespace.
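
The format rules above can be sketched as a small parser. This is a rough illustration rather than a reference implementation: pres_trim_value(), pres_unescape(), and pres_parse() are hypothetical names, and the code is written for current PHP (7 or later). It also makes three assumptions the draft leaves open. The key ends at the first "=" on the line. The termination line must match the identifier exactly, with no surrounding whitespace. Lines inside a heredoc are kept untrimmed.

```php
<?php
// Illustrative sketch only: these helper names are hypothetical and are not
// part of the draft or of any PHP extension.

// Trim spaces and tabs from a raw value, keeping a trailing escaped space.
function pres_trim_value(string $raw): string
{
    $value = ltrim($raw, " \t");
    $trimmed = rtrim($value, " \t");
    // An odd number of trailing backslashes means the first removed
    // character was escaped, so it is put back.
    $backslashes = strlen($trimmed) - strlen(rtrim($trimmed, '\\'));
    if ($backslashes % 2 === 1 && strlen($trimmed) < strlen($value)) {
        $trimmed .= $value[strlen($trimmed)];
    }
    return $trimmed;
}

// Decode the five escape sequences; any other use of "\" is illegal.
function pres_unescape(string $raw): string
{
    $map = ['\\' => '\\', 'n' => "\n", 't' => "\t", ' ' => ' ', '#' => '#'];
    $out = '';
    $len = strlen($raw);
    for ($i = 0; $i < $len; $i++) {
        if ($raw[$i] !== '\\') {
            $out .= $raw[$i];
            continue;
        }
        $next = ($i + 1 < $len) ? $raw[$i + 1] : null;
        if ($next === null || !isset($map[$next])) {
            throw new InvalidArgumentException('Illegal escape sequence');
        }
        $out .= $map[$next];
        $i++; // skip the escaped character
    }
    return $out;
}

// Parse the contents of a resource file into an array of key-value pairs.
function pres_parse(string $text): array
{
    if (strncmp($text, "\xEF\xBB\xBF", 3) === 0) {
        $text = substr($text, 3); // drop the optional UTF-8 BOM
    }
    $result = [];
    $heredocKey = null; // key of the heredoc being read, if any
    $heredocId = '';
    $heredocLines = [];
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        $stripped = trim($line, " \t");
        if ($heredocKey !== null) {
            if ($line === $heredocId || $line === $heredocId . ';') {
                // Joining with "\n" omits the break before the terminator.
                $result[$heredocKey] = pres_unescape(implode("\n", $heredocLines));
                $heredocKey = null;
            } elseif ($stripped === '' || $stripped[0] !== '#') {
                $heredocLines[] = $line; // comment lines are skipped
            }
            continue;
        }
        if ($stripped === '' || $stripped[0] === '#') {
            continue; // whitespace-only line or comment line
        }
        $pos = strpos($line, '=');
        if ($pos === false) {
            throw new InvalidArgumentException('Illegal line: ' . $line);
        }
        $key = trim(substr($line, 0, $pos), " \t");
        $value = pres_trim_value(substr($line, $pos + 1));
        if (strncmp($value, '<<<', 3) === 0) {
            $heredocId = trim(substr($value, 3), " \t");
            if ($heredocId === '') {
                throw new InvalidArgumentException('Missing heredoc identifier');
            }
            $heredocKey = $key;
            $heredocLines = [];
        } else {
            $result[$key] = pres_unescape($value);
        }
    }
    if ($heredocKey !== null) {
        throw new InvalidArgumentException('Unterminated heredoc');
    }
    return $result;
}
```

Processing the input byte by byte is safe for UTF-8 here because every character the format treats specially is ASCII, and ASCII bytes never occur inside a multi-byte UTF-8 sequence.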
> 
> 
> With that, hello world becomes:
> 
> <?php
>      $strings = intl_get_resources("strings");
>      echo "$strings[hello]";
> ?>
> 
> The strings.pres file contains:
>      hello = Hello, world!
> and strings-ja.pres contains:
>      hello = こんにちは、皆さん。
> 
> What do you think?
> 
> Regards,
> Norbert
> 
>

--- End Message ---
