php-i18n Digest 4 Dec 2004 20:45:08 -0000 Issue 265

php-i18n-digest-help Sat, 04 Dec 2004 12:45:10 -0800

Topics (messages 819 through 822):


Re: GETTEXT strings occasionally don't get translated
        819 by: Yannick Warnier

Re: Using Translation from PEAR, other libraries
        820 by: Ligaya Turmelle

mb_ereg_replace bug?
        821 by: Ezra Gilbert

Re: [PHP-DEV] mbstring internal encoding behavior
        822 by: Moriyoshi Koizumi

----------------------------------------------------------------------

--- Begin Message ---

On Mon 29/11/2004 at 09:42, Xavier O wrote:
> Hi,
> 
> We have the same problem. Sometimes the translation is displayed,
> sometimes the original is displayed. Has anybody found a solution?

You should probably e-mail Patrick directly. He was the one involved :-)

Yannick

--- End Message ---

--- Begin Message ---

Just my 2 cents: I'm building my site with John's IntSmarty. For me it was the 'easiest' way to go, and it has been relatively painless so far. Until now my only exposure to Smarty had been through tutorials, so I really didn't have an opinion about it. I looked into the gettext option and just got really confused. Until now I had never heard of the PEAR I18N packages.

My site is pretty small and will have low traffic (it sits in an administrative area of an intranet), but it must support at least English, Japanese, and Korean.

Respectfully,
Ligaya Turmelle

Jochem Maas wrote:
Jacob Singh wrote:
What is the common framework people use for I18N on your sites? John Coggenshall has an article on PHPBuilder about using Smarty filters. I don't really approve of this approach because it forces me into Smarty, which I am not particularly fond of.

I like the look of PEAR::Translation2, but I am not sure about the best way to implement it. I feel that a good I18N package, like any other package, shouldn't compromise your framework intentions. This one seems to require that you use PEAR::DB through their connection, which is a problem because of connection pooling and because I don't use PEAR::DB; I am using Propel.

Any thoughts on this? I need to make a site that is UTF-8 and has translations not only for labels and images, but in many cases for actual data.
A few thoughts:
1. I believe translation is integral to any web framework, because a framework is about managing contextual content display (and the language is a variable attribute of the content). Also, I wouldn't expect a lot of code out there that doesn't come with some baggage (from the point of view of your own framework); then again, there is nothing to stop you from stripping down a PEAR module to suit your needs.

2. I view static text (e.g. button labels) and user text as fundamentally different. For static texts I use a class that handles translating placeholder strings. For user-created text I have an integrated translation service in my data objects: one tells the DB class to attempt to translate relevant values (i.e. fields marked in the data objects as 'translatable') when 'getting' values; if no translation for the current language is found, the original value is shown. The translations are stored in a separate table, à la:
KEY   - a user-created string taken from an arbitrary row and table
        in the DB.
LANG  - a language code identifying the language of the value of the
        TEXT field
TEXT  - the translated value of KEY
I'm thinking of storing my data in an XML format in MySQL with multiple translations and making my own search index for each language. The problem with this is that I have to grab the entire XML doc for each field which may have 10-15 translations, parse and then display, wasting lots of processing and database time.

I'm not familiar with XML databases, and I'm told they are bad voodoo, but what is another solution if you have to store user-entered records in 'n' languages?
The table I describe above actually covers that scenario; how you present the management interface is of course up to you. For a given KEY (the text to translate) and LANG (the id of the desired language) it is possible to retrieve a translation: the table stipulates the 3 bits of information required for any specific translation that needs to occur. You could alternatively implement it as a set of arrays (one for each language), e.g.
$Lang['KEY'] = 'TEXT';
(I do something like this for what I call 'static' texts).
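
A minimal sketch of the array approach with the fallback behaviour described above (if no translation exists for the current language, the original value is shown). The function name translate_text, the language codes and the sample strings are hypothetical, chosen only for illustration; this is not Jochem's actual class.

```php
<?php
// Hypothetical translation arrays, one per language code.
// The original (default-language) string serves as the KEY.
$translations = array(
    'ja' => array('Save' => '保存'),
    'ko' => array('Save' => '저장'),
);

// Return the translation of $key for $lang, or $key itself
// when no translation is available (the fallback case).
function translate_text($key, $lang, $translations)
{
    if (isset($translations[$lang][$key])) {
        return $translations[$lang][$key];
    }
    return $key;
}

echo translate_text('Save', 'ja', $translations), "\n";   // translated
echo translate_text('Cancel', 'ja', $translations), "\n"; // falls back to the key
```

The same lookup maps directly onto the KEY / LANG / TEXT table: a query on (KEY, LANG) that returns no row corresponds to the fallback branch.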
Bear in mind that you could use foreign key relationships to create M-to-N joining table(s) that store translations for given entities in the DB, e.g.

WEBPAGES
    id
    title
    url
WEBPAGE_CONTENTS
    webpage_id  --> WEBPAGES.id
    lang_id     --> LANGS.id
    content
LANGS
    id
    name

(This is another trick I use when it is not feasible to use a default value as a key, i.e. a whole page of text makes rather a large key value, rather larger than most DBs expect for indexable key fields.)

You mention John C.'s article about smarty filters - you might then want to look at Apache2 output filters, very cool stuff by all accounts, although I have no personal experience with them
---
I18N / L10N can be a bitch. Not only do you have to implement it, but then you have users who want to quickly and easily manage 100s or 10,000s of translatable texts. On top of that you will find yourself in the murky waters of encoding translation and/or Unicode (UTF-8/16). The reason I say this is that these things can be complex enough without making life even harder by starting off determined to use XML as part of the solution. Besides, unless you are going to use some serious caching of output (e.g. Smarty caching, homebrewed output caching, Squid, etc.), extracting large chunks of XML from a DB and then having to parse them before extracting the relevant values (probably repeated more than once per request) is probably going to make your site a lot slower. I'll say that another way: deciding to use XML should be the endpoint of your investigation, not the starting point.
Hope that's given you some stuff to think about and maybe sparked some ideas!
grds,
Jochem
Thanks
Jacob
--- End Message ---

--- Begin Message ---

Has anyone noticed an issue with mb_ereg_replace when the pattern string
contains a # character?

The following problem is seen with php-4.2.2 with mbstring/mbregex enabled.

In (1) below, ereg_replace has no problem matching a pattern containing a #.
In (2), mb_ereg_replace ignores the #52 part of the pattern and replaces
only the @ with "test".
To fix the problem, we need to escape # as \#, as in (3). I didn't think
# had any special significance in POSIX regex, and it worked OK in (1)
with ereg_replace.

1)
$s = 'blah @#52 blah';
print("s: $s \n ");
$s = ereg_replace('@#52','test',$s);
print("s: $s \n ");

s: blah @#52 blah
s: blah test blah

----------------------
2)
$s = 'blah @#52 blah';
print("s mb: $s \n ");
$s = mb_ereg_replace('@#52','test',$s);
print("s mb: $s \n ");

s mb: blah @#52 blah
s mb: blah test#52 blah

----------------------
3)
$s = 'blah @#52 blah';
print("s mb\: $s \n");
$s = mb_ereg_replace('@\#52','test',$s);
print("s mb\: $s \n");

s mb\: blah @#52 blah
s mb\: blah test blah
----------------------

The problem comes up when trying to create the following function:
    function html_special_decode($s) {
      $s = mb_ereg_replace('&gt;', '>', $s);
      $s = mb_ereg_replace('&lt;', '<', $s);
      $s = mb_ereg_replace('&quot;', '"', $s);
      $s = mb_ereg_replace('&#39;', '\'', $s);
      $s = mb_ereg_replace('&amp;', '&', $s);
      return $s;
    }
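
Since every pattern in this function is a fixed string, one possible workaround is to avoid the regex engine altogether. The sketch below assumes the input is UTF-8 (or another encoding in which these ASCII sequences cannot appear inside a multibyte character); the name html_special_decode_plain is hypothetical. It does not explain why mb_ereg_replace treats # specially; it only sidesteps the question.

```php
<?php
// Decode the handful of entities with plain string replacement.
// No pattern syntax is involved, so '#' needs no escaping.
function html_special_decode_plain($s)
{
    $s = str_replace('&gt;', '>', $s);
    $s = str_replace('&lt;', '<', $s);
    $s = str_replace('&quot;', '"', $s);
    $s = str_replace('&#39;', "'", $s);
    // '&amp;' is decoded last so that '&amp;lt;' becomes '&lt;', not '<'.
    $s = str_replace('&amp;', '&', $s);
    return $s;
}

echo html_special_decode_plain('&lt;b&gt;it&#39;s &quot;ok&quot; &amp;amp; done&lt;/b&gt;'), "\n";
```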


-Ezra

"Renato De Giovanni" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> > It's probable that it's a PHP...erm..."fact of life" right now. I ran
> > into similar problems with iso-8859-7 and -9, using both
> > htmlspecialchars and htmlentities with the (optional) 3rd parameter.
> > Things worked unpredictably. In the PHP build I have now (4.4ish, from
> > recent CVS), htmlspecialchars actually prints out a PHP error message
> > (E_WARNING, I believe) that:
> >
> > "ISO-8859-7 is not supported by htmlspecialchars(); assuming ISO-8859-1"
> >
> > So I wouldn't be surprised if you weren't running into this problem,
> > which wasn't officially recognized until after 4.2 was released. Look
> > at bugs.php.net for related bugs...it's the only good way to keep up on
> > the issue, which seems to be evolving...
> >
> > Cheers,
> > spud.
>
> Ok, so it's a known "missing feature".
>
> Meanwhile, it's possible to replace:
>
> $s = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
>
> with:
>
> mb_regex_encoding('UTF-8');
> $s = mb_ereg_replace('&', '&amp;', $s);
> $s = mb_ereg_replace('>', '&gt;', $s);
> $s = mb_ereg_replace('<', '&lt;', $s);
> $s = mb_ereg_replace('"', '&quot;', $s);
>
> ...which should decrease performance considerably, but I see no other
> workaround.
>
> Thanks,
> --
> Renato
>

--- End Message ---

--- Begin Message ---
Hello,
Redirecting to php-i18n list, which is the most suitable place
for this kind of matter.
On 2004/12/05, at 5:01, Al Baker wrote:
I've noticed some different behavior between mbstring versions 4.2.2 and 4.3.9 -- both on RedHat 8 -- in terms of how internal encoding affects the script.

In 4.2.2, the encoding translation appeared to work okay and would convert Shift_JIS into UTF-8 on incoming requests. We didn't try any other encodings, since this was our primary concern and it worked well. The internal_encoding setting in the php.ini file was set to UTF-8. Our language file (a very simple PHP array with the translated text as values) was in Shift_JIS, and it was no problem to just send this to the browser. We send the Shift_JIS language file entries to the browser [via Smarty], as well as some other text that is stored in UTF-8 and run through mb_convert_encoding to convert it to Shift_JIS. All in all, this works as expected.
So you set the internal encoding to UTF-8 while having translated
message catalog as Shift_JIS. Then which encoding are you using
primarily in your script? And is there any reason not to use UTF-8
everywhere applicable?
Now we're trying to upgrade to PHP 4.3.9 and I can find no easy way to get Shift_JIS to work. In the existing setup, it would just return UTF-8 or garbled characters. In other words, mb_convert_encoding was not doing its job, and it wouldn't even display the Shift_JIS language file entries. Manually converting the language file from Shift_JIS to UTF-8 and then running all the elements through mb_convert_encoding apparently did nothing either, unless I first called mb_internal_encoding() and set it to Shift_JIS (setting this in the php.ini file worked as well). Then the characters would be displayed correctly in Shift_JIS. I'm not sure if this is the correct behavior, though: it seems to me that the internal encoding should almost always be UTF-8, and mb_convert_encoding should work regardless of the internal encoding.
What mbstring settings did you put in your php.ini? Does it contain an "mbstring.language" line that comes after "mbstring.internal_encoding"?
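
For reference while comparing setups: when the third argument of mb_convert_encoding is omitted, the source encoding defaults to the current internal encoding, so the result depends on mb_internal_encoding(). A minimal sketch of a conversion that names both encodings explicitly is shown below. The sample string is an assumption for illustration, and this is not a diagnosis of the 4.3.9 behaviour reported above.

```php
<?php
// Hiragana "a" (U+3042) written as raw UTF-8 bytes, so the
// example does not depend on the encoding of this source file.
$utf8 = "\xE3\x81\x82";

// Name both the target and the source encoding explicitly;
// the conversion then does not rely on mb_internal_encoding().
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');
$back = mb_convert_encoding($sjis, 'UTF-8', 'SJIS');

echo bin2hex($sjis), "\n"; // Shift_JIS bytes of the character
echo bin2hex($back), "\n"; // round trip back to the UTF-8 bytes
```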

- http_input was set to UTF-8, SJIS, as was the detect_order.
Which encoding is used for the input data?
Regards,
Moriyoshi
--- End Message ---
