php-i18n Digest 4 Dec 2004 20:45:08 -0000 Issue 265
Topics (messages 819 through 822):
Re: GETTEXT strings occasionally don't get translated
819 by: Yannick Warnier
Re: Using Translation from PEAR, other libraries
820 by: Ligaya Turmelle
mb_ereg_replace bug?
821 by: Ezra Gilbert
Re: [PHP-DEV] mbstring internal encoding behavior
822 by: Moriyoshi Koizumi
----------------------------------------------------------------------
--- Begin Message ---
On Mon 29/11/2004 at 09:42, Xavier O wrote:
> Hi,
>
> We got the same problem. Sometimes, the translation is displayed,
> sometimes, the Original is displayed. Has anybody found a solution ?
You should probably e-mail Patrick directly. He was the one involved :-)
Yannick
--- End Message ---
--- Begin Message ---
Just my 2 cents - I'm making my site using John's IntSmarty. For me it
was the 'easiest' way to go - and has been relatively painless thus far.
Until now I have only had exposure to Smarty through tutorials - so I
really didn't have any opinion about it. I looked into the gettext
option and just got really confused. Until now I have never heard of
the PEAR I18n packages.
My site is pretty small and will have low traffic (in an Administrative
area of an Intranet) but must be in a minimum of English, Japanese, and
Korean.
Respectfully,
Ligaya Turmelle
Jochem Maas wrote:
Jacob Singh wrote:
What is the common framework people use for I18N on your sites? John
Coggenshall has an article in PHPBuilder about using smarty filters. I
don't really approve of this approach because it is forcing me into
Smarty, which I am not particularly fond of.
I like the look of PEAR::translation2, but I am not sure about the
best way to implement it. I feel that a good I18N package, like any
other package, doesn't compromise your framework intentions. This one
seems to require that you use PEAR:DB through their connection, which
is a problem because of connection pooling, and the fact that I don't
use PEAR:DB, I am using propel.
Any thoughts on this? I need to make a site that is UTF-8 and has
translations not only for labels and images, but in many cases for
actual data.
a few thoughts:
1. I believe translation is integral to any web framework because a
framework is about managing contextual content display (and the language
is a variable attribute of the content). Also I wouldn't expect a lot of
code out there that doesn't come with some baggage (from the point of
view of your own framework), then again there is nothing to stop you
from stripping down a PEAR module to suit your needs.
2. I view static text (e.g. button labels) and user text as
fundamentally different - for the static texts I use a class that
handles translating placeholder strings and for user created text I have
an integrated translation service in my data objects - one tells the DB
class to attempt (if a translation for the current language is not found
then the original value is shown) to translate relevant values (i.e.
fields marked in the data objects as 'translatable') when 'getting'
values. The translations are stored in a separate table, like so:
KEY - a user created string taken from an arbitrary row & table
in the DB.
LANG - a language code relating to the language of the value of the
TEXT field
TEXT - the translated value of KEY
I'm thinking of storing my data in an XML format in MySQL with
multiple translations and making my own search index for each
language. The problem with this is that I have to grab the entire XML
doc for each field which may have 10-15 translations, parse and then
display, wasting lots of processing and database time.
I'm not familiar with XML databases, and I'm told they are bad
voodoo, but what is another solution if you have to store user entered
records in 'n' languages?
the table I describe above actually covers that scenario - how you
present the management interface is of course up to you. for a given KEY
(text to translate) and LANG (id of the desired language) it is possible
to retrieve a translation - the table stipulates the 3 bits of
information required for every/any specific translation that needs to
occur. you could alternatively implement it as a set of arrays (one for
each lang). e.g.
$Lang['KEY'] = 'TEXT';
(I do something like this for what I call 'static' texts).
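A minimal sketch of the per-language array idea, including the fallback described earlier (show the original text when no translation is found). The function name translate() and the sample strings are made up for illustration; they do not come from any package mentioned in this thread.

```php
<?php
// Hypothetical sample data: one array per language, keyed by the original text.
$translations = [
    'nl' => ['Save' => 'Opslaan', 'Cancel' => 'Annuleren'],
    'de' => ['Save' => 'Speichern'],
];

// Return the translation of $key for $lang, or $key itself when none is stored.
function translate(array $translations, string $lang, string $key): string
{
    return $translations[$lang][$key] ?? $key;
}

echo translate($translations, 'nl', 'Save'), "\n";   // Opslaan
echo translate($translations, 'de', 'Cancel'), "\n"; // Cancel (no German entry, original shown)
```

The same lookup-with-fallback shape would apply to the KEY / LANG / TEXT table; only the storage changes.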
Bear in mind that you could use foreign key relationships to create
a M-to-N joining table(s) that stores translations for given entities in
the DB e.g.
WEBPAGES
id
title
url
WEBPAGE_CONTENTS
webpage_id --> WEBPAGES.id
lang_id --> LANGS.id
content
LANGS
id
name
(another trick I use when it is not feasible to use a default value as a
key - i.e. a whole page of text makes rather a large key value - rather
larger than most DBs expect for indexable key fields)
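To make the relationships concrete, here is a rough sketch that models the three tables as plain PHP arrays, so the content is found by (webpage_id, lang_id) rather than by a large text key. The sample rows and the function page_content() are hypothetical; a real site would run the equivalent query against the database.

```php
<?php
// Hypothetical in-memory stand-ins for the LANGS, WEBPAGES and WEBPAGE_CONTENTS tables.
$langs = [1 => 'en', 2 => 'ja'];
$webpages = [10 => ['title' => 'Home', 'url' => '/home']];
$webpageContents = [
    ['webpage_id' => 10, 'lang_id' => 1, 'content' => 'Welcome'],
    ['webpage_id' => 10, 'lang_id' => 2, 'content' => 'ようこそ'],
];

// Find the content of a page in a given language; null when no translation is stored.
function page_content(array $rows, int $webpageId, int $langId): ?string
{
    foreach ($rows as $row) {
        if ($row['webpage_id'] === $webpageId && $row['lang_id'] === $langId) {
            return $row['content'];
        }
    }
    return null;
}

// Resolve the language name to its id, then fetch the matching content row.
$langId = array_search('ja', $langs, true);
echo $webpages[10]['title'], ': ', page_content($webpageContents, 10, $langId), "\n";
```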
You mention John C.'s article about smarty filters - you might then
want to look at Apache2 output filters, very cool stuff by all accounts,
although I have no personal experience with them
---
I18N / L10N can be a bitch, I mean not only do you have to implement it
but then you have users who want to quickly/easily manage 100's/10,000's
of translatable text. on top of which you will find yourself in the
murky waters of encoding translation and/or Unicode (UTF8/16) - the
reason I say this is that these things can be complex enough without
making life even harder by starting off determined to use XML as part of
the solution. besides unless you are going to use some serious caching
of output (e.g. smarty caching, homebrewed output caching, squid etc
etc) then extracting large chunks of XML from a DB and then having to
parse it before extracting the relevant values (probably repeated more
than once per request) is probably going to make your site a lot slower.
I'll say that another way - deciding to use XML should be the endpoint
of your investigation not the starting point.
Hope that's given you some stuff to think about and maybe spark some ideas!
grds,
Jochem
Thanks
Jacob
--- End Message ---
--- Begin Message ---
Has anyone noticed an issue with mb_ereg_replace when the pattern string
contains a # character?
The following problem is seen with php-4.2.2 with mbstring/mbregex enabled.
In (1) below, ereg_replace has no problem matching a pattern containing a #.
In (2), mb_ereg_replace ignores #52 from the pattern and replaces @ with
test
To fix the problem, we need to escape # to be \# as in (3). I didn't think
# has special significance in POSIX regex and it worked ok in (1) with
ereg_replace.
1)
$s = 'blah @#52 blah';
print("s: $s \n ");
$s = ereg_replace('@#52','test',$s);
print("s: $s \n ");
s: blah @#52 blah
s: blah test blah
----------------------
2)
$s = 'blah @#52 blah';
print("s mb: $s \n ");
$s = mb_ereg_replace('@#52','test',$s);
print("s mb: $s \n ");
s mb: blah @#52 blah
s mb: blah test#52 blah
----------------------
3)
$s = 'blah @#52 blah';
print("s mb\: $s \n");
$s = mb_ereg_replace('@\#52','test',$s);
print("s mb\: $s \n");
s mb\: blah @#52 blah
s mb\: blah test blah
----------------------
The problem comes up when trying to create the following function:
function html_special_decode($s) {
    $s = mb_ereg_replace('&gt;', '>', $s);
    $s = mb_ereg_replace('&lt;', '<', $s);
    $s = mb_ereg_replace('&quot;', '"', $s);
    $s = mb_ereg_replace('&#039;', '\'', $s);
    $s = mb_ereg_replace('&amp;', '&', $s);
    return $s;
}
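Since these are literal strings rather than patterns, one way to sidestep regex-special characters altogether is plain str_replace(). The sketch below assumes the input is UTF-8 (or another encoding in which these ASCII bytes never appear inside a multibyte character) and assumes the five entities are &gt;, &lt;, &quot;, &#039; and &amp;. Later PHP versions (5.1 and up) also ship a built-in htmlspecialchars_decode().

```php
<?php
// Sketch: decode the five entities by literal replacement, with no regex involved.
// Assumes UTF-8 input. '&amp;' is decoded last so that '&amp;lt;' becomes '&lt;',
// not '<' (str_replace applies the array pairs in order).
function html_special_decode(string $s): string
{
    return str_replace(
        ['&gt;', '&lt;', '&quot;', '&#039;', '&amp;'],
        ['>', '<', '"', "'", '&'],
        $s
    );
}

echo html_special_decode('&lt;a href=&quot;x&quot;&gt; &amp;lt;'), "\n";
```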
-Ezra
"Renato De Giovanni" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> > It's probable that it's a PHP...erm..."fact of life" right now. I ran
> > into similar problems with iso-8859-7 and -9, using both
> > htmlspecialchars and htmlentities with the (optional) 3rd parameter.
> > Things worked unpredictably. In the PHP build I have now (4.4ish, from
> > recent CVS), htmlspecialchars actually prints out a PHP error message
> > (E_WARNING, I believe) that:
> >
> > "ISO-8859-7 is not supported by htmlspecialchars(); assuming ISO-8859-1"
> >
> > So I wouldn't be surprised if you weren't running into this problem,
> > which wasn't officially recognized until after 4.2 was released. Look
> > at bugs.php.net for related bugs...it's the only good way to keep up on
> > the issue, which seems to be evolving...
> >
> > Cheers,
> > spud.
>
> Ok, so it's a known "missing feature".
>
> Meanwhile, it's possible to replace:
>
> $s = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
>
> with:
>
> mb_regex_encoding('UTF-8');
> $s = mb_ereg_replace('&', '&amp;', $s);
> $s = mb_ereg_replace('>', '&gt;', $s);
> $s = mb_ereg_replace('<', '&lt;', $s);
> $s = mb_ereg_replace('"', '&quot;', $s);
>
> ...which should decrease performance considerably, but I see no other
> workaround.
>
> Thanks,
> --
> Renato
>
--- End Message ---
--- Begin Message ---
Hello,
Redirecting to php-i18n list, which is the most suitable place
for this kind of matter.
On 2004/12/05, at 5:01, Al Baker wrote:
I've noticed some different behavior between mbstring versions 4.2.2 and
4.3.9 -- both on RedHat 8 -- in terms of how internal encoding affects
the script.
In 4.2.2, the encoding translation appeared to work okay and would
convert Shift_JIS into UTF-8 on incoming requests. We didn't try any
other encodings since this was our primary concern and worked well. The
internal_encoding setting in the php.ini file was set to UTF-8. Our
language file (very simple PHP array with values being the translated
text) was in Shift_JIS, and it was no problem to just send this to the
browser. We send the Shift_JIS language file entries to the
browser [via Smarty] as well as some other text that is stored in UTF-8
and run through mb_convert_encoding to convert it to Shift_JIS as well.
All in all, this works as expected.
So you set the internal encoding to UTF-8 while keeping the translated
message catalog in Shift_JIS. Then which encoding are you using
primarily in your script? And is there any reason not to use UTF-8
everywhere applicable?
Now, we're trying to upgrade to php4.3.9 and I can find no easy way to
get the Shift_JIS to work.... in the existing setup, it would just
return UTF-8 or garbled characters. In other words, mb_convert_encoding
was not doing its job, and it wouldn't even display the Shift_JIS
language file entries. Manually converting the language file from
Shift_JIS characters to UTF-8 and then running all the elements through
mb_convert_encoding apparently did nothing as well -- unless I first
called mb_internal_encoding() and set that to Shift_JIS (likewise,
setting this in the php.ini file worked as well). Then, the characters
would be displayed correctly in Shift_JIS. I'm not sure if this is the
correct behavior though... it seems to me that the internal encoding
should almost always be UTF-8 and mb_convert_encoding should work
regardless of the internal encoding.
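One thing worth checking here, offered as a possibility rather than a diagnosis: when mb_convert_encoding() is called without its third argument, the source encoding defaults to the internal encoding, so the result changes with that setting. The sketch below passes both encodings explicitly; the helper names to_shift_jis() and from_shift_jis() are hypothetical, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helpers: always name the source encoding so the result does not
// depend on mbstring.internal_encoding. Requires the mbstring extension.
function to_shift_jis(string $utf8Text): string
{
    return mb_convert_encoding($utf8Text, 'SJIS', 'UTF-8');
}

function from_shift_jis(string $sjisText): string
{
    return mb_convert_encoding($sjisText, 'UTF-8', 'SJIS');
}

$text = '日本語'; // this source file is assumed to be saved as UTF-8

// Convert under two different internal encodings; the bytes should be identical.
mb_internal_encoding('UTF-8');
$a = to_shift_jis($text);
mb_internal_encoding('SJIS');
$b = to_shift_jis($text);
mb_internal_encoding('UTF-8');

var_dump($a === $b);                   // bool(true)
var_dump(from_shift_jis($a) === $text); // bool(true)
```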
What settings did you put in your php.ini about mbstring?
Does it contain a part where "mbstring.language" comes after
"mbstring.internal_encoding"?
- http_input was set to UTF-8, SJIS, as was the detect_order.
Which encoding is used for the input data?
Regards,
Moriyoshi
--- End Message ---