php-i18n Digest 10 Sep 2007 11:14:22 -0000 Issue 369
Topics (messages 1119 through 1124):
Re: Regarding Collator Object Cleanup
1119 by: Stanislav Malyshev
weekly call
1120 by: Stanislav Malyshev
1121 by: Stanislav Malyshev
1122 by: tex
1123 by: tex
Difficulties using preg_replace with Latin9 and Unicode characters (Resolved)
1124 by: Erik Norvelle
Administrivia:
To subscribe to the digest, e-mail:
[EMAIL PROTECTED]
To unsubscribe from the digest, e-mail:
[EMAIL PROTECTED]
To post to the list, e-mail:
[EMAIL PROTECTED]
----------------------------------------------------------------------
--- Begin Message ---
Both APIs use the same objects, just in different ways. OO methods get
the object as $this, while procedural functions get it as a parameter.
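Here is a minimal sketch of that pattern in plain PHP. The names SimpleCollator and simple_collator_compare() are hypothetical stand-ins (the real ext/intl pair is Collator::compare() and collator_compare()), and strcmp() stands in for locale-aware comparison. Both calls operate on one object, so its cleanup (here, __destruct) runs exactly once, whichever API was used.

```php
<?php
// Hypothetical sketch: one object shared by an OO API and a procedural API.
// SimpleCollator and simple_collator_compare() are invented names, not ext/intl.
class SimpleCollator
{
    /** Counts how many times cleanup (the destructor) has run. */
    public static int $destroyed = 0;

    private string $locale;

    public function __construct(string $locale)
    {
        $this->locale = $locale;
    }

    // OO style: the object arrives implicitly as $this.
    // strcmp() stands in for real locale-aware comparison.
    public function compare(string $a, string $b): int
    {
        return strcmp($a, $b) <=> 0;
    }

    // Cleanup lives in one place, whichever API was used.
    public function __destruct()
    {
        self::$destroyed++;
    }
}

// Procedural style: the very same object arrives as an explicit parameter.
function simple_collator_compare(SimpleCollator $coll, string $a, string $b): int
{
    return $coll->compare($a, $b);
}

$coll = new SimpleCollator('es_ES');
$oo   = $coll->compare('a', 'b');                  // -1
$proc = simple_collator_compare($coll, 'a', 'b');  // -1, same object
unset($coll);                                      // last reference gone: cleanup runs once
```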
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
--- End Message ---
--- Begin Message ---
Will we be using the same call-in number and code as last time for the call?
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
--- End Message ---
--- Begin Message ---
Oops, sorry... damn auto-complete.
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED] http://www.zend.com/
(408)253-8829 MSN: [EMAIL PROTECTED]
--- End Message ---
--- Begin Message ---
Yes
+1.888.371.8922 or +1.617.224.4792
Access code 26071426
tex
-----Original Message-----
From: Stanislav Malyshev [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 28, 2007 10:40 PM
To: [EMAIL PROTECTED]
Subject: [PHP-I18N] weekly call
Will we be using the same call-in number and code as last time for the call?
--
PHP Unicode & I18N Mailing List (http://www.php.net/) To unsubscribe, visit:
http://www.php.net/unsub.php
--- End Message ---
--- Begin Message ---
Hi, this went to the wrong list by mistake.
Please don't call in if you weren't on these calls before.
Apologies for the noise.
Tex
tex wrote:
> Yes
>
> [numbers and an access code]
>
> tex
--- End Message ---
--- Begin Message ---
Greetings,
This was originally a question, but since I ended up solving the issue
on my own, I thought I would post my solution.
I have written a program to help translate documents from Spanish to
English; it relies heavily on regular expressions. It mostly works, but
a few characters cause problems for the regular expression engine. For
instance, the following regular expressions do not match correctly:
$text = preg_replace("/Dña\. /", "Ms. ", $text);      // matches as /D.*/
$text = preg_replace("/[Ff]útbol/", "Soccer", $text); // matches as /[Ff].*/
$text = preg_replace("/1º/", "1st", $text);           // matches as /1.*/
Plus a few others. It appears that upon hitting one of these
troublesome characters, the preg engine stops parsing and uses whatever
"legal" characters it has found up to that point as the "real" regex,
ignoring whatever comes after.
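One way to see how a pattern and a text can fail to line up, assuming (the post does not confirm this) that the pattern arrives as UTF-8 while the document is Latin-9: without the /u modifier, PCRE compares raw bytes, and "ñ" is two bytes in UTF-8 (C3 B1) but a single byte in Latin-9 (F1). The strings below are written as explicit bytes so the sketch does not depend on how the file itself is saved.

```php
<?php
// Sketch: the same words in two encodings, written as explicit bytes so the
// example does not depend on how this PHP file itself is saved.
$utf8Text   = "D\xC3\xB1a. Espa\xC3\xB1ola"; // UTF-8:   ñ = C3 B1
$latin9Text = "D\xF1a. Espa\xF1ola";         // Latin-9: ñ = F1

// Without the /u modifier, PCRE compares bytes, not characters.
$utf8Pattern   = "/D\xC3\xB1a\\. /";  // "Dña\. " as UTF-8 bytes
$latin9Pattern = "/D\xF1a\\. /";      // "Dña\. " as Latin-9 bytes

$hitUtf8   = preg_match($utf8Pattern, $utf8Text);      // 1: encodings agree
$hitLatin9 = preg_match($utf8Pattern, $latin9Text);    // 0: bytes C3 B1 never occur
$hitFixed  = preg_match($latin9Pattern, $latin9Text);  // 1: pattern re-encoded
```

The fix in every case is the same: get the pattern and the subject into one encoding before matching.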
I have tried saving the files in various encodings, in particular UTF-8
as well as the native Latin9 (ISO-8859-15) encoding, to see whether PHP
would pick up the encoding and respond correctly. No luck, alas.
The regexes are stored in a MySQL database using the utf8 character set
(collation "utf8_unicode_ci"), so in theory iconv() should be able to
convert them to another encoding. I tried the following:
$regex = iconv("UTF-8", "ISO-8859-1", $trans['patron']);
This should, in theory, change the pattern (stored in UTF-8 in the DB)
into a nice Latin1 pattern. However, it truncates the pattern, much as
the preg engine does on its own. For instance, "Sociedad Española de
Cardiología" becomes "Sociedad Espa", "Dña." becomes "D", and so on.
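When iconv() stops at the first accented letter, one plausible explanation (an assumption, not something the post establishes) is that the string passed in was not valid UTF-8 at all, for example bytes already in Latin-1, where "ñ" is the lone byte F1. A quick check uses the common preg_match('//u', ...) idiom, which returns false for malformed UTF-8; is_valid_utf8() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false on malformed UTF-8 input,
// and 1 otherwise (the empty pattern always matches).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$fromUtf8Column = "Espa\xC3\xB1ola"; // genuine UTF-8 bytes
$alreadyLatin1  = "Espa\xF1ola";     // Latin-1 bytes: F1 alone is not valid UTF-8

$okUtf8   = is_valid_utf8($fromUtf8Column); // true
$okLatin1 = is_valid_utf8($alreadyLatin1);  // false
```

Running this on $trans['patron'] before calling iconv() would show whether the input really is UTF-8.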
The solution was to tell MySQL, before running the SELECT query that
retrieves the regexes, to return its results in Latin1. MySQL does a
better job than PHP of translating between character sets, it would appear:
mysql_query("SET character_set_results=latin1");
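Where the connection setting can't be changed, the same UTF-8 to Latin-1 re-encoding can be sketched in plain PHP. utf8_to_latin1() below is a hypothetical helper, not what the author used: it turns code points U+0080..U+00FF into single bytes, replaces anything outside Latin-1 with "?", and returns null when the input is not valid UTF-8.

```php
<?php
// Hypothetical helper, not from the original post: re-encode UTF-8 text as
// ISO-8859-1, the conversion "SET character_set_results=latin1" asks MySQL
// to perform before PHP sees the rows.
function utf8_to_latin1(string $s): ?string
{
    return preg_replace_callback(
        '/[^\x00-\x7F]/u',                 // one non-ASCII character at a time
        static function (array $m): string {
            $bytes = $m[0];
            // Only two-byte sequences with lead byte C2 or C3 encode U+0080..U+00FF.
            if (strlen($bytes) === 2 && ord($bytes[0]) <= 0xC3) {
                $cp = ((ord($bytes[0]) & 0x1F) << 6) | (ord($bytes[1]) & 0x3F);
                return chr($cp);
            }
            return '?';                    // not representable in Latin-1
        },
        $s
    );                                     // null if $s is not valid UTF-8
}

$latin1 = utf8_to_latin1("Espa\xC3\xB1ola"); // "Espa\xF1ola"
```

PHP's built-in utf8_decode() performs a similar conversion, but it is deprecated as of PHP 8.2.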
This has fixed the problems that I had.
HTH,
Erik Norvelle
--- End Message ---