php-i18n Digest 10 Sep 2007 11:14:22 -0000 Issue 369

Topics (messages 1119 through 1124):

Re: Regarding Collator Object Cleanup
        1119 by: Stanislav Malyshev

weekly call
        1120 by: Stanislav Malyshev
        1121 by: Stanislav Malyshev
        1122 by: tex
        1123 by: tex

Difficulties using preg_replace with Latin9 and Unicode characters (Resolved)
        1124 by: Erik Norvelle

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
Both APIs use the same objects; they just use them in different ways: OO functions get the object as $this, while procedural ones get it as a parameter.
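
As a rough illustration of that point, the sketch below calls the same Collator object through both styles. It assumes the intl extension is loaded; the locale 'en_US' and the sample strings are arbitrary choices made for this example, not taken from the thread.

```php
<?php
// Assumption: the intl extension is available; 'en_US' is an arbitrary locale.
$coll = new Collator('en_US');

// Object-oriented call: the collator is the implicit $this.
$oo = $coll->compare('apple', 'banana');

// Procedural call: the very same object is passed as the first argument.
$proc = collator_compare($coll, 'apple', 'banana');

// Both styles operate on one object, so the results agree.
var_dump($oo === $proc);
```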

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--- End Message ---
--- Begin Message ---
Will we be using the same call-in number and code as last time for the call?
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--- End Message ---
--- Begin Message ---
Oops, sorry... damn auto-complete.

--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--- End Message ---
--- Begin Message ---
Yes

+1.888.371.8922 or +1.617.224.4792 

Access code 26071426  

tex

-----Original Message-----
From: Stanislav Malyshev [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 28, 2007 10:40 PM
To: [EMAIL PROTECTED]
Subject: [PHP-I18N] weekly call

Will we be using the same call-in number and code as last time for the call?
--
Stanislav Malyshev, Zend Software Architect
[EMAIL PROTECTED]   http://www.zend.com/
(408)253-8829   MSN: [EMAIL PROTECTED]

--
PHP Unicode & I18N Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

--- End Message ---
--- Begin Message ---
Hi, so this went to the wrong list by mistake.

Please don't call in if you weren't on these calls before.

Apologies for the noise.
Tex
 
tex wrote:
> Yes
> 
> Numbers and an access code
> 
> tex

--- End Message ---
--- Begin Message ---
Greetings,

This was originally a question, but since I ended up solving the issue on my own, I thought I would post my solution.

I have written a program for aiding in translating documents from Spanish to English, which relies heavily on regular expressions. Mostly it works, but there are a few characters which cause problems for the regular expression engine. For instance, the following regular expressions do not match correctly:

preg_replace("Dña\. ", "Ms\. ", $text); [Matches as /D.*/]
preg_replace("[Ff]útbol", "Soccer", $text); [Matches as /[Ff].*/]
preg_replace("1º", "1st", $text); [Matches as /1.*/]

Plus a few others. It appears that upon hitting one of these troublesome characters, the preg engine stops parsing and uses whatever "legal" characters it has found up to that point as the "real" regex, ignoring whatever comes after.
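
For comparison, here is a minimal sketch of the same three replacements written with explicit pattern delimiters and the PCRE u modifier. It assumes the source file, the patterns, and the text are all UTF-8; the $rules table and the sample sentence are hypothetical stand-ins for the rows the program loads from MySQL. It is not a diagnosis of the behaviour described above, only one way such patterns are commonly written.

```php
<?php
// Assumption: this file, the patterns, and $text are all UTF-8 encoded.
$text = "Dña. García juega al fútbol en 1º de liga.";

// Hypothetical pattern => replacement table, standing in for the
// rows the original program reads from its database.
$rules = [
    'Dña\. '    => 'Ms. ',
    '[Ff]útbol' => 'soccer',
    '1º'        => '1st',
];

foreach ($rules as $pattern => $replacement) {
    // preg_* patterns need delimiters; the u modifier makes PCRE treat
    // both pattern and subject as UTF-8 instead of single bytes.
    $text = preg_replace('/' . $pattern . '/u', $replacement, $text);
}

echo $text, "\n";
```

None of the sample patterns contains a "/", so it is safe as a delimiter here; patterns that may contain it would need preg_quote() or a different delimiter.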

I have tried saving the files in various encodings, in particular UTF-8 and the native Latin9 encoding, to see if PHP would pick up the encoding and respond correctly. No luck, alas. The regexes are stored in a MySQL database in the utf8 character set (collation "utf8_unicode_ci"), so in theory the function iconv should work to change the encoding. I have tried the following:

$regex = iconv("UTF-8", "ISO-8859-1", $trans['patron']);

This should, in theory, change the pattern (stored in UTF-8 in the DB) into a nice Latin1 pattern. However, it truncates the pattern, much as PHP does automatically. For instance, "Sociedad Española de Cardiología" becomes "Sociedad Espa", and "Dña." becomes "D", etc.
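
Truncation at the first accented character is what iconv() tends to produce when its input is not valid in the declared source encoding, for example if the bytes arriving from MySQL are already Latin-1. That is only one possible explanation and is not confirmed in this thread. The sketch below checks that a string really is UTF-8 before converting it; the helper name to_latin1_if_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: convert to Latin-1 only when the input is
// valid UTF-8; otherwise return the string unchanged.
function to_latin1_if_utf8($s)
{
    // With the u modifier, preg_match() returns 1 for an empty pattern
    // on valid UTF-8 and does not return 1 on malformed UTF-8.
    if (preg_match('//u', $s) !== 1) {
        return $s; // assumed to be Latin-1 (or similar) already
    }
    $converted = iconv('UTF-8', 'ISO-8859-1', $s);
    // iconv() returns false when a character cannot be converted.
    return $converted === false ? $s : $converted;
}

// Assumption: this source file is saved as UTF-8.
$utf8   = "Sociedad Española de Cardiología";
$latin1 = to_latin1_if_utf8($utf8);

// Byte lengths differ because ñ and í shrink from two bytes to one.
echo strlen($utf8), ' -> ', strlen($latin1), "\n";
```

Passing the already converted string through the helper a second time leaves it untouched, which avoids the double conversion that would otherwise cut the pattern short.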

The solution was to tell MySQL to perform the conversion to Latin1 prior to executing the SELECT query that retrieves the regexes. It would appear that MySQL does a better job than PHP of translating between character sets:

mysql_query("SET character_set_results=latin1");

This has fixed the problems that I had.

HTH,
Erik Norvelle

--- End Message ---
