php-i18n Digest 16 Oct 2007 16:22:36 -0000 Issue 370

Topics (messages 1125 through 1126):

Re: Difficulties using preg_replace with Latin9 and Unicode characters 
(Resolved)
        1125 by: Andrei Zmievski

Free unlimied hosting for you!
        1126 by: dungdm001

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [EMAIL PROTECTED]


----------------------------------------------------------------------
--- Begin Message ---
You don't seem to be using any delimiters in your regexes. No wonder it doesn't work.

As for UTF-8, you can pass those strings to preg_replace(), but you need to use the /u modifier, which makes PCRE treat both the pattern and the subject as UTF-8. Like this:

 preg_replace("/[Ff]útbol/u", "Soccer", $text);
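
A minimal sketch of both points applied to the three patterns from the original post, assuming the script file is saved as UTF-8 and the text being processed is UTF-8 as well. The helper name translate_terms and the sample sentence are made up for illustration; they are not taken from Erik's program.

```php
<?php
// Hypothetical helper: applies pattern => replacement pairs to UTF-8 text.
// Every pattern carries "/" delimiters and the /u modifier.
function translate_terms(string $text, array $rules): string
{
    foreach ($rules as $pattern => $replacement) {
        $result = preg_replace($pattern, $replacement, $text);
        if ($result === null) {
            // preg_replace() returns null on error, e.g. invalid UTF-8 input.
            throw new RuntimeException("preg_replace failed for $pattern");
        }
        $text = $result;
    }
    return $text;
}

$rules = [
    '/Dña\. /u'    => 'Ms. ',
    '/[Ff]útbol/u' => 'Soccer',
    '/1º/u'        => '1st',
];

echo translate_terms("Dña. Pérez ganó el 1º premio de fútbol.", $rules), "\n";
```

Note that the replacement is written as "Ms. " without a backslash: the replacement string is not a regex, so the dot needs no escaping there.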

-Andrei
http://10fathoms.org/vu - daily photoblog

Erik Norvelle wrote:
Greetings,

This was originally a question, but since I ended up solving the issue on my own, I thought I would post my solution.

I have written a program for aiding in translating documents from Spanish to English, which relies heavily on regular expressions. Mostly it works, but there are a few characters which cause problems for the regular expression engine. For instance, the following regular expressions do not match correctly:

preg_replace("Dña\. ", "Ms\. ", $text); [Matches as /D.*/]
preg_replace("[Ff]útbol", "Soccer", $text); [Matches as /[Ff].*/]
preg_replace("1º", "1st", $text); [Matches as /1.*/]

Plus a few others. It appears that upon hitting one of these troublesome characters, the preg engine stops parsing and uses whatever "legal" characters it has found up to that point as the "real" regex, ignoring whatever comes after.

I have tried saving the files in various encodings, in particular UTF-8 as well as the native Latin9 encoding, to see if PHP would pick up the encoding and respond correctly. No luck, alas. The regexes are stored in a MySQL database with the collation "utf8_unicode_ci" (that is, in the utf8 character set), so in theory the function iconv should work to change the encoding. I have tried the following:

$regex = iconv("UTF-8", "ISO-8859-1", $trans['patron']);

This should, in theory, change the pattern (stored in UTF-8 in the DB) into a nice Latin1 pattern. However, it truncates the pattern, much as PHP does automatically. For instance, "Sociedad Española de Cardiología" becomes "Sociedad Espa", and "Dña." becomes "D", etc.
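
One plausible reading of this truncation (an assumption; the thread does not confirm it) is that the bytes arriving from MySQL were not valid UTF-8 to begin with, for example because the connection was already delivering Latin1, so iconv stopped at the first byte it could not decode. A small sketch of checking the input before converting: to_latin1 is a hypothetical helper name, and the check relies on preg_match() with the /u modifier failing on subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to ISO-8859-1,
// refusing input that is not valid UTF-8 instead of truncating it.
function to_latin1(string $s): string
{
    // With /u, preg_match() returns false when the subject is not valid UTF-8.
    if (preg_match('//u', $s) !== 1) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $converted = iconv('UTF-8', 'ISO-8859-1', $s);
    if ($converted === false) {
        // Happens, for instance, when a character has no Latin1 equivalent.
        throw new RuntimeException('iconv could not convert the string');
    }
    return $converted;
}

$utf8   = "Sociedad Española de Cardiología";
$latin1 = to_latin1($utf8);
echo strlen($utf8), " bytes in UTF-8, ", strlen($latin1), " bytes in Latin1\n";
```

If the exception for invalid UTF-8 fires on a pattern read from the database, that would suggest the conversion is being applied to data that is already in a single-byte encoding.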

The solution was to tell MySQL to perform the conversion to Latin1 prior to executing the SELECT query that retrieves the regexes. MySQL does a better job than PHP at translating between character sets, it would appear:

mysql_query("SET character_set_results=latin1");

This has fixed the problems that I had.

HTH,
Erik Norvelle


--- End Message ---
--- Begin Message ---
Space: 250 MB Ads (Banner). Unlimited bandwidth
Upload: FTP, Browser Scripting: PHP
Other Features: MySQL databases, PhpMyAdmin. AHCS c-panel. All file types
allowed. No file size limit. POP3, Web-based Email. 30-days Inactivity
limit. Automatic scripts installer.PHP flags manager. Free web hosting
instant activation. Addon domains. Same company and servers as ByetHost.
http://www.boy.us.com/freehosting.php

-- 
View this message in context: 
http://www.nabble.com/Free-unlimied-hosting-for-you%21-tf4635232.html#a13237048
Sent from the Php - Internationalization (i18n) mailing list archive at 
Nabble.com.

--- End Message ---
