php-i18n Digest 27 Jun 2007 17:36:48 -0000 Issue 358

Topics (messages 1070 through 1073):

Re: Inserting UTF-8 Japanese into MySQL: phpMyAdmin works, my form doesn't.
        1070 by: Wil Clouser

Inserting Japanese into MySQL: It almost works now...
        1071 by: Richard Pavonarius

UTF-8 Japanese goes into MySQL OK, comes out question marks
        1072 by: Richard Pavonarius

Check if PCRE has UTF-8 support compiled in
        1073 by: Andries Seutens

----------------------------------------------------------------------
--- Begin Message ---
It looks like your string is getting UTF-8 encoded twice.  If MySQL
doesn't think the incoming data is in UTF-8, it'll encode it for you
(so it ends up double-encoded).  You can check the character sets
MySQL is using by running (from your app, not the CLI):

SHOW VARIABLES LIKE 'char%';

To test if my theory is correct, run this command before your queries:

SET NAMES 'utf8';

That will tell MySQL that the incoming data is utf8.  If your string
works after you run that query, you should change the default
character set in your MySQL configuration file (my.cnf).  More info here:
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
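
The double encoding described above can be reproduced outside the database. This is a minimal sketch, assuming the connection character set was latin1 (which MySQL treats as Windows-1252); the thread does not confirm that setting, and the variable names are only illustrative. It treats the UTF-8 bytes as Windows-1252 and re-encodes them as UTF-8, which is roughly what the server does when it believes the client is sending latin1:

```php
<?php
// Original text as correct UTF-8 (this source file must be saved as UTF-8).
$name = '高橋アントニオ';

// Assumption: the server believes the client sends latin1 (Windows-1252),
// so it converts each byte to UTF-8 a second time before storing it.
$mangled = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');

// Each character was 3 bytes; after the second encoding it is longer.
echo strlen($name), ' bytes before, ', strlen($mangled), " bytes after\n";

// Reversing the extra step recovers the original text.
$restored = mb_convert_encoding($mangled, 'Windows-1252', 'UTF-8');
var_dump($restored === $name);
```

If the round trip recovers the original, rows that were already stored mangled can probably be repaired the same way instead of being re-entered.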

Wil

On 5/13/07, Richard Pavonarius <[EMAIL PROTECTED]> wrote:
Using MySQL 4.1.3-beta-standard, PHP Version 4.4.6. I've also tried
PHP Version 5.2.2 with exactly the same results.

I have a table with utf8_general_ci collation. If I use phpMyAdmin, I
can insert Japanese text into it with no problem.

However, a simple form I've created is inserting mangled Japanese text
into MySQL.

My form's page has a meta tag setting the charset to UTF-8.
After the form has been submitted, I can echo the Japanese text to the
feedback page just fine, so I assume the data is being sent properly
encoded.

高橋アントニオ gets put into MySQL from my form as 高æ(c)‹ã‚¢ãƒ³ãƒˆãƒ‹ã‚ª
But as I said, I can insert the same text via phpMyAdmin without any problem.

The php code is as simple as I can make it

$username = 'tony';
$password = 'emmauspa';
$name = '高橋アントニオ';
$email = '[EMAIL PROTECTED]';
$gender = 'M';
$country = '4';
$nationality = '4';
$language = '1';
$dob = '1994-01-04';
$phone = '967-1261';
$connection = mysql_connect('localhost', 'user', 'password') or die
('Unable to connect!');
mysql_select_db('gdbase') or die ('Unable to select database!');
$query="INSERT INTO members (username, password, name, email, gender,
country, nationality, language, dob, phone)
VALUES('$username','$password','$name','$email','$gender',$country,$nationality,$language,'$dob','$phone')";
$result = mysql_query($query) or die ("Error in query: $query. " .
mysql_error());
echo 'New record inserted with ID ' . mysql_insert_id() . '<br />';
echo mysql_affected_rows() . ' record(s) affected.';
mysql_close($connection);

My mbstring settings in php.ini:

mbstring.detect_order   = auto
mbstring.encoding_translation   = On
mbstring.func_overload  = 0
mbstring.http_input     = auto
mbstring.http_output    = UTF-8
mbstring.internal_encoding      = UTF-8
mbstring.language = Neutral
mbstring.substitute_character   = no value

What the heck am I doing wrong, or how can I figure out where the
problem is? I tried to follow the insertion routine in phpMyAdmin, but
the code is way over my head. I've been at this for days and days.

Thanks,

Rich


--- End Message ---
--- Begin Message ---
I can get it to work by adding this:

$query = "SET NAMES 'utf8'";  // added
$result = mysql_query($query) or die ("Error in query: $query. " .
mysql_error());  // added
$query="INSERT INTO members (username, password, name, email, gender,
country, nationality, language, dob, phone)
VALUES('$username','$password','$name','$email','$gender',$country,$nationality,$language,'$dob','$phone')";
$result = mysql_query($query) or die ("Error in query: $query. " .
mysql_error());

...but I don't want to have to SET NAMES all the time.

I made the following changes to /etc/my.cnf

[client]
default-character-set=utf8

[mysqld]
init-connect=SET NAMES 'utf8'
collation_server=utf8_unicode_ci
character_set_server=utf8
default-character-set=utf8

I can see in phpMyAdmin that the changes have been made to the system
variables, but still the only way it works is to send the SET NAMES
'utf8' query via PHP. What am I missing?
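
Until the server-side settings take effect, one way to avoid repeating SET NAMES in every script is to set the connection character set once in a shared connection helper. This is a minimal sketch, assuming the mysqli extension (PHP 5) is available; the function name db_connect_utf8 and the credentials are hypothetical and not part of the original code:

```php
<?php
// Hypothetical helper: open a connection and switch it to UTF-8 once,
// so individual scripts do not need their own SET NAMES query.
function db_connect_utf8($host, $user, $password, $database)
{
    $link = mysqli_connect($host, $user, $password, $database);
    if (!$link) {
        die('Unable to connect: ' . mysqli_connect_error());
    }
    // Sets the client, connection, and result character sets, like SET NAMES.
    if (!mysqli_set_charset($link, 'utf8')) {
        die('Unable to set character set: ' . mysqli_error($link));
    }
    return $link;
}
```

Each script would then call db_connect_utf8('localhost', 'user', 'password', 'gdbase') instead of connecting directly. It may also be worth checking which MySQL account the script uses: the server does not execute init_connect for accounts that have the SUPER privilege, so the setting has no effect for such an account.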

Thanks,

Rich

--- End Message ---
--- Begin Message ---
OK, this is officially driving me nuts.

In phpMyAdmin, I can see the Japanese text in the db. By sending the
query SET NAMES 'utf8' before inserting data, I can get my own PHP
scripts to input correctly.

However, in my PHP scripts the Japanese text is coming out of the DB
as question marks. Static Japanese text on the page is OK, so it's not
a problem with the browser font.

PHP 5.2.2, MySQL 4.1.3-beta-standard

php.ini:
mbstring.detect_order = auto
mbstring.encoding_translation   = On
mbstring.func_overload  = 0
mbstring.http_input = auto
mbstring.http_output  = UTF-8
mbstring.internal_encoding = UTF-8
mbstring.language = Neutral
mbstring.script_encoding  = none
mbstring.strict_detection   = Off
mbstring.substitute_character = none

/etc/my.cnf
[client]
default-character-set=utf8

[mysqld]
init-connect=SET NAMES 'utf8'
collation_server=utf8_unicode_ci
character_set_server=utf8
default-character-set=utf8
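
Question marks usually point to a lossy conversion rather than a display problem: a character with no equivalent in the target character set is replaced with '?'. One possible cause, which the thread does not confirm, is that the script reading the data never issues SET NAMES 'utf8', so the server converts the results to latin1. The sketch below only imitates that conversion in PHP with mbstring; it does not talk to MySQL:

```php
<?php
// Use '?' (0x3F) for characters the target encoding cannot represent.
mb_substitute_character(0x3F);

$name = '高橋アントニオ';  // 7 characters, correct UTF-8

// Assumption: results are converted to latin1 on the way out of the server.
$asLatin1 = mb_convert_encoding($name, 'ISO-8859-1', 'UTF-8');

// Every Japanese character is replaced by the substitute character.
echo $asLatin1, "\n";
```

If that is the cause, issuing SET NAMES 'utf8' on the connection that runs the SELECT, as was done for the INSERT, should return the text intact.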

Thanks,

Rich

--- End Message ---
--- Begin Message ---
Dear all,

In the Zend Framework we have written a filter that removes everything except 
alphabetic and numeric characters (and whitespace) from a string. The regular 
expression for this filter looks like this:

$result = preg_replace('/[^\p{L}\p{N}\s]/u', '', '!!testing123!!');

Apparently, on some systems such as RHEL, this regular expression fails. The 
reason appears to be that PCRE on those systems does not have UTF-8 support 
compiled in, most likely because the distribution did not build PCRE with the 
--enable-utf8 option.

What this means to us is that the class should become aware of whether such 
support is available, and act accordingly.

My question to you guys is, how would such a check for support best be done? 
Perhaps detect the error condition, which would only occur in the event that 
the system does not have utf-8 support? But how (preferably without going 
down the dirty path .... )?
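
One way to detect this is to attempt a trivial match that needs both the /u modifier and a Unicode property, and to treat a failure as missing support. This is a minimal sketch rather than the Zend Framework implementation; the function names pcre_has_unicode_support and filter_alnum and the ASCII-only fallback pattern are assumptions made for illustration:

```php
<?php
// Returns true when PCRE can compile a pattern that uses /u and \p{..}.
// Without that support preg_match() raises a warning and returns false,
// so the warning is suppressed with @ and the result is compared to 1.
function pcre_has_unicode_support()
{
    return @preg_match('/\pL/u', 'a') === 1;
}

// Hypothetical filter: Unicode-aware when possible, ASCII-only otherwise.
function filter_alnum($value)
{
    $pattern = pcre_has_unicode_support()
        ? '/[^\p{L}\p{N}\s]/u'
        : '/[^a-zA-Z0-9\s]/';
    return preg_replace($pattern, '', $value);
}

echo filter_alnum('!!testing123!!'), "\n"; // prints "testing123"
```

The fallback also strips non-ASCII letters, so callers may need a way to find out which mode was used.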

Best regards,


Andries Seutens
http://andries.systray.be 

--- End Message ---
