php-i18n Digest 14 Oct 2009 18:54:22 -0000 Issue 429

Topics (messages 1332 through 1334):

International characters look OK in PHP app, not in MySQL GUI
        1332 by: Michael R Boudreau
        1333 by: Darren Cook
        1334 by: Michael R Boudreau

Administrivia:

To subscribe to the digest, e-mail:
        [email protected]

To unsubscribe from the digest, e-mail:
        [email protected]

To post to the list, e-mail:
        [email protected]


----------------------------------------------------------------------
--- Begin Message ---
Hi all,

I have a problem with inconsistent display of international characters
between my PHP application and several MySQL GUI applications: the
characters look OK but sort incorrectly in the PHP app, and they look
scrambled in the MySQL GUI.

I'm developing a database of book and publisher information on Mac OS X
(10.6.1) with PHP 5.3 and MySQL 5.1.39.

The people using the application need to be able to enter book titles and
publisher names, among other things, in French, German, and other European
languages.

My PHP pages all include

   <meta http-equiv="content-type" content="text/html; charset=UTF-8" />

and the MySQL table holding the data uses UTF-8 as its default character
set.

When accented letters are entered using the Mac OS X "Special Characters"
palette or the Windows "Character Map" utility, or even copied and pasted
from a Word document, they go into the PHP app and are displayed back
correctly.

The problem in the PHP app is that they don't sort in the correct order. For
example, a capital E with any accent sorts between A and B.

When I use any GUI to view the MySQL database itself (such as CocoaMySQL or
Sequel Pro or MySQL Query Browser), the accented letters are represented by
_two_ completely different accented letters. (However, oddly, if I query the
database from the command-line 'mysql' utility in Mac OS X Terminal, the
letters look correct.)

Further, when I use the same system utilities (OS X Special Characters
palette or Windows Character Map) to edit values directly in the MySQL GUI,
they appear correct in the GUI but break my PHP app (I get an error message
from htmlspecialchars() about an illegal multibyte value).

I'd be grateful for any advice you can give.

-- 
Michael R. Boudreau
Senior Publishing Technology Analyst
The University of Chicago Press
1427 E. 60th Street
Chicago, IL 60637
(773) 753-3298  fax: (773) 753-3383



--- End Message ---
--- Begin Message ---
> The problem in the PHP app is that they don't sort in the correct order. For
> ex., capital E with any accent sorts between A and B.

That's a curious symptom. I have no suggestion yet, but here is the
information you didn't give:

 a) do you have the mbstring extension installed (and if so, what is the
internal_encoding, and the other settings)?

 b) do you have the intl extension installed (and, again, what settings)?

 c) what is the server's locale setting?

The first two you can get from a phpinfo() page. For the 3rd I think
$_ENV["LANG"] is it, or this should confirm it:
  print_r(setlocale(LC_ALL,"0"));

Also take a look at  http://jp.php.net/setlocale  and try setting
"en_US.UTF8" explicitly at the top of your script. The comments say Mac
requires "UTF-8", so you might also want to try:
   setlocale(LC_ALL,Array("en_US.UTF8","en_US.UTF-8"));

(untested)
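
A slightly fuller sketch of the same idea, with the same caveat that it is
unverified on your setup: check what setlocale() returns, then sort with
strcoll(), which follows the LC_COLLATE locale. The sample titles are made
up, and where accented letters end up depends on the platform's locale data.

```php
<?php
// Try both spellings; setlocale() returns the name it accepted,
// or false if neither locale is installed.
$locale = setlocale(LC_COLLATE, array("en_US.UTF8", "en_US.UTF-8"));

// Made-up sample data ("\xC3\x89" is a UTF-8 capital E with acute accent).
$titles = array("Zola", "\xC3\x89mile", "Abel", "Eiffel");

if ($locale === false) {
    // No usable locale: fall back to plain byte-order sorting.
    sort($titles, SORT_STRING);
} else {
    // strcoll() compares two strings using the current LC_COLLATE locale.
    usort($titles, 'strcoll');
}

echo "Locale in use: ", ($locale === false ? "(none)" : $locale), "\n";
print_r($titles);
```

If the first line prints "(none)", the locale names are the problem rather
than the sort itself.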

Darren

P.S. On the MySQL side have a look at the collation setting of the database.



-- 
Darren Cook, Software Researcher/Developer
http://dcook.org/gobet/  (Shodan Go Bet - who will win?)
http://dcook.org/mlsn/ (Multilingual open source semantic network)
http://dcook.org/work/ (About me and my work)
http://dcook.org/blogs.html (My blogs and articles)

--- End Message ---
--- Begin Message ---
Thanks for this, Darren.

It turns out that, although I had already explicitly set the character set and 
collation for the table I was updating/viewing, that wasn't enough.

I reset the MySQL server variables to use utf8 for the default character set, 
then dropped and recreated the database, explicitly setting the character set 
and collation. Then I reloaded the data. (Fortunately it wasn't a large 
database.)
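
A sketch of what the recreate step might look like from PHP. The database
name "books", the utf8_unicode_ci collation, and the helper name
createDatabaseSql() are assumptions for illustration, not the values
actually used. On the server side, the relevant variables are
character_set_server and collation_server.

```php
<?php
// Hypothetical helper: builds a CREATE DATABASE statement with an explicit
// character set and collation instead of relying on server defaults.
function createDatabaseSql($name, $charset = 'utf8', $collation = 'utf8_unicode_ci')
{
    // Identifiers cannot be bound as parameters, so allow only safe characters.
    foreach (array($name, $charset, $collation) as $part) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $part)) {
            throw new InvalidArgumentException("Unsafe identifier: $part");
        }
    }
    return "CREATE DATABASE `$name` CHARACTER SET $charset COLLATE $collation";
}

$sql = createDatabaseSql('books');   // 'books' is a placeholder name
echo $sql, "\n";
// To run it: $db->query($sql), where $db is an open mysqli connection.
```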

That proved enough for the Mac OS X development environment. On a Solaris 
server, I had to go one step further: explicitly set the connection character 
set (using "SET NAMES 'utf8'") immediately after establishing the mysqli 
connection. Now all is well.
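
In mysqli terms that step looks roughly like the sketch below. The function
name and the connection details in the usage comment are placeholders.
mysqli::set_charset() is generally preferred over a raw SET NAMES query
because it also tells the client library which character set is in use,
which matters for real_escape_string(); the sketch falls back to SET NAMES
if set_charset() fails.

```php
<?php
// Hypothetical helper: force the connection character set to utf8
// immediately after connecting.
function forceUtf8Connection($db)
{
    // set_charset() updates both the server-side connection variables and
    // the client library's own idea of the character set.
    if ($db->set_charset('utf8')) {
        return true;
    }
    // Fallback: the statement quoted in the message above.
    return (bool) $db->query("SET NAMES 'utf8'");
}

// Usage (host, user, password, and database name are placeholders):
// $db = new mysqli('localhost', 'user', 'password', 'books');
// if ($db->connect_error) { die($db->connect_error); }
// forceUtf8Connection($db);
// echo $db->character_set_name(), "\n";
```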

I'm a bit curious why the connection character set apparently defaulted to utf8 
on the Mac but not on Solaris. Perhaps the former is more utf8-friendly (or 
utf8-centric)?
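
Whatever the reason for the differing defaults, the two-letter garbling in
the GUIs is consistent with UTF-8 bytes being read as a single-byte
character set somewhere along the way. A minimal sketch, assuming the
mbstring extension is loaded and assuming Windows-1252 as the misread
character set (the thread never identifies which one was involved):

```php
<?php
// A capital E with acute accent (U+00C9) is two bytes in UTF-8.
$original = "\xC3\x89";

// Misread those two bytes as Windows-1252 and re-encode the result as
// UTF-8: each byte becomes its own character.
$garbled = mb_convert_encoding($original, "UTF-8", "Windows-1252");
echo $garbled, "\n";                         // two characters instead of one
echo mb_strlen($garbled, "UTF-8"), "\n";     // 2

// The reverse mismatch: the same letter as a single Latin-1 byte is not
// valid UTF-8, which is the kind of input htmlspecialchars() rejects.
$latin1 = "\xC9";
var_dump(mb_check_encoding($latin1, "UTF-8"));    // bool(false)
var_dump(mb_check_encoding($original, "UTF-8"));  // bool(true)
```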


On 10/12/09 6:37 PM, "Darren Cook" <[email protected]> wrote:

> > The problem in the PHP app is that they don't sort in the correct order. For
> > ex., capital E with any accent sorts between A and B.
>
> That's a curious symptom. I've no suggestion, but the other information
> you didn't give was:
>
>  a) do you have the mbstring extension installed (and if so, what is the
> internal_encoding, and the other settings)?
>
>  b) do you have the intl extension installed (and, again, what settings)?
>
>  c) what is the server's locale setting?
>
> The first two you can get from a phpinfo() page. For the 3rd I think
> $_ENV["LANG"] is it, or this should confirm it:
>   print_r(setlocale(LC_ALL,"0"));
>
> Also take a look at  http://jp.php.net/setlocale  and try setting
> "en_US.UTF8" explicitly at the top of your script. The comments say Mac
> requires "UTF-8", so you might also want to try:
>    setlocale(LC_ALL,Array("en_US.UTF8","en_US.UTF-8"));
>
> (untested)
>
> Darren
>
> P.S. On the MySQL side have a look at the collation setting of the database.




--
Michael R. Boudreau
Senior Publishing Technology Analyst
The University of Chicago Press
1427 E. 60th Street
Chicago, IL 60637
(773) 753-3298  fax: (773) 753-3383


--- End Message ---
