php-i18n Digest 2 Feb 2006 18:36:43 -0000 Issue 307
Topics (messages 901 through 906):
Re: storing unicode in mysql database
901 by: Tex Texin
904 by: Naintara
Re: suggested tools for simple i18n
902 by: Florian Breit
903 by: Thorsten Ottosen
[I18N] PHP6 - Resuming Work
905 by: l0t3k
906 by: Michael Wallner
Administrivia:
To subscribe to the digest, e-mail:
[EMAIL PROTECTED]
To unsubscribe from the digest, e-mail:
[EMAIL PROTECTED]
To post to the list, e-mail:
[email protected]
----------------------------------------------------------------------
--- Begin Message ---
When you say the browser charset is set to UTF-8, how are you doing that?
If the browser is converting the Thai characters to numeric character
references (I assume that is what you mean by HTML codes, e.g. &#ddddd;
where d is a decimal digit), then most likely the page that accepts the user
data is not set to the right encoding.
Make sure the HTTP Content-Type header specifies the encoding as UTF-8 and not
ISO-8859-1, or, if the HTTP header does not set a charset, make sure the
web page sets it to UTF-8 (<meta http-equiv=Content-Type
content="text/html; charset=UTF-8">).
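The header part of this advice can be sketched in PHP. This is only a minimal sketch, not code from the thread: the helper name html_content_type() is hypothetical, and it assumes the script has not produced any output yet. A charset given in the HTTP Content-Type header takes precedence over the <meta> tag, so a server-wide default of ISO-8859-1 would override the page's own declaration.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function html_content_type(string $charset = 'UTF-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Send the header before any output. On the command line there is no
// HTTP response, so the call is skipped there.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header(html_content_type());
}
```

The form page and the page that receives the POST should both send this header, since the browser submits form data in the encoding of the page that contains the form.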
Tex Texin
Internationalization Architect, Yahoo! Inc.
> -----Original Message-----
> From: Naintara [mailto:[EMAIL PROTECTED]
> Sent: Monday, January 30, 2006 3:32 AM
> To: [email protected]
> Subject: [PHP-I18N] storing unicode in mysql database
>
>
>
> Hi,
>
> I'd like to know what would be the best way to store Unicode
> text in a database. I am using MySQL 4.1. I am trying to
> create a multi-lingual CMS and the browser charset is set to
> utf-8 and the database and tables are set to UTF8 and
> utf8_bin for charset and collation.
>
> While displaying in the browser, Thai text is displayed
> correctly but it is stored as html code in the database. Is
> this correct behaviour or is there a better way? Would I need
> to specify charset for every query, or is it enough to have
> specified it for the mysql connection, results and client
> charset options.
>
> --
> PHP Unicode & I18N Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>
--- End Message ---
--- Begin Message ---
Thanks for your quick response.
I have set the meta tag in the header field to
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I am submitting Thai text through a form and storing it in a MySQL database;
the database charset and collation appear to be correct. When the text is
retrieved, the Thai is displayed correctly.
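On the question of whether the charset has to be given with every query: in MySQL the connection character set is a per-session setting, so setting it once right after connecting is normally enough. A minimal sketch, assuming PDO is available; the helper name mysql_dsn(), the host, the database name and the credentials are all placeholders, not taken from the thread.

```php
<?php
// Hypothetical helper: build a PDO DSN that fixes the connection charset.
// 'utf8' matches the setup described in the thread; newer MySQL servers
// also offer 'utf8mb4'.
function mysql_dsn(string $host, string $db, string $charset = 'utf8'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $db, $charset);
}

// Usage (placeholders only, not executed here):
//   $pdo = new PDO(mysql_dsn('localhost', 'cms'), 'user', 'secret');
//
// With the mysqli API the equivalent step is one call after connecting:
//   mysqli_set_charset($link, 'utf8');
// or a single "SET NAMES utf8" query at the start of the session.
```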
The problem is that the Thai text gets converted to HTML entities, which
means that something is amiss. Even though the CMS is a web
application and HTML entities are fine for browser display, I would like to
follow the correct procedure so that the text in the database can be read
directly in the original language.
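For rows that already contain numeric character references, the text can be turned back into real UTF-8 before it is stored again. A minimal sketch under the assumption that the stored values look like "&#3626;"; the function name decode_numeric_refs() is hypothetical. It decodes only decimal numeric references and leaves named entities such as &amp; untouched.

```php
<?php
// Hypothetical helper: replace decimal numeric character references
// (e.g. "&#3626;") with the UTF-8 characters they stand for.
function decode_numeric_refs(string $text): string
{
    return preg_replace_callback(
        '/&#\d+;/',
        static function (array $m): string {
            // html_entity_decode() handles the code point to UTF-8 conversion.
            return html_entity_decode($m[0], ENT_QUOTES, 'UTF-8');
        },
        $text
    );
}

// Six references that spell a Thai greeting.
echo decode_numeric_refs('&#3626;&#3623;&#3633;&#3626;&#3604;&#3637;'), "\n";
```

This only repairs existing data. The references appear in the first place because the browser believes the form page is not UTF-8, so the page encoding still has to be fixed.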
The strange behaviour is that, when testing on my local machine, the browser
(Firefox 1.5) shows the Encoding as "ISO-8859-1" and the content type as utf-8.
I changed the browser's default encoding to UTF-8 under
content->advanced->options, but this does not appear to have affected
anything.
It could be that the encoding for the server needs to be set to utf-8 or
that I need to send headers to that effect.
I am running Apache on Windows (XAMPP) locally, using phpMyAdmin as the
database GUI. Strangely enough, the page encoding for the phpMyAdmin pages
shows up as UTF-8.
I also found a very good resource on various php/mysql/browser issues
http://www.phpwact.org/php/i18n/charsets?s=utf8
It gets even stranger when I set the form encoding to
enctype="multipart/form-data" (which is the recommended setting for
submitting unicode characters). In this case, the Thai text is neither
stored nor displayed correctly, but is changed into accented
symbols. Yet those same accented characters are stored and displayed
consistently, so nothing is lost there.
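One plausible explanation for the accented symbols, which the thread does not confirm, is that the UTF-8 bytes are being read as ISO-8859-1, one character per byte. The sketch below reproduces that effect with iconv and shows that the step is reversible, which would fit the observation that nothing is lost. The function names are hypothetical.

```php
<?php
// Hypothetical helper: treat UTF-8 bytes as ISO-8859-1 text and re-encode
// the result as UTF-8, which is what a mislabeled page effectively does.
function misread_as_latin1(string $utf8Bytes): string
{
    return iconv('ISO-8859-1', 'UTF-8', $utf8Bytes);
}

// Inverse step: map each accented character back to the byte it came from.
function undo_misread(string $garbled): string
{
    return iconv('UTF-8', 'ISO-8859-1', $garbled);
}

$thai    = "\u{0E2A}";                // one Thai letter, 3 bytes in UTF-8
$garbled = misread_as_latin1($thai);  // three Latin-1 characters instead

echo strlen($thai), ' bytes become ', strlen($garbled), " bytes\n";
echo undo_misread($garbled) === $thai ? "round trip ok\n" : "data lost\n";
```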
--- End Message ---
--- Begin Message ---
Hello Thorsten,
> Any reason you did not reply to the list?
This is just because the ezmlm at lists.php.net does not add a
Reply-To header, so when I hit the reply button the reply goes only to
the original author, not to the list. I often
forget to send a copy to the list because most other lists do
include a Reply-To.
>> Another nice solution is to include a file with an array of language
>> specific strings, as the people from phpBB did.
>> For example:
>
>
> right, but that won't work for japanese.
For me this works with Japanese as well.
I just did a test as you can see at http://s01.w4h.biz/japanese-test.php
Here is my jp-jp.php:
<?php
$lang = array('lone wolf and cube' => '子連れ狼',
'daigoro' => '大五郎',
'ogami itto' => '拝 一刀',
);
?>
And here is my japanese-test.php:
<?='<?xml version="1.0" encoding="UTF-8"?>'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>New Document</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
</head>
<body>
<pre>
<?php
include_once('jp-jp.php');
var_dump($lang);
?>
</pre>
</body>
</html>
Output is:
array(3) {
["lone wolf and cube"]=>
string(12) "子連れ狼"
["daigoro"]=>
string(9) "大五郎"
["ogami itto"]=>
string(10) "拝 一刀"
}
So, everything works correctly.
Maybe you had the wrong charset for the output, or your editor
doesn't support Japanese?
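The array approach is usually paired with a small lookup function so that a missing string does not break the page. A minimal sketch: the table is the one from jp-jp.php, inlined here so the example is self-contained, and the helper name t() is hypothetical. It assumes the file is saved as UTF-8.

```php
<?php
// Same table as jp-jp.php, inlined for the sketch.
$lang = array(
    'lone wolf and cube' => '子連れ狼',
    'daigoro'            => '大五郎',
    'ogami itto'         => '拝 一刀',
);

// Hypothetical helper: return the translation, or the key itself
// when no translation exists.
function t(array $lang, string $key): string
{
    return isset($lang[$key]) ? $lang[$key] : $key;
}

echo t($lang, 'daigoro'), "\n";       // translated string
echo t($lang, 'unknown key'), "\n";   // falls back to the key
echo strlen($lang['daigoro']), "\n";  // byte count, as reported by var_dump
```

Note that strlen() counts bytes, not characters, which is why var_dump reports string(9) for a three-character name.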
Regards,
Florian Breit
--- End Message ---
--- Begin Message ---
Florian Breit wrote:
>>> Another nice solution is to include a file with an array of language
>>> specific strings, as the people from phpBB did.
>>> For example:
>>
>>
>> right, but that won't work for japanese.
> For me this works with Japanese as well.
> I just did a test as you can see at http://s01.w4h.biz/japanese-test.php
> Here is my jp-jp.php:
> <?php
> $lang = array('lone wolf and cube' => '子連れ狼',
> 'daigoro' => '大五郎',
> 'ogami itto' => '拝 一刀',
> );
> ?>
[snip]
> So, everything works correctly.
> Maybe you had the wrong charset for the output, or your editor
> doesn't support Japanese?
Both, it turns out.
Thanks!
-Thorsten
--- End Message ---
--- Begin Message ---
hello all,
after a round of unfortunate incidents, i'll be ready to resume work on the
unicode extension. i'm close to getting my build environment back together,
so i'd like to get some idea of where things are with other people working
on the innards of PHP6. i'll start the discussion:
As some of you know, i have written an extension which exposes most of
the interesting classes exported by ICU
(covering locales, formatting/parsing, calendars, timezones, collation,
normalization, regex etc). Having escaped my machine only once, i'd guess
it is in early alpha shape. It compiled and worked on the pre-public CVS
release, but it has not been tested (or compiled for that matter) on
anything other than my machine (Win32/MSVC).
Derick attempted to give it a go, but ended in a sea of compiler errors
(under gcc, i presume). i think most errors are due to the mixture of C and
C++ code. Other intrepid souls are welcome to try.
Over the next few days, i'll make changes to accommodate the PDM revisions.
One issue Derick raised is that because of the sheer size of ICU, we may
need to break the extension into more modular parts (each living in PECL
until consensus is reached on which makes it to core). Because of this and
the compile issue, i'd rather engage some eyes other than mine before
attempting a CVS commit.
clayton
aka l0t3k
--- End Message ---
--- Begin Message ---
l0t3k wrote:
> As some of you know, i have written an extension which exposes most of
> the interesting classes exported by ICU
> (covering locales, formatting/parsing, calendars, timezones, collation,
> normalization, regex etc). Having escaped my machine only once, i'd guess
> it is in early alpha shape. It compiled and worked on the pre-public CVS
> release, but it has not been tested (or compiled for that matter) on
> anything other than my machine (Win32/MSVC).
> Derick attempted to give it a go, but ended in a sea of compiler errors
> (under gcc, i presume). i think most errors are due to the mixture of C and
> C++ code. Other intrepid souls are welcome to try.
> Over the next few days, i'll make changes to accommodate the PDM revisions.
> One issue Derick raised is that because of the sheer size of ICU, we may
> need to break the extension into more modular parts (each living in PECL
> until consensus is reached on which makes it to core). Because of this and
> the compile issue, i'd rather engage some eyes other than mine before
> attempting a CVS commit.
I'd really like to help out on that part!
Regards,
--
Michael - <mike(@)php.net> http://dev.iworks.at/ext-http/http-functions.html.gz
--- End Message ---