php-i18n Digest 2 Feb 2006 18:36:43 -0000 Issue 307

Topics (messages 901 through 906):

Re: storing unicode in mysql database
        901 by: Tex Texin
        904 by: Naintara

Re: suggested tools for simple i18n
        902 by: Florian Breit
        903 by: Thorsten Ottosen

[I18N] PHP6 - Resuming Work
        905 by: l0t3k
        906 by: Michael Wallner

Administrivia:

To subscribe to the digest, e-mail:
        [EMAIL PROTECTED]

To unsubscribe from the digest, e-mail:
        [EMAIL PROTECTED]

To post to the list, e-mail:
        [email protected]


----------------------------------------------------------------------
--- Begin Message ---
When you say the browser charset is set to utf-8, how are you doing that?

If the browser is converting the Thai characters to numeric character
references (I assume that is what you mean by HTML codes, e.g. &#ddddd;
where d is a decimal digit), then most likely the page that accepts the user
data is not set to the right encoding.

Make sure the HTTP Content-Type header specifies the charset as UTF-8 and
not ISO-8859-1. If the HTTP header does not set a charset, then make sure
the web page sets it to UTF-8 (<meta http-equiv=Content-Type
content="text/html; charset=UTF-8">).
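In PHP, the HTTP side of this can be handled in the script itself. A minimal sketch, assuming it runs before any output has been sent (header() cannot change headers after that). The form action save.php and the field name are hypothetical:

```php
<?php
// Ask PHP to add "charset=UTF-8" to the default text/html
// Content-Type header it sends.
ini_set('default_charset', 'UTF-8');

// Alternatively, send the header explicitly. Guarded because header()
// only works before any output has gone out.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}

// Declare the same encoding in the page itself, and ask the browser to
// submit the form data as UTF-8 (save.php is a hypothetical handler).
echo "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n";
echo "<form method=\"post\" action=\"save.php\" accept-charset=\"UTF-8\">\n";
echo "  <input type=\"text\" name=\"title\">\n";
echo "</form>\n";
```

If the header and the meta tag disagree, browsers go by the HTTP header, which is why both should say UTF-8.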

Tex Texin
Internationalization Architect,   Yahoo! Inc.
 
 


> -----Original Message-----
> From: Naintara [mailto:[EMAIL PROTECTED] 
> Sent: Monday, January 30, 2006 3:32 AM
> To: [email protected]
> Subject: [PHP-I18N] storing unicode in mysql database
> 
> 
>  
> Hi,
> 
> I'd like to know what would be the best way to store Unicode 
> text in a database. I am using MySQL 4.1. I am trying to 
> create a multi-lingual CMS and the browser charset is set to 
> utf-8 and the database and tables are set to UTF8 and 
> utf8_bin for charset and collation.
> 
> While displaying in the browser, Thai text is displayed 
> correctly but it is stored as html code in the database. Is 
> this correct behaviour or is there a better way? Would I need 
> to specify charset for every query, or is it enough to have 
> specified it for the mysql connection, results and client 
> charset options.
> 
> -- 
> No virus found in this outgoing message.
> Checked by AVG Free Edition.
> Version: 7.1.375 / Virus Database: 267.14.23/243 - Release 
> Date: 27-Jan-06
>  
> 
> -- 
> PHP Unicode & I18N Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
> 
> 
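On the question of per-query versus per-connection settings: in MySQL 4.1 the client, connection and result character sets are session variables, so they are set once right after connecting (for example with SET NAMES 'utf8'), not on every query. A minimal sketch using PHP's mysqli extension; the function name and connection parameters are hypothetical:

```php
<?php
// Hypothetical helper: open a connection whose client, connection and
// result character sets are utf8 for the rest of the session.
function open_utf8_connection($host, $user, $pass, $dbname)
{
    $db = new mysqli($host, $user, $pass, $dbname);
    if ($db->connect_errno) {
        throw new RuntimeException('Connect failed: ' . $db->connect_error);
    }
    // Once per connection. Has the effect of SET NAMES 'utf8' on the
    // server, and also tells the client library, so that
    // real_escape_string() escapes with the right charset.
    if (!$db->set_charset('utf8')) {
        throw new RuntimeException('set_charset failed: ' . $db->error);
    }
    return $db;
}
```

Every query run on the returned connection then sends and receives UTF-8; nothing needs to be repeated per query. MySQL's utf8 is limited to three-byte characters, which covers Thai.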

--- End Message ---
--- Begin Message ---
Thanks for your quick response.

I have set the meta tag in the header field to
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

I am submitting Thai text through a form and storing it in a MySQL database;
the database charset and collation appear to be correct. When the text is
retrieved, the Thai text is displayed correctly.

The problem is that the Thai text gets converted to HTML entities, which
suggests that something is amiss. Even though the CMS is a web application
and HTML entities are fine for browser display, I would like to follow the
correct procedure so that the text stored in the database can be read in
the correct language.
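If rows have already been stored as numeric character references, they can be converted back to UTF-8 once the form encoding is fixed. A minimal sketch, assuming only decimal references (&#ddddd;) need handling; both function names are hypothetical, and named entities such as &amp; are deliberately left alone:

```php
<?php
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
function utf8_from_codepoint($cp)
{
    if ($cp < 0x80) {
        return chr($cp);
    } elseif ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    } elseif ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace decimal references (&#ddddd;) with UTF-8.
function decode_decimal_ncrs($text)
{
    return preg_replace_callback('/&#([0-9]+);/', function ($m) {
        $cp = (int) $m[1];
        // Leave NUL, surrogates and out-of-range values untouched.
        if ($cp === 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
            return $m[0];
        }
        return utf8_from_codepoint($cp);
    }, $text);
}
```

For example, decode_decimal_ncrs('&#3652;&#3607;&#3618;') returns the UTF-8 string ไทย. Run it over a copy of the data first, since text a user genuinely typed as "&#...;" would be converted too.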

The strange behaviour is that when I test on my local machine, the browser
(Firefox 1.5) shows the encoding as "ISO-8859-1" and the content type as
UTF-8. I changed the browser's default encoding to UTF-8 under
Content -> Advanced -> Options, but this does not appear to have affected
anything.

It could be that the encoding for the server needs to be set to utf-8 or
that I need to send headers to that effect.

I am running Apache on Windows (XAMPP) locally, with phpMyAdmin as the
database GUI. Strangely enough, the page encoding for the phpMyAdmin pages
shows up as UTF-8.

I also found a very good resource on various php/mysql/browser issues
http://www.phpwact.org/php/i18n/charsets?s=utf8

It gets even stranger when I set the form encoding to
enctype="multipart/form-data" (which is the recommended setting for
submitting Unicode characters). In this case, the Thai text is neither
stored nor displayed correctly; it is changed into accented symbols. Yet
those same accented characters are stored and displayed consistently, so
nothing is lost there.
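When form submissions come back garbled, it helps to check what bytes actually arrived before they reach the database. A rough diagnostic sketch (the function name is hypothetical); note that it cannot detect text that was double-encoded, since that is still valid UTF-8:

```php
<?php
// Hypothetical diagnostic: roughly classify a submitted string.
function classify_text($s)
{
    if (preg_match('/&#[0-9]+;/', $s)) {
        return 'ncr';        // browser sent numeric character references
    }
    if (preg_match('/^[\x00-\x7F]*$/', $s)) {
        return 'ascii';      // plain ASCII, identical in UTF-8 and ISO-8859-1
    }
    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if (preg_match('//u', $s) === 1) {
        return 'utf-8';
    }
    return 'not utf-8';      // e.g. raw ISO-8859-1 bytes
}

// bin2hex() shows the raw bytes: Thai characters in UTF-8 take three
// bytes each, starting with e0 b8 or e0 b9.
```

For instance, logging classify_text($_POST['title']) and bin2hex($_POST['title']) in the form handler shows whether the browser sent UTF-8, references, or something else.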



-----Original Message-----
From: Tex Texin [mailto:[EMAIL PROTECTED] 
Sent: Monday, January 30, 2006 5:41 PM
To: 'Naintara'; [email protected]
Subject: RE: [PHP-I18N] storing unicode in mysql database


--- End Message ---
--- Begin Message ---
Hello Thorsten,

> Any reason you did not reply to the list?

This is just because ezmlm at lists.php.net does not add a Reply-To header, so when I hit the reply button, the reply goes only to the original author, not to the list. I often forget to send a copy to the list because most other lists do include a Reply-To.

>> Another nice solution is to include a file with an array of language
>> specific strings, as the people from phpBB did.
>> For example:
>
>
> right, but that won't work for Japanese.

For me this works with Japanese too.
I just did a test; you can see it at http://s01.w4h.biz/japanese-test.php

Here is my jp-jp.php:
<?php
$lang = array('lone wolf and cube' => '子連れ狼',
              'daigoro'  => '大五郎',
              'ogami itto' => '拝 一刀',
);
?>

And here is my japanese-test.php:
<?='<?xml version="1.0" encoding="UTF-8"?>'?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>New Document</title>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  </head>
  <body>
    <pre>
      <?php
      include_once('jp-jp.php');
      var_dump($lang);
      ?>
    </pre>
  </body>
</html>

Output is:
array(3) {
  ["lone wolf and cube"]=>
  string(12) "子連れ狼"
  ["daigoro"]=>
  string(9) "大五郎"
  ["ogami itto"]=>
  string(10) "拝 一刀"
}

So, everything works correctly.
Maybe you had the wrong charset for the output, or your editor doesn't support Japanese?
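A language file like jp-jp.php is usually paired with a small lookup function. A minimal sketch; the function name t() and the fall-back-to-key behaviour are assumptions for illustration, not phpBB's actual API:

```php
<?php
// Language table in the style of jp-jp.php (file saved as UTF-8).
$lang = array(
    'daigoro'    => '大五郎',
    'ogami itto' => '拝 一刀',
);

// Hypothetical lookup helper: return the translated string, or the key
// itself when no translation exists, so missing entries stay visible.
function t($key, array $table)
{
    return isset($table[$key]) ? $table[$key] : $key;
}

echo t('daigoro', $lang), "\n";      // 大五郎
echo t('missing key', $lang), "\n";  // missing key
```

Passing the table explicitly (rather than using a global) makes it easy to swap languages per request.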

Regards,
Florian Breit

--- End Message ---
--- Begin Message ---
Florian Breit wrote:

>>> Another nice solution is to include a file with an array of language
>>> specific strings, as the people from phpBB did.
>>> For example:
>>
>>
>> right, but that won't work for Japanese.
>
> For me this works with Japanese too.
> I just did a test; you can see it at http://s01.w4h.biz/japanese-test.php
>
> Here is my jp-jp.php:
> <?php
> $lang = array('lone wolf and cube' => '子連れ狼',
>               'daigoro'  => '大五郎',
>               'ogami itto' => '拝 一刀',
> );
> ?>
> [snip]
>
> So, everything works correctly.
> Maybe you had the wrong charset for the output, or your editor doesn't support Japanese?

Both, it turns out.

Thanks!

-Thorsten

--- End Message ---
--- Begin Message ---
Hello all,
after a round of unfortunate incidents, I'll be ready to resume work on the
Unicode extension. I'm close to getting my build environment back together,
so I'd like to get some idea of where things stand with other people working
on the innards of PHP6. I'll start the discussion:

    As some of you know, I have written an extension which exposes most of
the interesting classes exported by ICU (covering locales,
formatting/parsing, calendars, timezones, collation, normalization, regex,
etc.). Since it has left my machine only once, I'd guess it is in early
alpha shape. It compiled and worked on the pre-public CVS release, but it
has not been tested (or even compiled, for that matter) on anything other
than my machine (Win32/MSVC).

Derick attempted to give it a go, but ended up in a sea of compiler errors
(under gcc, I presume). I think most errors are due to the mixture of C and
C++ code. Other intrepid souls are welcome to try.

Over the next few days, I'll make changes to accommodate the PDM revisions.
One issue Derick raised is that because of the sheer size of ICU, we may
need to break the extension into more modular parts (each living in PECL
until consensus is reached on which ones make it into core). Because of this
and the compile issue, I'd rather get some eyes other than mine on it before
attempting a CVS commit.


clayton
aka l0t3k 

--- End Message ---
--- Begin Message ---
l0t3k wrote:

> As some of you know, I have written an extension which exposes most of the interesting classes exported by ICU (covering locales, formatting/parsing, calendars, timezones, collation, normalization, regex, etc.). Since it has left my machine only once, I'd guess it is in early alpha shape. It compiled and worked on the pre-public CVS release, but it has not been tested (or even compiled, for that matter) on anything other than my machine (Win32/MSVC).
>
> Derick attempted to give it a go, but ended up in a sea of compiler errors (under gcc, I presume). I think most errors are due to the mixture of C and C++ code. Other intrepid souls are welcome to try.
>
> Over the next few days, I'll make changes to accommodate the PDM revisions. One issue Derick raised is that because of the sheer size of ICU, we may need to break the extension into more modular parts (each living in PECL until consensus is reached on which ones make it into core). Because of this and the compile issue, I'd rather get some eyes other than mine on it before attempting a CVS commit.

I'd really like to help out on that part!

Regards,
--
Michael - <mike(@)php.net> http://dev.iworks.at/ext-http/http-functions.html.gz

--- End Message ---
