php-i18n Digest 21 Oct 2004 15:57:40 -0000 Issue 254
Topics (messages 770 through 772):
Unicode internationalization problem with e-commerce site
770 by: Mattias Håkansson
771 by: Christophe Chisogne
Re: Bakemoji in From header line for Japanese e-mail sent with PHP mb_send_mail
772 by: Moriyoshi Koizumi
----------------------------------------------------------------------
--- Begin Message ---
Hello,
I am experiencing some problems with internationalization and I can't figure out if
there's a global solution to this issue or if independent application tweaking is
needed. Here are my examples:
Server config:
Linux dev 2.4.21-20.ELsmp, Red Hat Enterprise Linux ES release 3 (Taroon Update 3)
PHP Version 4.3.2
MySQL Server 4.0.20
Apache 2.0.46
Example 1: selecting unicode chars from mysql client
mysql> select * from test;
+--------+------+
| testID | name |
+--------+------+
|      1 | åäö  | (a with a ring, a with two dots and o with two dots, in case this is
not displayed properly)
+--------+------+
1 row in set (0.00 sec)
Then I tried to fetch it from PHP:
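$query = "SELECT * FROM test";
$dbHost = new SQL(); // SQL is the poster's own database wrapper class (not shown)
$dbHost->query($query);
$num = $dbHost->getNumRows();
// Assuming getRow() is zero-based, the last valid index is $num - 1.
for ($i = 0; $i < $num; $i++)
{
    $row = $dbHost->getRow($i);
    echo $row->name . "<br>";
}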
The browser displays ??? instead.
Example 2: fetching unicode data from the database and writing it to a file on the
disk.
select cCityname from blabla where cCode = 'PYASU';
+-----------+
| cCityname |
+-----------+
| Asunción  |
+-----------+
1 row in set (0.02 sec)
Notice the acute accent on the o.
When I change the query in example 1 to select this cCityname, the browser shows Asunci?n.
So instead of displaying it in the browser, I write the variable to a file,
using fopen() and fwrite().
When I look at the file, it has Asunción in it, so that seems to work.
I've been playing around with these two switches in httpd.conf but it doesn't seem to
change anything:
AddDefaultCharset off
AddDefaultCharset UTF-8
Has anyone had similar problems and knows what can fix this?
Thanks,
Mattias
--- End Message ---
--- Begin Message ---
Mattias Håkansson wrote:
> I am experiencing some problems with internationalization and I can't figure out if
> there's a global solution to this issue or if independent application tweaking is
> needed. These are my examples
You'll probably have to update your application somewhat.
First, choose one encoding:
- either iso-8859-1, aka latin1 (US, Western Europe): a lot easier if you can
- or utf-8 (a Unicode encoding), if you need "wide" chars too (Chinese etc.).
These two differ even for basic 8-bit chars.
E.g. for the French char 'é':
it's one byte (0xE9) in latin1 but two bytes (0xC3 0xA9) in utf-8.
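
The byte counts above can be checked directly in PHP. This is a minimal sketch, assuming the mbstring extension is loaded; the variable names are illustrative and not from the thread.

```php
<?php
// 'é' as a single ISO-8859-1 (latin1) byte.
$latin1 = "\xE9";

// Transcode to UTF-8; mb_convert_encoding(string, to, from) is part of mbstring.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

echo bin2hex($latin1), "\n"; // e9
echo bin2hex($utf8), "\n";   // c3a9
```

The same character therefore takes one byte in latin1 and two in utf-8, which is why mixing the two encodings corrupts accented characters.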
Then make sure _all_ your tools/apps/scripts use the same encoding.
That's a lot easier too, even though transcoding is possible.
Example below for latin1.
PS a caveat: the Euro sign (at least as a single 8-bit byte) isn't defined in latin1.
Neither are the chars in 0x80-0x9F. E.g. Word's "smart quotes" are invalid latin1,
but are defined in the cp1252 charset. Some translation may be needed.
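
To illustrate the caveat, here is a small sketch of how the same byte is interpreted under cp1252 and under latin1. It assumes the mbstring extension, which knows cp1252 under the name 'Windows-1252'; the variable names are illustrative.

```php
<?php
// 0x80 is the euro sign in cp1252, but a C1 control code in ISO-8859-1.
$byte = "\x80";

$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

echo bin2hex($asCp1252), "\n"; // e282ac (U+20AC, the euro sign)
echo bin2hex($asLatin1), "\n"; // c280   (U+0080, a control character)
```

If data that is really cp1252 gets labelled as latin1, the euro sign and the curly quotes are the characters most likely to go missing.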
1. MySQL uses latin1 for its encoding.
2. Apache sends latin1 by default.
3. Your application input/output uses latin1.
E.g. define the charset in generated HTML pages too (in a meta tag):
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
4. NB: PHP can handle wide chars with the "mbstring" functions. See the PHP manual on this.
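
As a quick illustration of point 4, the sketch below compares the byte-oriented string functions with their mbstring counterparts. It assumes the mbstring extension is loaded; the sample string is illustrative.

```php
<?php
$utf8 = "\xC3\xA9"; // 'é' encoded as UTF-8 (two bytes)

echo strlen($utf8), "\n";             // 2: strlen() counts bytes
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 1: mb_strlen() counts characters

// Case conversion that understands the encoding: 'é' becomes 'É' (0xC3 0x89).
$upper = mb_strtoupper($utf8, 'UTF-8');
echo bin2hex($upper), "\n";           // c389
```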
> The browser displays ??? instead.
The browser perhaps can't display the char because of font problems,
or because of charset problems (utf-8 chars seen as latin1 chars, or the reverse).
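
One way to tell which case applies is to look at the raw bytes PHP actually receives from the database. The helper below is a hypothetical sketch (describe_bytes() is not from the thread) and assumes a current PHP with the mbstring extension.

```php
<?php
/**
 * Hypothetical helper: report the raw bytes of a string and whether
 * they form valid UTF-8.
 */
function describe_bytes(string $s): string
{
    $valid = mb_check_encoding($s, 'UTF-8') ? 'valid UTF-8' : 'not valid UTF-8';
    return bin2hex($s) . ' (' . $valid . ')';
}

echo describe_bytes("\xE5\xE4\xF6"), "\n";             // åäö as latin1 bytes
echo describe_bytes("\xC3\xA5\xC3\xA4\xC3\xB6"), "\n"; // åäö as UTF-8 bytes
```

Calling it on a value such as $row->name shows whether the data is latin1 or utf-8, and therefore which charset the page should declare.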
> AddDefaultCharset off
> AddDefaultCharset UTF-8
I don't use AddDefaultCharset, as Apache defaults to latin1
and I want only latin1 (we don't need utf-8 chars).
But if you need to send different charsets from your http server,
you probably want to use "AddDefaultCharset off" and make your
web applications send the correct charset.
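
A minimal sketch of an application sending its own charset from PHP follows. The helper name content_type_header() is hypothetical; header() and headers_sent() are standard PHP functions.

```php
<?php
// Hypothetical helper: build the Content-Type header line for a charset.
function content_type_header(string $charset): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// Must run before any output is sent; after that the headers cannot change.
if (!headers_sent()) {
    header(content_type_header('iso-8859-1'));
}
```

Whatever charset is sent here should match both the meta tag in the page and the bytes the script actually outputs.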
Also note issues with old browsers. E.g. Netscape 4 has some Unicode support,
but sending the euro char as one latin1 byte (0x80) doesn't work: it's invalid
latin1. You can use HTML entities like &euro; or Unicode hexadecimal entities
(decimal ones don't work in NN4), in this case "&#x20AC;" iirc.
Of course, translations between "&euro;", "&#x20AC;", 0x80 and 0x20AC are up to you.
Your DB server won't know whether its data are HTML entities or raw utf-8 chars,
for example.
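
The translations mentioned above could look roughly like this in PHP. It is only a sketch, assuming the mbstring extension and UTF-8 as the common target encoding; the variable names are illustrative.

```php
<?php
// Decode the named and the hexadecimal entity to raw UTF-8 bytes.
$fromNamed = html_entity_decode('&euro;', ENT_QUOTES, 'UTF-8');
$fromHex   = html_entity_decode('&#x20AC;', ENT_QUOTES, 'UTF-8');

// Transcode the cp1252 byte 0x80 to UTF-8.
$fromCp1252 = mb_convert_encoding("\x80", 'UTF-8', 'Windows-1252');

echo bin2hex($fromNamed), "\n";  // e282ac
echo bin2hex($fromHex), "\n";    // e282ac
echo bin2hex($fromCp1252), "\n"; // e282ac

// Going the other way: raw UTF-8 back to the named entity.
$entity = htmlentities($fromNamed, ENT_QUOTES, 'UTF-8');
echo $entity, "\n";              // &euro;
```

All three inputs end up as the same UTF-8 bytes, so picking one stored form and converting at the edges keeps the database consistent.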
Hope this helps you understand (and thus solve) your charset problems.
Christophe
--- End Message ---
--- Begin Message ---
Perhaps you just haven't set mbstring.language to "Japanese" in your
php.ini. Pass strings encoded in the same encoding as specified in
mbstring.internal_encoding to mb_send_mail(). Basically no additional
preparation should be needed.
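
The settings above can also be made at run time. The sketch below assumes EUC-JP as the internal encoding (as in the question) and a source file saved as UTF-8. It additionally assumes that headers passed in the fourth argument of mb_send_mail(), such as From, are not MIME-encoded automatically, so the display name is encoded with mb_encode_mimeheader(). The helper build_from_header(), the sample name and the address are hypothetical.

```php
<?php
mb_language('Japanese');        // mail charset becomes ISO-2022-JP
mb_internal_encoding('EUC-JP'); // encoding of the strings passed in

/**
 * Hypothetical helper: build a From header whose display name is
 * MIME-encoded and whose address stays plain ASCII.
 */
function build_from_header(string $name, string $address): string
{
    // Converts from the internal encoding (EUC-JP here) to ISO-2022-JP, Base64.
    $encodedName = mb_encode_mimeheader($name, 'ISO-2022-JP', 'B');
    return 'From: ' . $encodedName . ' <' . $address . '>';
}

// Sample name, written as UTF-8 in this file and converted to EUC-JP
// to mimic data coming from an EUC-JP back end.
$name = mb_convert_encoding('山田', 'EUC-JP', 'UTF-8');
$from = build_from_header($name, 'taro@example.com');
echo $from, "\n";

// Subject and body are passed in the internal encoding:
// mb_send_mail($to, $subject, $body, $from);
```

Only the display name is encoded; the angle-bracketed address is appended afterwards so it stays readable to mail servers.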
HTH,
Moriyoshi
On 2004/10/19, at 22:01, WillaDee Young wrote:
> I'm trying to send Japanese e-mail from a PHP web site. I am using
> mb_send_mail for this task. The back end is MySQL and the whole system
> is set up to use EUC-JP internally and SJIS for output. I have a form
> prepopulated with some suggested Japanese text, the user edits that,
> and the mail is sent.
> The form fields are (1) e-mail subject, (2) sender name, (3) sender
> e-mail address, and (4) message body. The sender name is concatenated
> with an angle-bracketed version of the sender e-mail address
> (separated by a hankaku space) and used as the e-mail From field. The
> subject is used as the subject and the message body is the message
> body.
> No problems with the subject and the message body. The sender (the
> From line) however is bakemoji. I tried using mb_convert_encoding and
> converting to ISO-2022-JP and got very weird results: almost all of
> the kanji went through okay, but the number of characters was
> increased and the extra characters were bakemoji.
> Shouldn't mb_send_mail just be taking care of everything, or do I need
> to be doing something to prepare text for it? Is there a problem with
> concatenating the kanji name and the ASCII email address? Is there any
> way the kanji name could be in a different encoding than the subject
> and message body, even though it comes from the same form? How can I
> troubleshoot this? If the From line is bakemoji, is there a risk that
> the Subject and message body will be bakemoji in some mail readers?
> Any help appreciated.
--
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
--- End Message ---