php-i18n Digest 24 Apr 2008 18:16:10 -0000 Issue 386

Topics (messages 1159 through 1163):

Re: Problems with mime encoding of Japanese Characters in Subject and 'From:'
etc. fields.
        1159 by: Tomas Kuliavas
        1160 by: david.blomberg
        1161 by: Tomas Kuliavas
        1162 by: Andi Gutmans

Unicode escape sequences in PHP?
        1163 by: frank farmer

----------------------------------------------------------------------
--- Begin Message ---
> Hi, 
> 
> I try to send messages written in Japanese (Kana/Kanji) with php.
> 
> Everything works fine - only when the subject (or the name of the
> sender) becomes longer, there seems to be something wrong with the
> encoding: Neither my nor the mail reader of other Japanese friends is
> able to decode the mime string. At the place of the Japanese
> Characters, the mime string itself is displayed.
> 
> As this doesn't happen for other Japanese emails with even long
> subjects, I suppose I did something wrong...
> 
> When using the corresponding php mb_* functions to decode the string
> back, sometimes the correct original string and sometimes meaningless
> characters are shown.
> 
> Here how I convert the subject (the name is converted using the same
> method and the sources are saved in UTF-8 using emacs):
> 
>   $subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
>   $subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");
>   ...snip...
>   mail($to, $subjectMIME, $bodyJIS, $headers);
> 
> Here part of the message as it is displayed by my mail program:
> 
>   From:
> =?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?==?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=(B
>  <[EMAIL PROTECTED]>
>   ...snip...
>   Subject:
> =?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?= 
> =?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?= (B
>   ...snip...
...
> If anybody can explain me the problem I would be most gratefull :)

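This is a bug in mb_encode_mimeheader. The function does not follow
RFC 2047, section 3, second paragraph: text in a code-switching charset
must be back in ASCII mode at the end of each encoded-word. I suspect
the function either base64-encodes the whole string first and then
splits it according to the length limit, or fails to add escape
sequences when text in ISO-2022 charsets is folded. Either way, the
ISO-2022 escapes end up broken.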
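
To make the breakage concrete, the two base64 payloads from the Subject
header quoted above can be decoded and checked for the ISO-2022-JP
escapes. This is only a diagnostic sketch: check_payload is a name made
up for illustration, and PHP 7 or later is assumed. A well-formed
continuation word would re-enter JIS X 0208 with ESC $ B and return to
ASCII with ESC ( B before it ends.

```php
<?php
// Base64 payloads copied from the two encoded-words of the Subject header above.
$payloads = [
    'GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7',
    'eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob',
];

// ISO-2022-JP escape sequences (ESC is byte 0x1B).
const ESC_TO_JIS   = "\x1B" . '$B'; // switch to JIS X 0208
const ESC_TO_ASCII = "\x1B" . '(B'; // switch back to ASCII

// check_payload() is a hypothetical helper, not part of mbstring.
function check_payload(string $base64): array
{
    $raw = base64_decode($base64, true);
    return [
        'starts_in_jis' => substr($raw, 0, 3) === ESC_TO_JIS,
        'ends_in_ascii' => substr($raw, -3) === ESC_TO_ASCII,
        'ends_with_esc' => substr($raw, -1) === "\x1B",
    ];
}

foreach ($payloads as $i => $payload) {
    echo 'encoded-word ', $i + 1, ': ', var_export(check_payload($payload), true), "\n";
}
```

For these payloads the first word starts with ESC $ B but never returns
to ASCII, and the second word starts in the middle of the JIS text and
ends with a lone ESC byte. The "(B" that would complete the escape sits
outside the encoded-word, which matches the stray "(B" in the quoted
headers.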

In http://bugs.php.net/bug.php?id=23192 [EMAIL PROTECTED] wrote that
issues should be reported on php-i18n first. The header posted in that
bug report shows the same issue with broken ISO-2022 escapes, but
Moriyoshi wrote that it is encoded correctly.

Has nothing changed since 2003-04? Should I report the bug here or on
bugs.php.net?

-- 
Tomas

--- End Message ---
--- Begin Message ---
Tomas Kuliavas wrote:
>> Hi,
>> I try to send messages written in Japanese (Kana/Kanji) with php.
>>
>> Everything works fine - only when the subject (or the name of the
>> sender) becomes longer, there seems to be something wrong with the
>> encoding: Neither my nor the mail reader of other Japanese friends is
>> able to decode the mime string. At the place of the Japanese
>> Characters, the mime string itself is displayed.
>>
>> As this doesn't happen for other Japanese emails with even long
>> subjects, I suppose I did something wrong...
>>
>> When using the corresponding php mb_* functions to decode the string
>> back, sometimes the correct original string and sometimes meaningless
>> characters are shown.
>>
>> Here how I convert the subject (the name is converted using the same
>> method and the sources are saved in UTF-8 using emacs):
>>
>>   $subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
>>   $subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");
>>   ...snip...
>>   mail($to, $subjectMIME, $bodyJIS, $headers);
>>
>> Here part of the message as it is displayed by my mail program:
>>
>>   From:
>> =?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?==?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=(B
>>  <[EMAIL PROTECTED]>
>>   ...snip...
>>   Subject:
>> =?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?=
>> =?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?= (B
> ...
>> If anybody can explain me the problem I would be most gratefull :)

I have seen this problem in a few mail clients. My solution in the past
has been to merge the two encoded strings into a single encoded string,
so that the client does not get confused when it sees the second
"=?ISO-2022-JP" in the header line. (This is a really big problem for
Apple iMail; I have seen it regardless of the programming language used.)

> You forgot to mention your PHP version, configure options related to
> mbstring and php mbstring configuration.
>
> Could you explain why Japanese are so obsessed with ISO-2022-JP? Why
> can't you just send it in Base64 encoded UTF-8?

Some brain dead ISPs/Mobile services here in Japan only support ISO-2022-JP.

David Blomberg


--- End Message ---
--- Begin Message ---
>>> Hi,
>>> I try to send messages written in Japanese (Kana/Kanji) with php.
>>>
>>> Everything works fine - only when the subject (or the name of the
>>> sender) becomes longer, there seems to be something wrong with the
>>> encoding: Neither my nor the mail reader of other Japanese friends is
>>> able to decode the mime string. At the place of the Japanese
>>> Characters, the mime string itself is displayed.
>>>
>>> As this doesn't happen for other Japanese emails with even long
>>> subjects, I suppose I did something wrong...
>>>
>>> When using the corresponding php mb_* functions to decode the string
>>> back, sometimes the correct original string and sometimes meaningless
>>> characters are shown.
>>>
>>> Here how I convert the subject (the name is converted using the same
>>> method and the sources are saved in UTF-8 using emacs):
>>>
>>>   $subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
>>>   $subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");
>>>   ...snip...
>>>   mail($to, $subjectMIME, $bodyJIS, $headers);
>>>
>>> Here part of the message as it is displayed by my mail program:
>>>
>>>   From:
>>> =?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?==?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=(B
>>> <[EMAIL PROTECTED]>
>>>   ...snip...
>>>   Subject:
>>> =?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?=
>>> =?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=
>>> (B
>> ...
>>> If anybody can explain me the problem I would be most gratefull :)
> I have seen this problem in a few mail clients My solution in the past
> has been to merge the 2 encoding strings into a single encoding string
> to avoid the client getting messed when it sees the second
> "=?ISO-2022-JP" in the Header line. (this is really a big problem for
> Apple iMail-I have seen it regardless of the programming language used)

Again RFC2047.
---
An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters.
---
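
The length rule quoted above is easy to check mechanically. A rough
sketch, assuming PHP 7 or later; encoded_word_lengths is a hypothetical
helper, and its regular expression is a simplification of the RFC 2047
grammar:

```php
<?php
// encoded_word_lengths() is a hypothetical helper for illustration.
// It finds every =?charset?encoding?text?= token and returns its length.
function encoded_word_lengths(string $headerValue): array
{
    // Simplified pattern: charset, then B or Q, then text without '?' or spaces.
    preg_match_all('/=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/', $headerValue, $matches);
    return array_map('strlen', $matches[0]);
}

// Value of the From: header as quoted in this thread (address omitted).
$from = '=?ISO-2022-JP?B?GyRCJCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7?='
      . '=?ISO-2022-JP?B?eiQrJEo0QTt6JCskSjRBO3okKyRKNEE7eiQrJEo0QTt6JCskSjRBO3ob?=(B';

foreach (encoded_word_lengths($from) as $length) {
    echo $length, $length <= 75 ? " ok\n" : " too long\n";
}
```

Both encoded-words here are 74 characters long, so the length limit
itself is respected. Merging them into a single encoded-word, as
suggested earlier in the thread, would exceed it.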

>>
>> You forgot to mention your PHP version, configure options related to
>> mbstring and php mbstring configuration.
>>
>> Could you explain why Japanese are so obsessed with ISO-2022-JP? Why
>> can't you just send it in Base64 encoded UTF-8?
>>
> Some brain dead ISPs/Mobile services here in Japan only support
> ISO-2022-JP.

Do they need another four black ships in order to change things?

ISO-2022 text can be encoded correctly, but that is harder to implement
than MIME encoding for iso-8859 or utf-8/utf-16. I suggest sending the
text in utf-8 and asking the brain dead ISPs to fix their software, even
if it is DoCoMo. Besides, if Dietrich uses the script behind some HTML
form, he cannot know whether the text submitted through that form is
Japanese.

Instead of
----
$subjectJIS  = mb_convert_encoding($subject, "ISO-2022-JP", "AUTO");
$subjectMIME = mb_encode_mimeheader($subjectJIS, "ISO-2022-JP", "B");
----
do
----
mb_internal_encoding('utf-8');
$subjectMIME = mb_encode_mimeheader($subject, "utf-8", "B");
----
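
For comparison, here is roughly what a compliant encoder has to do in
the UTF-8 case: cut the text only at character boundaries and keep every
encoded-word within the length limit. This is a hand-rolled sketch, not
how mbstring implements mb_encode_mimeheader; encode_utf8_header is a
made-up name, and PHP 7 or later with UTF-8 support in PCRE is assumed.

```php
<?php
// encode_utf8_header() is a hypothetical helper, not part of PHP or mbstring.
function encode_utf8_header(string $text, int $maxWordLen = 75): string
{
    $prefix = '=?UTF-8?B?';
    $suffix = '?=';
    // Base64 turns every 3 raw bytes into 4 characters.
    $maxRaw = intdiv($maxWordLen - strlen($prefix) - strlen($suffix), 4) * 3;
    if ($maxRaw < 4) {
        throw new InvalidArgumentException('maxWordLen is too small');
    }

    // Split into whole UTF-8 characters; preg_split returns false on invalid UTF-8.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('input is not valid UTF-8');
    }

    $words = [];
    $chunk = '';
    foreach ($chars as $char) {
        // Start a new encoded-word before a character would overflow this one.
        if ($chunk !== '' && strlen($chunk) + strlen($char) > $maxRaw) {
            $words[] = $prefix . base64_encode($chunk) . $suffix;
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = $prefix . base64_encode($chunk) . $suffix;
    }

    // Fold the header: CRLF plus one space between encoded-words.
    return implode("\r\n ", $words);
}

$subject = str_repeat('かな漢字', 10); // sample text only
// Leave room for the field name, since header lines are limited to 76 characters.
echo 'Subject: ', encode_utf8_header($subject, 75 - strlen('Subject: ')), "\r\n";
```

Because UTF-8 has no shift states, each chunk can be encoded on its own.
With ISO-2022-JP every chunk would additionally need its own ESC $ B at
the start and ESC ( B at the end, which is the part that goes wrong in
the headers above.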

-- 
Tomas

--- End Message ---
--- Begin Message ---
Unrelated - We are looking for people who will contribute unit tests to PHP 5.3 
for ext/mbstring, esp. input encoding conversion (Shift-JIS, etc.). Any 
volunteers, please contact internals@

Andi

> -----Original Message-----
> From: Tomas Kuliavas [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 25, 2008 8:49 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [PHP-I18N] Re: Problems with mime encoding of Japanese
> Characters in Subject and 'From:' etc. fields.

--- End Message ---
--- Begin Message ---
Hi all, I have a question regarding unicode character escape sequences:

I need to match char 0x2022 (bullet).

PHP's \x does not support multibyte chars.
PCRE (e.g. preg_match) supports \u2022, but only on platforms where the PCRE lib has been compiled with a certain flag.

Is there an escape sequence I can use with mb_ereg that will match this character?

Thanks,
Frank
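
One way to sidestep the escape-sequence question, sketched here under
the assumption that the subject text is UTF-8: write U+2022 as its three
UTF-8 bytes using PHP's ordinary \x escapes. As far as I know, PCRE
spells the code point \x{2022} (with the /u modifier) rather than
\u2022. The mb_ereg call is guarded because mbregex may not be compiled
in.

```php
<?php
// U+2022 BULLET written as its UTF-8 byte sequence.
$bullet = "\xE2\x80\xA2";
$text   = "first item {$bullet} second item";

// A plain byte search needs no regex engine at all.
$bytePos = strpos($text, $bullet);

// PCRE in UTF-8 mode (/u) accepts the code point as \x{2022}.
$pcreHit = preg_match('/\x{2022}/u', $text, $m);

// With mb_ereg, the UTF-8 bytes can be embedded in the pattern directly.
if (function_exists('mb_ereg')) {
    mb_regex_encoding('UTF-8');
    $mbHit = mb_ereg($bullet, $text) !== false;
}
```

Building the pattern from raw bytes keeps the script independent of how
the regex library was compiled, at the cost of being tied to one
encoding.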

--- End Message ---
