ID:               44014
Comment by:       d_kelsey at uk dot ibm dot com
Reported By:      michael202 at gmx dot de
Status:           No Feedback
Bug Type:         mbstring related
Operating System: Win XP
PHP Version:      5.2.5
Assigned To:      hirokawa
 New Comment:

My understanding of UTF-16 is that the BOM is mandatory. With mbstring,
I have found that if I pass a UTF-16 string to mb_convert_encoding (for
example, to convert it to UTF-8), it treats the BOM as ordinary UTF-16
character data and converts it.

mbstring doesn't generate a BOM when converting to UTF-16. As I thought
the BOM was mandatory, that would mean it isn't generating valid UTF-16
bytes.

I see that mbstring effectively uses UTF-16BE when you specify UTF-16.
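For reference, the Unicode Standard treats the BOM as optional in the
UTF-16 encoding scheme and assumes big endian when it is absent, which is
consistent with what mbstring emits. The sketch below (assuming PHP 7+
with ext/mbstring loaded) checks the byte order mbstring produces for
'UTF-16' and prepends a BOM manually for consumers that require one;
`utf8_to_utf16le_with_bom` is a hypothetical helper, not part of mbstring.

```php
<?php
// Sketch assuming PHP 7+ with ext/mbstring loaded.
// mbstring's 'UTF-16' output is big endian and carries no BOM.
$be = mb_convert_encoding('Mo', 'UTF-16', 'UTF-8');
echo bin2hex($be) . "\n"; // -> 004d006f

// Hypothetical helper: mbstring will not emit a BOM, so prepend one by hand.
// U+FEFF serialized in little-endian order is the byte pair FF FE.
function utf8_to_utf16le_with_bom(string $utf8): string
{
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

echo bin2hex(utf8_to_utf16le_with_bom('Mo')) . "\n"; // -> fffe4d006f00
```

A big-endian variant would prepend "\xFE\xFF" and convert to 'UTF-16BE'
instead.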

If mbstring doesn't support the BOM, then UTF-16 input cannot be handled
properly. Should this at least be documented, with a recommendation to
use UTF-16BE as the encoding so that you are explicit about what is
supported?
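Until mbstring handles the BOM itself, one workaround in the spirit of
the suggestion above is to inspect the first two bytes, strip a BOM if
present, and name the byte order explicitly. The sketch below assumes
PHP 7+ with ext/mbstring; `utf16_to_utf8_bom_aware` is a hypothetical
helper, and input without a BOM is treated as big endian, matching
mbstring's default for 'UTF-16'.

```php
<?php
// Sketch assuming PHP 7+ with ext/mbstring loaded.
// Hypothetical helper: strip a UTF-16 BOM, if any, and decode with an
// explicit byte order. Input without a BOM is treated as big endian,
// matching mbstring's default for 'UTF-16'.
function utf16_to_utf8_bom_aware(string $bytes): string
{
    $bom = substr($bytes, 0, 2);

    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: decode the remaining bytes as UTF-16LE.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: decode the remaining bytes as UTF-16BE.
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: fall back to big endian.
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16BE');
}

// Input from the original report: BOM (FF FE) followed by 'Mo' in UTF-16LE.
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0);
echo utf16_to_utf8_bom_aware($utf16) . "\n"; // -> Mo
```

Naming 'UTF-16LE' or 'UTF-16BE' explicitly after stripping the BOM keeps
the result independent of how a given mbstring version treats a leading
U+FEFF.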


Previous Comments:
------------------------------------------------------------------------

[2008-02-24 01:00:00] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

------------------------------------------------------------------------

[2008-02-16 12:17:13] [EMAIL PROTECTED]

The Unicode BOM is not supported by the encoding conversion functions
in mbstring.

Big endian is the default for UTF-16. Please specify 'UTF-16LE' if you
need little-endian byte order.

Try,

<?php
$utf16 = chr(0).chr(0x4d).chr(0).chr(0x6f); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16'); 
echo($utf8 . "\n");     // -> Mo
?>

or

<?php
$utf16 = chr(0x4d).chr(0).chr(0x6f).chr(0); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE'); 
echo($utf8 . "\n");     // -> Mo
?>


------------------------------------------------------------------------

[2008-02-05 05:10:37] [EMAIL PROTECTED]

Assigned to the mbstring maintainer.

------------------------------------------------------------------------

[2008-02-01 12:08:07] michael202 at gmx dot de

Description:
------------
mb_convert_encoding 'destroys' the first character when
converting from UTF-16 to UTF-8

(iconv works).

Reproduce code:
---------------
<?php
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0); // BOM (FF FE) + 'Mo' in UTF-16LE

$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');

echo($utf8 . "\n");     // -> ´++´¢ìo

$utf8 = iconv('UTF-16', 'UTF-8', $utf16);

echo($utf8 . "\n");     // -> Mo
?>


Expected result:
----------------
mb:    (BOM8)Mo
iconv: Mo

(BOM8) is a placeholder for the UTF-8 BOM (EF BB BF).

Actual result:
--------------
mb:    (BOM8)´¢ìo  (copied from cmd shell)
iconv: Mo

(BOM8) is a placeholder for the UTF-8 BOM (EF BB BF).




------------------------------------------------------------------------

