ID:               34776
 Updated by:       [EMAIL PROTECTED]
 Reported By:      narzeczony at zabuchy dot net
 Status:           Open
 Bug Type:         mbstring related
 Operating System: Linux, Windows
 PHP Version:      5.0.5
 New Comment:

I think this behavior is correct: you are not supposed to supply a BOM if you
specify which endianness your UTF-16 stream is in.


Previous Comments:
------------------------------------------------------------------------

[2005-10-07 11:52:16] narzeczony at zabuchy dot net

There is also a small typo in the documentation, but I don't want to open
another bug.
On http://ie.php.net/mbstring this section appears twice:

Name in the IANA character set registry: UTF-16BE
Underlying character set: Unicode
Description: See above.
Additional note: In contrast to UTF-16, strings are always assumed to
be in big endian form. 

whereas one copy should describe UTF-16BE and the other UTF-16LE.

------------------------------------------------------------------------

[2005-10-07 11:47:13] narzeczony at zabuchy dot net

Description:
------------
When converting from UTF-16 (to ISO-8859-1, for example), the byte order
mark (BOM, the first 2 bytes of the UTF-16 text) should be removed, but
mb_convert_encoding() tries to convert those bytes as text instead.
The problem is similar to bug #22108, but maybe this one can be fixed.
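Until the BOM is handled by mbstring itself, a userland workaround can be sketched as below. It assumes only mb_convert_encoding() with the explicit 'UTF-16LE'/'UTF-16BE' source encodings that the reproduce code already uses: look at the first two bytes, strip a BOM if present, and pass the endianness it announces. The helper name utf16_bom_convert() is hypothetical and not part of mbstring; input without a BOM is read as big endian, following RFC 2781.

```php
<?php
// Hypothetical helper, not part of mbstring: strips a UTF-16 byte order mark
// and converts using the endianness that the BOM announces.
function utf16_bom_convert($data, $to_encoding)
{
    $bom = substr($data, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: drop the 2 bytes, then decode explicitly as UTF-16LE.
        return mb_convert_encoding(substr($data, 2), $to_encoding, 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: drop the 2 bytes, then decode explicitly as UTF-16BE.
        return mb_convert_encoding(substr($data, 2), $to_encoding, 'UTF-16BE');
    }
    // No BOM: RFC 2781 says unmarked UTF-16 should be read as big endian.
    return mb_convert_encoding($data, $to_encoding, 'UTF-16BE');
}

// Usage with the same input as the reproduce code.
$utf16LE = "\xFF\xFE" . mb_convert_encoding('Nexor', 'UTF-16LE', 'ISO-8859-1');
$utf16BE = "\xFE\xFF" . mb_convert_encoding('Nexor', 'UTF-16BE', 'ISO-8859-1');
var_dump(utf16_bom_convert($utf16LE, 'ISO-8859-1')); // string(5) "Nexor"
var_dump(utf16_bom_convert($utf16BE, 'ISO-8859-1')); // string(5) "Nexor"
```

Passing an explicit endianness sidesteps BOM detection entirely, which matches the maintainer's point that a BOM is not expected once the endianness is named.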

Reproduce code:
---------------
$iso_8859_1 = 'Nexor';
$utf16LE = mb_convert_encoding($iso_8859_1,'UTF-16LE','ISO-8859-1');
$utf16BE = mb_convert_encoding($iso_8859_1,'UTF-16BE','ISO-8859-1');

// Turn both into plain UTF-16: the only difference is a 2-byte BOM
// prepended at the beginning.
// \xFF\xFE for little endian
$utf16LE = "\xFF\xFE".$utf16LE;
foreach (str_split($utf16LE) as $l) {echo ord($l).' ';}
echo ' --> ';
$utf16LE2iso = mb_convert_encoding($utf16LE,'ISO-8859-1','UTF-16');
var_dump($utf16LE2iso);

echo '<br/>';

// \xFE\xFF for big endian
$utf16BE = "\xFE\xFF".$utf16BE;
foreach (str_split($utf16BE) as $l) {echo ord($l).' ';}
echo ' --> ';
$utf16BE2iso = mb_convert_encoding($utf16BE,'ISO-8859-1','UTF-16');
var_dump($utf16BE2iso);


Expected result:
----------------
255 254 78 0 101 0 120 0 111 0 114 0 --> string(5) "Nexor"
254 255 0 78 0 101 0 120 0 111 0 114 --> string(5) "Nexor"


Actual result:
--------------
255 254 78 0 101 0 120 0 111 0 114 0 --> string(6) "??exor"
254 255 0 78 0 101 0 120 0 111 0 114 --> string(6) "?Nexor"


------------------------------------------------------------------------

