 ID:               34776
 User updated by:  narzeczony at zabuchy dot net
 Reported By:      narzeczony at zabuchy dot net
 Status:           Open
 Bug Type:         mbstring related
 Operating System: Linux, Windows
 PHP Version:      5.0.5

New Comment:
There is also a small typo in the documentation, but I don't want to open another bug for it. On http://ie.php.net/mbstring the following section appears twice:

Name in the IANA character set registry: UTF-16BE
Underlying character set: Unicode
Description: See above.
Additional note: In contrast to UTF-16, strings are always assumed to be in big endian form.

One copy should describe UTF-16BE and the other UTF-16LE.

Previous Comments:
------------------------------------------------------------------------

[2005-10-07 11:47:13] narzeczony at zabuchy dot net

Description:
------------
When converting from UTF-16 (to ISO-8859-1, for example), the BOM (the first 2 bytes of the UTF-16 text) should be removed, but mb_convert_encoding() tries to convert it as if it were text. The problem is similar to bug #22108, but maybe this one can be fixed.

Reproduce code:
---------------
<?php
$iso_8859_1 = 'Nexor';
$utf16LE = mb_convert_encoding($iso_8859_1, 'UTF-16LE', 'ISO-8859-1');
$utf16BE = mb_convert_encoding($iso_8859_1, 'UTF-16BE', 'ISO-8859-1');

// let's turn both into UTF-16:
// the only difference is the 2-byte BOM added at the beginning

// \xFF\xFE for little endian
$utf16LE = "\xFF\xFE" . $utf16LE;
foreach (str_split($utf16LE) as $l) { echo ord($l) . ' '; }
echo ' --> ';
$utf16LE2iso = mb_convert_encoding($utf16LE, 'ISO-8859-1', 'UTF-16');
var_dump($utf16LE2iso);
echo '<br/>';

// \xFE\xFF for big endian
$utf16BE = "\xFE\xFF" . $utf16BE;
foreach (str_split($utf16BE) as $l) { echo ord($l) . ' '; }
echo ' --> ';
$utf16BE2iso = mb_convert_encoding($utf16BE, 'ISO-8859-1', 'UTF-16');
var_dump($utf16BE2iso);

Expected result:
----------------
255 254 78 0 101 0 120 0 111 0 114 0 --> string(5) "Nexor"
254 255 0 78 0 101 0 120 0 111 0 114 --> string(5) "Nexor"

Actual result:
--------------
255 254 78 0 101 0 120 0 111 0 114 0 --> string(6) "??exor"
254 255 0 78 0 101 0 120 0 111 0 114 --> string(6) "?Nexor"

------------------------------------------------------------------------
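Until mb_convert_encoding() handles the BOM itself, one possible workaround is to look at the first two bytes yourself, strip the BOM, and convert using the explicit 'UTF-16LE' or 'UTF-16BE' name, which the reproduce code shows converting cleanly. This is only a sketch: the helper name utf16_bom_to_encoding() is hypothetical and not part of mbstring, and treating input without a BOM as big endian is an assumption (it follows the RFC 2781 default for unmarked UTF-16).

```php
<?php
// Hypothetical helper, not part of mbstring: strip a UTF-16 byte order mark
// and convert using the byte order that the BOM declares.
function utf16_bom_to_encoding($data, $to)
{
    $bom = substr($data, 0, 2);

    if ($bom === "\xFF\xFE") {
        // FF FE: little endian; drop the 2 BOM bytes before converting.
        return mb_convert_encoding(substr($data, 2), $to, 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // FE FF: big endian; drop the 2 BOM bytes before converting.
        return mb_convert_encoding(substr($data, 2), $to, 'UTF-16BE');
    }

    // No BOM: assume big endian, the default for unmarked UTF-16 in RFC 2781.
    return mb_convert_encoding($data, $to, 'UTF-16BE');
}

// Same input as in the reproduce code.
$utf16LE = "\xFF\xFE" . mb_convert_encoding('Nexor', 'UTF-16LE', 'ISO-8859-1');
$utf16BE = "\xFE\xFF" . mb_convert_encoding('Nexor', 'UTF-16BE', 'ISO-8859-1');

var_dump(utf16_bom_to_encoding($utf16LE, 'ISO-8859-1')); // string(5) "Nexor"
var_dump(utf16_bom_to_encoding($utf16BE, 'ISO-8859-1')); // string(5) "Nexor"
```

Choosing the concrete 'UTF-16LE'/'UTF-16BE' name after stripping the BOM avoids depending on how the generic 'UTF-16' decoder treats the first two bytes.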