 ID:               47076
 User updated by:  lunter at interia dot pl
 Reported By:      lunter at interia dot pl
 Status:           Open
 Bug Type:         Unicode Engine related
 Operating System: all
 PHP Version:      6CVS-2009-01-12 (CVS)
 New Comment:

Please note that unicode chr(946) is two bytes in binary:
[11001110 10110010].


Previous Comments:
------------------------------------------------------------------------

[2009-01-12 12:45:36] lunter at interia dot pl

Example 4:
<?php
print('You have to calculate sha1 of unicode chr(946)<br>');
print('regular sha1 of unicode chr(946) (\uceb2) is: 25b9b2c8a851851c7e0f1cff29a93a6aa6895f34'.'<br><br>');

$unicode=chr(946);
print(sha1((binary)$unicode));
print('<br>');
print(sha1(unicode_encode($unicode,'iso-8859-1')));
print('<br>');
// with the proposed uni2bin():
// print(sha1(uni2bin($unicode))); // 25b9b2c8a851851c7e0f1cff29a93a6aa6895f34
?>

------------------------------------------------------------------------

[2009-01-12 12:40:48] lunter at interia dot pl

Example 3:
<?php
print('You have to calculate base64 of unicode chr(946)<br>');
print('regular base64 of unicode chr(946) (\uceb2) is: zrI='.'<br><br>');

$unicode=chr(946);
print(base64_encode((binary)$unicode));
print('<br>');
print(base64_encode(unicode_encode($unicode,'iso-8859-1')));
print('<br>');
// with the proposed uni2bin():
// print(base64_encode(uni2bin($unicode))); // zrI=
?>

------------------------------------------------------------------------

[2009-01-12 12:25:21] lunter at interia dot pl

Two new functions are needed:

(binary) uni2bin( (string) unicode data )
(string) bin2uni( (binary) binary data )

The difference from unicode_(en|de)code is that they convert WITHOUT
charset translation.

------------------------------------------------------------------------

[2009-01-12 12:15:17] lunter at interia dot pl

Description:
------------
Converting binary<->string without charset translation, either to view
the binary representation of a unicode string or to generate a unicode
string from valid binary data that consists of unicode sequences.

Note that unicode_encode/unicode_decode use charset translation; see
the reproduce code.

Example 1:
You have (binary)$b. It consists of two bytes: 11001110 10110010.
Its length in binary representation is two.
It is also a valid UTF-8 encoding of the single character chr(946)
(Greek small letter beta).
How can $b be converted into a one-char UTF-8 string?
When we try $u=(string)$b, it gives a two-char UTF-8 string.

Example 2:
You have (string)$u, a one-char UTF-8 string. It consists of chr(946)
(Greek small letter beta).
Now you want to see its two-byte binary representation
(11001110 10110010).
There is no way to convert it without charset translation...

Reproduce code:
---------------
<?php
$s=chr(946);
print(strlen($s));
print('<br>');
$b=unicode_encode($s,'iso-8859-1');
print(strlen($b));
?>

Expected result:
----------------
1 (unicode: 1 char)
2 (binary: 2 bytes) [11001110 10110010]

Actual result:
--------------
1
1

There is no way to convert binary<->string without charset
translation. The binary value reports length = 1, but it should be
2 bytes.

------------------------------------------------------------------------
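
The byte-level values quoted in this report can be checked in released PHP, which has no separate unicode string type. What follows is a minimal sketch, not the proposed uni2bin()/bin2uni() API. It assumes the mbstring extension and PHP 7.2 or later (for mb_chr() and mb_ord()), and it treats an ordinary PHP string as the raw UTF-8 byte sequence.

```php
<?php
// Sketch only: released PHP with mbstring, not the PHP 6 unicode string type.
// Assumes PHP >= 7.2 for mb_chr() and mb_ord().

// Code point 946 (U+03B2, Greek small letter beta) as a UTF-8 byte string.
$u = mb_chr(946, 'UTF-8');

$chars = mb_strlen($u, 'UTF-8'); // length in code points
$bytes = strlen($u);             // length in bytes

// Render each byte as an 8-digit bit pattern.
$bits = implode(' ', array_map(
    function ($byte) {
        return str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT);
    },
    str_split($u)
));

// Opposite direction: two raw bytes read back as one UTF-8 character.
$b = "\xCE\xB2";
$codePoint = mb_ord($b, 'UTF-8');

echo $chars, PHP_EOL;            // 1
echo $bytes, PHP_EOL;            // 2
echo $bits, PHP_EOL;             // 11001110 10110010
echo $codePoint, PHP_EOL;        // 946
echo base64_encode($u), PHP_EOL; // zrI=
echo sha1($u), PHP_EOL;          // compare with the digest quoted in Example 4
```

In this sketch strlen() counts bytes and mb_strlen() counts code points, so the same value can be inspected either way with no charset translation step. The SHA-1 line only prints a digest for comparison with the one quoted in Example 4.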