ID:               47076
 User updated by:  lunter at interia dot pl
 Reported By:      lunter at interia dot pl
 Status:           Open
 Bug Type:         Unicode Engine related
 Operating System: all
 PHP Version:      6CVS-2009-01-12 (CVS)
 New Comment:

Please consider that unicode chr(946) is two bytes in binary
[11001110 10110010].
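
Where the two bytes come from can be sketched by hand-encoding the
code point as UTF-8. This is only an illustration in plain PHP integer
arithmetic; the variable names are made up for the example and nothing
here depends on the PHP 6 Unicode functions discussed in this report.

```php
<?php
// Hand-encode code point 946 (U+03B2) as a two-byte UTF-8 sequence.
// The two-byte form is 110xxxxx 10xxxxxx and covers 0x80..0x7FF.
$cp = 946;

$lead  = 0xC0 | ($cp >> 6);    // 110 prefix + upper 5 bits of the code point
$trail = 0x80 | ($cp & 0x3F);  // 10 prefix  + lower 6 bits of the code point

$bits = sprintf('%08b %08b', $lead, $trail);
echo $bits, "\n"; // 11001110 10110010
```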


Previous Comments:
------------------------------------------------------------------------

[2009-01-12 12:45:36] lunter at interia dot pl

Example 4:

<?
 print('You have to calculate sha1 of unicode chr(946)<br>');
 print('regular sha1 of unicode chr(946) (U+03B2, UTF-8 bytes CE B2) is: 25b9b2c8a851851c7e0f1cff29a93a6aa6895f34'.'<br><br>');

 $unicode=chr(946);

 print(sha1((binary)$unicode));
 print('<br>');
 print(sha1(unicode_encode($unicode,'iso-8859-1')));
 print('<br>');

// print(sha1(uni2bin($unicode))); // 25b9b2c8a851851c7e0f1cff29a93a6aa6895f34
?>

------------------------------------------------------------------------

[2009-01-12 12:40:48] lunter at interia dot pl

Example 3:

<?
 print('You have to calculate base64 of unicode chr(946)<br>');
 print('regular base64 of unicode chr(946) (U+03B2, UTF-8 bytes CE B2) is: zrI='.'<br><br>');

 $unicode=chr(946);

 print(base64_encode((binary)$unicode));
 print('<br>');
 print(base64_encode(unicode_encode($unicode,'iso-8859-1')));
 print('<br>');

// print(base64_encode(uni2bin($unicode))); // zrI=
?>
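
The expected value in Example 3 can be checked against the raw byte
pair. The sketch below assumes a released PHP version with the mbstring
extension, not the PHP 6 CVS build from this report, so it only
illustrates the difference between encoding the two UTF-8 bytes and
encoding the result of a lossy ISO-8859-1 conversion.

```php
<?php
// The two bytes the report expects: UTF-8 for U+03B2 (code point 946).
$bytes = "\xCE\xB2";
$expected = base64_encode($bytes);

// ISO-8859-1 cannot represent beta, so mbstring substitutes a replacement
// character. Set it explicitly to '?' (0x3F) so the result is predictable.
mb_substitute_character(0x3F);
$latin1 = mb_convert_encoding($bytes, 'ISO-8859-1', 'UTF-8');
$lossy = base64_encode($latin1);

echo $expected, "\n"; // zrI=
echo $lossy, "\n";    // Pw== (base64 of the single byte '?')
```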

------------------------------------------------------------------------

[2009-01-12 12:25:21] lunter at interia dot pl

Two new functions needed:

(binary) uni2bin( (string) unicode data )
(string) bin2uni( (binary) binary data )


The difference from unicode_(en|de)code is: they convert WITHOUT using
charset translation.

------------------------------------------------------------------------

[2009-01-12 12:15:17] lunter at interia dot pl

Description:
------------
Converting binary<->string without charset translation is needed to
view the binary representation of a unicode string, or to generate a
unicode string from valid binary data consisting of unicode sequences.

Note that unicode_encode/unicode_decode use charset translation, see
Reproduce code.

Example 1:

You have (binary)$b. It consists of two bytes: 11001110 10110010
Its length in binary representation is two.
It is also a valid UTF-8 encoding of one char, chr(946) (greek small
letter beta).
How to convert it ($b) into a one-char UTF-8 string?
When we try $u=(string)$b, it gives a two-char string.
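
The two readings of the same byte pair in Example 1 can be illustrated
as follows. This is a sketch for a released PHP version with mbstring
(an assumption; the (binary)/(string) casts of the PHP 6 CVS build are
not available there), where the encoding used to interpret the bytes is
passed explicitly.

```php
<?php
$b = "\xCE\xB2"; // the two bytes 11001110 10110010

// Interpreted as UTF-8, the two bytes form one character.
$asUtf8 = mb_strlen($b, 'UTF-8');

// Interpreted as ISO-8859-1, each byte is its own character; after
// transcoding to UTF-8 the string therefore holds two characters.
$transcoded = mb_convert_encoding($b, 'UTF-8', 'ISO-8859-1');
$asLatin1 = mb_strlen($transcoded, 'UTF-8');

// mb_ord() recovers the code point from the UTF-8 interpretation.
$codePoint = mb_ord($b, 'UTF-8');

echo $asUtf8, ' ', $asLatin1, ' ', $codePoint, "\n"; // 1 2 946
```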

Example 2:

You have (string)$u, a one-char UTF-8 string. It consists of chr(946)
(greek small letter beta).
Now you want to see the two-byte binary representation of it (11001110
10110010).
There is no way to convert it without charset translation...

Reproduce code:
---------------
<?
 $s=chr(946);
 print(strlen($s));

 print('<br>');

 $b=unicode_encode($s,'iso-8859-1');

 print(strlen($b));
?>

Expected result:
----------------
1 (unicode 1 char)
2 (binary 2 bytes) [11001110 10110010]

Actual result:
--------------
1
1


There is no way to convert binary<->string without charset translation.
The binary result has length 1, but it should be 2 bytes.
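
For comparison, the expected result (1 char, 2 bytes) can be sketched in
a released PHP version with mbstring, where strings are byte strings and
mb_chr() builds the UTF-8 bytes for a code point. This is an assumption
about a different environment than the PHP 6 CVS build above. In that
build, requesting 'utf-8' instead of 'iso-8859-1' from unicode_encode()
might give the same two bytes, but that is not verified here.

```php
<?php
// Build the UTF-8 byte string for code point 946 (greek small letter beta).
$s = mb_chr(946, 'UTF-8');

$chars = mb_strlen($s, 'UTF-8'); // length in characters
$bytes = strlen($s);             // length in bytes
$hex   = bin2hex($s);            // the raw bytes, in hex

echo $chars, "\n"; // 1
echo $bytes, "\n"; // 2
echo $hex, "\n";   // ceb2
```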


------------------------------------------------------------------------

