ID:          41067
 Updated by:  [EMAIL PROTECTED]
 Reported By: jp at df5ea dot net
-Status:      Open
+Status:      Feedback
 Bug Type:    *Unicode Issues
 PHP Version: 5CVS-2007-04-12 (CVS)
 New Comment:

Can you post a link to the patch?


Previous Comments:
------------------------------------------------------------------------

[2007-04-12 18:12:28] jp at df5ea dot net

Description:
------------
When decoding a string that contains a surrogate pair, json_decode()
produces incorrect UTF-8. Instead of encoding the two surrogates as one
four-byte UTF-8 sequence, it emits two three-byte sequences, which
represent the two surrogate code points individually.

The decoded string is actually CESU-8. The json_encode() function
cannot encode such a string.

I have a patch to JSON_parse.c that transcodes the UTF-16 properly to
UTF-8.
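
For illustration only, the transcoding described above can be sketched in
PHP. This is not the patch itself; surrogate_pair_to_utf8() is a
hypothetical helper name, and the sketch assumes it is handed the two
16-bit values of a \uXXXX\uXXXX escape pair.

```php
<?php
// Hypothetical helper (not part of ext/json): combine a UTF-16 surrogate
// pair into one code point and encode it as a single 4-byte UTF-8 sequence.
function surrogate_pair_to_utf8($hi, $lo)
{
    if ($hi < 0xD800 || $hi > 0xDBFF || $lo < 0xDC00 || $lo > 0xDFFF) {
        throw new InvalidArgumentException('not a valid surrogate pair');
    }
    // Each surrogate carries 10 bits; the result is offset by 0x10000.
    $cp = 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);

    // Code points above U+FFFF always need four bytes in UTF-8.
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// \ud834\udd00 is U+1D100, the character used in the reproduce code.
print bin2hex(surrogate_pair_to_utf8(0xD834, 0xDD00)) . "\n"; // f09d8480
```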

Reproduce code:
---------------
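<?php
// U+1D100 (MUSICAL SYMBOL SINGLE BARLINE) as raw UTF-8 bytes, octal-escaped.
$single_barline = "\360\235\204\200";
$array = array($single_barline);
print bin2hex($single_barline) . "\n";
// print $single_barline . "\n\n";
// Encoding escapes the character as a UTF-16 surrogate pair.
$json = json_encode($array);
print $json . "\n\n";
// Decoding should restore the original four bytes.
$json_decoded = json_decode($json, true);
// print $json_decoded[0] . "\n";
print bin2hex($json_decoded[0]) . "\n";
print "END\n";
?>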


Expected result:
----------------
The output from the two bin2hex() calls should be the same:

f09d8480

["\ud834\udd00"]

f09d8480
END


Actual result:
--------------
The second hex string differs from the input and is not valid
UTF-8.

f09d8480

["\ud834\udd00"]

eda0b4edb480
END
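
The bytes in the actual result are what one gets when each surrogate is
encoded on its own as a three-byte sequence. A minimal sketch that
reproduces them, assuming that is what the decoder does;
encode_unit_3byte() is a hypothetical name, not a function in ext/json:

```php
<?php
// Hypothetical illustration of the faulty behaviour: treat one 16-bit
// UTF-16 unit as a stand-alone code point and emit three UTF-8-style bytes.
function encode_unit_3byte($unit)
{
    return chr(0xE0 | ($unit >> 12))
         . chr(0x80 | (($unit >> 6) & 0x3F))
         . chr(0x80 | ($unit & 0x3F));
}

// Encoding the two halves of \ud834\udd00 separately yields CESU-8.
print bin2hex(encode_unit_3byte(0xD834) . encode_unit_3byte(0xDD00)) . "\n";
// eda0b4edb480
```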



------------------------------------------------------------------------

