Edit report at https://bugs.php.net/bug.php?id=65082&edit=1
ID: 65082
User updated by: masakielastic at gmail dot com
Reported by: masakielastic at gmail dot com
Summary: json_encode's option for replacing ill-formed byte
sequences with substitute cha
Status: Assigned
Type: Feature/Change Request
Package: JSON related
Operating System: All
PHP Version: 5.5.0
Assigned To: remi
Block user comment: N
Private report: N
New Comment:
I created a new feature request for preventing XSS attacks, and I withdraw my
opinion about changing the default behavior.
new function for preventing XSS attack
https://bugs.php.net/bug.php?id=65257
Previous Comments:
------------------------------------------------------------------------
[2013-07-12 18:19:09] masakielastic at gmail dot com
I posted a patch for handling surrogates,
since the range U+D800 to U+DFFF is not allowed in UTF-8 (RFC 3629).
I need someone's help with handling high surrogates and the options.
https://gist.github.com/masakielastic/5985383
json_decode produces invalid byte-sequences
https://bugs.php.net/bug.php?id=62010
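To illustrate the RFC 3629 restriction mentioned above, here is a minimal sketch in Python rather than PHP. It is not taken from the patch; the helper name is_well_formed_utf8 is hypothetical. It assumes only that a strict UTF-8 decoder rejects the three-byte pattern that would otherwise encode a surrogate code point.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Three-byte pattern that would encode the high surrogate U+D800.
# RFC 3629 excludes U+D800..U+DFFF, so a strict decoder rejects it.
encoded_surrogate = b"\xed\xa0\x80"

# An ordinary three-byte character (U+20AC) for comparison.
euro_sign = "\u20ac".encode("utf-8")

print(is_well_formed_utf8(encoded_surrogate))  # False
print(is_well_formed_utf8(euro_sign))          # True

# The reverse direction fails as well: a lone surrogate has no UTF-8 form.
try:
    "\ud800".encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate cannot be encoded as UTF-8")
```

A JSON decoder that turns an escaped lone surrogate into raw bytes would therefore emit a sequence that a strict UTF-8 consumer refuses to read.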
------------------------------------------------------------------------
[2013-07-11 09:48:54] masakielastic at gmail dot com
Hi, I fixed my patch and added a test case for json_decode.
------------------------------------------------------------------------
[2013-07-11 08:37:51] masakielastic at gmail dot com
Hi remi, could you test my patch for the PHP_JSON_UNESCAPED_UNICODE option?
The patch adds the JSON_NOTUTF8_SUBSTITUTE and JSON_NOTUTF8_IGNORE options.
https://gist.github.com/masakielastic/5973095
------------------------------------------------------------------------
[2013-07-11 04:59:02] [email protected]
I don't think changing the current behavior is a good idea, which is why I
really prefer adding some new options.
------------------------------------------------------------------------
[2013-07-11 04:27:19] masakielastic at gmail dot com
Hi, thanks nikic and remi.
After some consideration, I changed my mind.
I think substituting U+FFFD
for ill-formed sequences should be the default behavior.
What do you think?
We might need a discussion about consistency with the Escaper API.
htmlspecialchars's ENT_SUBSTITUTE option is adopted
by Symfony and Zend Framework.
https://wiki.php.net/rfc/escaper
Although the change breaks two test suites, it doesn't break users' codebases.
Judging from a GitHub search, a lot of people don't use any options.
https://github.com/search?l=PHP&q=json_encode&ref=advsearch&type=Code
https://github.com/search?l=PHP&q=json_decode&ref=advsearch&type=Code
The same problem can be seen in htmlspecialchars.
https://github.com/search?l=PHP&q=htmlspecialchars&ref=advsearch&type=Code
New options complicate the situation
when combined with the JSON_UNESCAPED_UNICODE option and json_decode.
[two options]
json_encode
  JSON_NOTUTF8_SUBSTITUTE
  JSON_NOTUTF8_IGNORE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_SUBSTITUTE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE
json_decode
  JSON_NOTUTF8_SUBSTITUTE
  JSON_NOTUTF8_IGNORE
If JSON_NOTUTF8_SUBSTITUTE is the default behavior,
the only option we need to consider is JSON_NOTUTF8_IGNORE.
[one option]
json_encode
  JSON_NOTUTF8_IGNORE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE
json_decode
  JSON_NOTUTF8_IGNORE
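The difference between the two policies, and how each combines with unescaped output, can be sketched as follows. This is an illustration in Python, not the proposed PHP implementation. The helper encode_json and its parameters are hypothetical, and Python's "replace" and "ignore" decode handlers are assumed to stand in for the proposed SUBSTITUTE and IGNORE options.

```python
import json


def encode_json(raw: bytes, on_invalid: str = "substitute",
                unescaped: bool = False) -> str:
    """Hypothetical helper: decode raw bytes as UTF-8, then JSON-encode.

    on_invalid="substitute" replaces ill-formed sequences with U+FFFD.
    on_invalid="ignore" silently drops them.
    unescaped=True keeps non-ASCII characters literal instead of \\uXXXX.
    """
    handler = {"substitute": "replace", "ignore": "ignore"}[on_invalid]
    text = raw.decode("utf-8", errors=handler)
    return json.dumps(text, ensure_ascii=not unescaped)


raw = b"a\xffb"  # 0xFF can never appear in well-formed UTF-8

print(encode_json(raw))                     # "a\ufffdb"
print(encode_json(raw, "ignore"))           # "ab"
print(encode_json(raw, unescaped=True))     # U+FFFD emitted literally
```

Note that the ignore policy joins the characters on either side of the dropped byte, which is one reason substitution is often considered the safer of the two.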
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=65082