Edit report at https://bugs.php.net/bug.php?id=65082&edit=1
ID: 65082
User updated by: masakielastic at gmail dot com
Reported by: masakielastic at gmail dot com
Summary: json_encode's option for replacing ill-formed byte
sequences with substitute cha
Status: Assigned
Type: Feature/Change Request
Package: JSON related
Operating System: All
PHP Version: 5.5.0
Assigned To: remi
Block user comment: N
Private report: N
New Comment:
Hi, thanks nikic and remi.
After some consideration, I have changed my mind.
I think substituting U+FFFD for ill-formed sequences
should be the default behavior.
What do you think?
We may also need to discuss consistency with the Escaper API.
htmlspecialchars's ENT_SUBSTITUTE option is adopted
by Symfony and Zend Framework.
https://wiki.php.net/rfc/escaper
Although the change breaks two test suites, it should not break users' codebases.
Judging from a GitHub search, most people call these functions without any options.
https://github.com/search?l=PHP&q=json_encode&ref=advsearch&type=Code
https://github.com/search?l=PHP&q=json_decode&ref=advsearch&type=Code
The same problem can be seen in htmlspecialchars.
https://github.com/search?l=PHP&q=htmlspecialchars&ref=advsearch&type=Code
Adding new options complicates matters
when they are combined with the JSON_UNESCAPED_UNICODE option and with json_decode.
[two options]
json_encode
  JSON_NOTUTF8_SUBSTITUTE
  JSON_NOTUTF8_IGNORE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_SUBSTITUTE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE
json_decode
  JSON_NOTUTF8_SUBSTITUTE
  JSON_NOTUTF8_IGNORE
If the JSON_NOTUTF8_SUBSTITUTE behavior were the default,
the only option we would need to consider is JSON_NOTUTF8_IGNORE.
[one option]
json_encode
  JSON_NOTUTF8_IGNORE
  JSON_UNESCAPED_UNICODE | JSON_NOTUTF8_IGNORE
json_decode
  JSON_NOTUTF8_IGNORE
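For reference, the json_decode side of the problem looks like this today: the whole document is rejected when a string contains an ill-formed byte sequence. This is only a minimal sketch, assuming a stock PHP 5.5 or later build; the sample input is made up for illustration.

```php
<?php
// JSON text whose string value contains the byte 0xFF,
// which can never appear in well-formed UTF-8.
$json = "{\"name\":\"abc\xFFdef\"}";

// Decoding fails as a whole; nothing is substituted or skipped.
$result = json_decode($json, true);
$error  = json_last_error();

var_dump($result);                     // NULL
var_dump($error === JSON_ERROR_UTF8);  // true
```

A decode-side JSON_NOTUTF8_IGNORE or substitution default would turn this hard failure into a usable result.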
Previous Comments:
------------------------------------------------------------------------
[2013-07-10 13:48:35] [email protected]
Here is a proposal for this issue:
https://github.com/remicollet/pecl-json-c/commit/5a499a4550d1f29f1f8eeb1b4ca0b01a33c64779
This adds two new options to json_encode:
- JSON_NOTUTF8_SUBSTITUTE (the name seems better, at least to me), to replace
not-utf8 chars with the replacement char.
- JSON_NOTUTF8_IGNORE, to ignore not-utf8 chars (removed in escaped mode, kept
without any check in unescaped mode)
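The two behaviors can be approximated in userland for comparison. This is a rough sketch, not the patch itself: it assumes that htmlspecialchars's existing ENT_SUBSTITUTE and ENT_IGNORE flags stand in for the proposed JSON_NOTUTF8_* options, and the helper name scrub_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: clean a string before handing it to json_encode.
// $flag is ENT_SUBSTITUTE (insert U+FFFD) or ENT_IGNORE (drop the bytes).
function scrub_utf8($str, $flag)
{
    return htmlspecialchars_decode(htmlspecialchars($str, $flag, 'UTF-8'));
}

$bad = "a\xFFb"; // 0xFF is never valid in UTF-8

// Roughly what JSON_NOTUTF8_SUBSTITUTE would produce.
$substituted = json_encode(scrub_utf8($bad, ENT_SUBSTITUTE));

// Roughly what JSON_NOTUTF8_IGNORE would produce in escaped mode.
$ignored = json_encode(scrub_utf8($bad, ENT_IGNORE));

// Substitution combined with JSON_UNESCAPED_UNICODE keeps U+FFFD as raw bytes.
$raw = json_encode(scrub_utf8($bad, ENT_SUBSTITUTE), JSON_UNESCAPED_UNICODE);

var_dump($substituted); // "a\ufffdb"
var_dump($ignored);     // "ab"
```

Note that this emulation only covers the escaped-mode description of JSON_NOTUTF8_IGNORE; the "keep without any check" behavior in unescaped mode has no userland equivalent here.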
------------------------------------------------------------------------
[2013-06-21 07:26:33] [email protected]
It's currently possible to get a partial output using
JSON_PARTIAL_OUTPUT_ON_ERROR. This will replace invalid UTF8 strings with NULL
though. It probably would make sense to have an alternative option that inserts
the substitution character.
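A small sketch of the difference between the default and JSON_PARTIAL_OUTPUT_ON_ERROR, assuming PHP 5.5 or later; the sample data is made up for illustration.

```php
<?php
// A string containing a byte (0xFF) that is never valid in UTF-8.
$data = ['name' => "abc\xFFdef"];

// Default behavior: the whole call fails.
$strict = json_encode($data);
$error  = json_last_error();

// Partial output: the offending string is replaced with null.
$partial = json_encode($data, JSON_PARTIAL_OUTPUT_ON_ERROR);

var_dump($strict);                     // false
var_dump($error === JSON_ERROR_UTF8);  // true
var_dump($partial);                    // {"name":null}
```

The whole string is lost in the partial output, not just the offending byte, which is why a substitution option is still useful.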
------------------------------------------------------------------------
[2013-06-21 05:31:34] masakielastic at gmail dot com
Description:
------------
json_encode returns false if the string contains ill-formed byte
sequences. The problem is hard to track down because many web applications don't
expect ill-formed byte sequences to be present. One example is Symfony's
JsonResponse class.
https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpFoundation/JsonResponse.php#L83
Introducing a json_encode option that replaces ill-formed byte sequences with
a substitute character (such as U+FFFD) would save users from writing logic
like the following:
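function json_encode2($value, $options = 0, $depth = 512)
{
    return json_encode(scrub_value($value), $options, $depth);
}

// Scrub strings and recurse into arrays; every other type (numbers,
// booleans, null, objects) is passed through unchanged.
function scrub_value($value)
{
    if (is_string($value)) {
        return str_scrub($value);
    }
    if (!is_array($value)) {
        return $value;
    }
    $value2 = [];
    foreach ($value as $key => $elm) {
        $key = is_string($key) ? str_scrub($key) : $key;
        $value2[$key] = scrub_value($elm);
    }
    return $value2;
}

// https://bugs.php.net/bug.php?id=65081
function str_scrub($str, $encoding = 'UTF-8')
{
    // The encode/decode round trip replaces ill-formed sequences with U+FFFD.
    return htmlspecialchars_decode(htmlspecialchars($str, ENT_SUBSTITUTE, $encoding));
}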
A precedent is htmlspecialchars's ENT_SUBSTITUTE option, which was introduced
in PHP 5.4. Since PHP 5.5, json_encode has shared part of the logic used by
htmlspecialchars, such as php_next_utf8_char.
https://github.com/php/php-src/blob/master/ext/json/json.c#L369
Another reason for introducing the option is the existence of the
JsonSerializable interface.
Accessing the values that a jsonSerialize method builds from private
properties is hard or impossible.
One candidate name for the option is JSON_SUBSTITUTE, by analogy with
htmlspecialchars's ENT_SUBSTITUTE option.
json_encode($object, JSON_SUBSTITUTE);
------------------------------------------------------------------------