Hi,

On Thu, May 28, 2015 at 7:53 PM, Jakub Zelenka <bu...@php.net> wrote:

> Hi,
>
> There are two issues (reported as bugs, though not really bugs) in
> json_decode related to the \u escape.
>
> First one is
> json_decode('{"\u0000": 1}');
> reported in https://bugs.php.net/bug.php?id=68546
>
> That code results in a fatal error because the property name is malformed
> (private props start with \0). I don't think that anything parsed by
> json_decode should result in a fatal error. That's why I would like to
> introduce a new JSON error called JSON_ERROR_MANGLED_PROPERTY_NAME.
>
>
I have just created a PR for that:
https://github.com/php/php-src/pull/1332
If there are any objections (e.g. to the error name), shout now before I
merge it to master...
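
To make the intended check concrete, here is a minimal sketch of the idea in
Python rather than PHP. It is not the code from the PR. The exception class
and function names are hypothetical stand-ins for the proposed
JSON_ERROR_MANGLED_PROPERTY_NAME; the point is only that a decoded member
name starting with NUL becomes a reportable decode error instead of a fatal
one.

```python
import json


class MangledPropertyNameError(ValueError):
    """Hypothetical stand-in for the proposed JSON_ERROR_MANGLED_PROPERTY_NAME."""


def _reject_nul_prefixed_keys(pairs):
    # json.loads calls this hook once per decoded JSON object,
    # passing the members as a list of (key, value) tuples.
    for key, _value in pairs:
        if key.startswith("\0"):
            raise MangledPropertyNameError(
                "property name starts with NUL: %r" % key
            )
    return dict(pairs)


def decode_strict_keys(text):
    # Decode JSON, turning NUL-prefixed member names into a catchable error.
    return json.loads(text, object_pairs_hook=_reject_nul_prefixed_keys)
```

A caller can then catch MangledPropertyNameError the same way it would check
any other decode error.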


>
> Second one is
> json_decode('"\ud834"');
> which results in a non-UTF-8 string from the JSON decoder. This conforms
> to the JSON RFC 7159, as noted in section 8.2:
>
>    However, the ABNF in this specification allows member names and
>    string values to contain bit sequences that cannot encode Unicode
>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>    surrogate).  Instances of this have been observed, for example, when
>    a library truncates a UTF-16 string without checking whether the
>    truncation split a surrogate pair.  The behavior of software that
>    receives JSON texts containing such values is unpredictable; for
>    example, implementations might return different values for the length
>    of a string value or even suffer fatal runtime exceptions.
>
>
> As the behavior is unpredictable, the current default result seems
> reasonable because PHP strings are not Unicode-encoded internally. However,
> there might be cases when users want to make sure that they get a valid
> Unicode string. For that case I would like to add an option called
> JSON_VALID_ESCAPED_UNICODE, which will emit an error called
> JSON_ERROR_UTF16 when such an escape appears. I implemented this in jsond a
> long time ago and think that it would be useful for json as well.
>
>
>
I have been thinking about this a bit more and I would like to make this an
error by default rather than introduce a new option for it. The RFC actually
calls that behavior unpredictable and allows raising an error, so this is not
against the RFC. It also makes sense because other parsers (e.g. Python 2
and 3) do the same. I can't imagine anyone relying on \uDEAD producing
invalid Unicode. I think there are many more users who expect json_decode
to always produce valid Unicode, which is not the case at the moment. So it
really does not make sense to keep the current behavior in PHP 7. If there
are no objections, I will create a PR next week.
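
For illustration, the condition that would trigger the error can be sketched
as a scan over the raw JSON text. This is a Python sketch under my own
assumptions, not the jsond or php-src implementation, and the function name
is hypothetical. A \uXXXX escape in the range D800-DBFF (high surrogate) is
valid only when the very next escape is in DC00-DFFF (low surrogate); any
other surrogate escape is unpaired.

```python
import re

# Matches one backslash escape: either \uXXXX (hex digits captured) or any
# other two-character escape such as \\ or \n. Consuming \\ as a unit keeps
# the text \\ud834 (an escaped backslash plus "ud834") from being misread.
_ESCAPE = re.compile(r"\\(?:u([0-9a-fA-F]{4})|.)", re.DOTALL)


def find_unpaired_surrogate(json_text):
    """Return the offset of the first unpaired surrogate escape, or None.

    Assumes json_text is otherwise well-formed JSON, so backslashes only
    occur inside string literals.
    """
    pending = None  # match of a high surrogate still waiting for its low half
    for m in _ESCAPE.finditer(json_text):
        code = int(m.group(1), 16) if m.group(1) else None
        is_high = code is not None and 0xD800 <= code <= 0xDBFF
        is_low = code is not None and 0xDC00 <= code <= 0xDFFF
        if pending is not None:
            if is_low and m.start() == pending.end():
                pending = None  # valid pair, nothing to report
                continue
            return pending.start()  # high surrogate not followed by a low one
        if is_low:
            return m.start()  # low surrogate with no preceding high one
        if is_high:
            pending = m
    return pending.start() if pending is not None else None
```

A decoder that wants the strict behavior would report its equivalent of
JSON_ERROR_UTF16 whenever such a scan finds an offset, and decode normally
otherwise.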

Cheers

Jakub
