Hi,

There are two issues in json_decode related to \u escapes. Both were
reported as bugs, although they are not really bugs.

The first one is
json_decode('{"\u0000": 1}');
reported in https://bugs.php.net/bug.php?id=68546

That code results in a fatal error because the decoded object would get a
malformed property name (private property names are mangled into names
starting with \0). I don't think that anything parsed by json_decode should
result in a fatal error. That's why I would like to introduce a new JSON
error called JSON_ERROR_MANGLED_PROPERTY_NAME.


The second one is
json_decode('"\ud834"');
which returns a non-UTF-8 string from the JSON decoder. This conforms to
JSON RFC 7159, as noted in section 8.2:

   However, the ABNF in this specification allows member names and
   string values to contain bit sequences that cannot encode Unicode
   characters; for example, "\uDEAD" (a single unpaired UTF-16
   surrogate).  Instances of this have been observed, for example, when
   a library truncates a UTF-16 string without checking whether the
   truncation split a surrogate pair.  The behavior of software that
   receives JSON texts containing such values is unpredictable; for
   example, implementations might return different values for the length
   of a string value or even suffer fatal runtime exceptions.


As the behavior is unpredictable, the current default result seems
reasonable, because PHP strings are not internally Unicode encoded. However,
there might be cases where users want to make sure that they get a valid
Unicode string. For that case I would like to add an option called
JSON_VALID_ESCAPED_UNICODE, which will emit an error called JSON_ERROR_UTF16
when such an escape appears. I implemented this in jsond a long time ago and
think that it would be useful for the json extension as well.
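To illustrate what counts as "such an escape", here is a rough userland
approximation of the check. This is only a sketch: json_has_unpaired_surrogate
is a hypothetical helper, not part of ext/json or jsond, and the proposed
option would do this inside the decoder instead. It assumes the input is
otherwise valid JSON, so backslashes can only appear inside strings.

```php
<?php
/**
 * Hypothetical helper (not part of ext/json or jsond).
 * Returns true if the JSON text contains a \uD800-\uDBFF escape that is not
 * immediately followed by a \uDC00-\uDFFF escape, or a low-surrogate escape
 * with no preceding high one. Assumes $json is otherwise valid JSON.
 */
function json_has_unpaired_surrogate(string $json): bool
{
    // Match every escape sequence in order. "\\" is consumed as a single
    // escape, so an escaped backslash followed by "u" is not seen as \u.
    preg_match_all('/\\\\(?:u([0-9a-fA-F]{4})|.)/s', $json, $matches,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $pendingHighEnd = null; // byte offset just past an unpaired high surrogate
    foreach ($matches as $esc) {
        $offset = $esc[0][1];
        $code = (isset($esc[1]) && $esc[1][1] !== -1) ? hexdec($esc[1][0]) : null;
        $isHigh = $code !== null && $code >= 0xD800 && $code <= 0xDBFF;
        $isLow  = $code !== null && $code >= 0xDC00 && $code <= 0xDFFF;

        if ($pendingHighEnd !== null) {
            // A high surrogate must be directly followed by a low surrogate.
            if (!$isLow || $offset !== $pendingHighEnd) {
                return true;
            }
            $pendingHighEnd = null;
            continue;
        }
        if ($isLow) {
            return true; // low surrogate without a preceding high surrogate
        }
        if ($isHigh) {
            $pendingHighEnd = $offset + 6; // strlen('\uXXXX')
        }
    }
    // A high surrogate at the very end of the input is unpaired as well.
    return $pendingHighEnd !== null;
}
```

With this sketch, '"\ud834"' is flagged while the valid pair '"\ud834\udd1e"'
is not; '{"\u0000": 1}' is also not flagged, since it is the first issue
rather than this one.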

Thoughts?

I'm happy to change the constant names if someone comes up with better
ones.

I would like to patch master sometime next week if there are no objections.

Cheers

Jakub
