Hi,

On Thu, May 28, 2015 at 7:53 PM, Jakub Zelenka <bu...@php.net> wrote:
> Hi,
>
> There are two issues (reported bugs, but not really bugs) in json_decode
> related to the \u escape.
>
> The first one is
> json_decode('{"\u0000": 1}');
> reported in https://bugs.php.net/bug.php?id=68546
>
> That code results in a fatal error because it produces a malformed property
> name (names of private properties start with \0). I don't think that anything
> parsed in json_decode should result in a fatal error. That's why I would
> like to introduce a new json error called JSON_ERROR_MANGLED_PROPERTY_NAME.
>
> I have just created a PR for that: https://github.com/php/php-src/pull/1332 .
> So if there are any objections (e.g. to the error name), shout now before I
> merge it to master...
>
> The second one is
> json_decode('"\ud834"');
> which results in a non-UTF-8 string from the JSON decoder. This is conformant
> to JSON RFC 7159, as noted in section 8.2:
>
>    However, the ABNF in this specification allows member names and
>    string values to contain bit sequences that cannot encode Unicode
>    characters; for example, "\uDEAD" (a single unpaired UTF-16
>    surrogate). Instances of this have been observed, for example, when
>    a library truncates a UTF-16 string without checking whether the
>    truncation split a surrogate pair. The behavior of software that
>    receives JSON texts containing such values is unpredictable; for
>    example, implementations might return different values for the length
>    of a string value or even suffer fatal runtime exceptions.
>
> As the behavior is unpredictable, the current default result seems
> reasonable because PHP strings are not internally Unicode-encoded. However,
> there might be cases when users want to make sure that they get a valid
> Unicode string. In that case I would like to add an option called
> JSON_VALID_ESCAPED_UNICODE, which will emit an error called JSON_ERROR_UTF16
> when such an escape appears. I implemented this in jsond a long time ago and
> think that it would be useful for the json extension as well.
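To make the two checks concrete, here is a minimal userland sketch in PHP, assuming the input is already valid JSON. The helper names has_unpaired_surrogate() and find_mangled_key() are hypothetical, and this is not how the json extension's parser works; the proposal is for json_decode itself to report these cases as errors. The surrogate check walks the raw text byte by byte rather than using a regex, so that an escaped backslash such as \\ud834 is not mistaken for a \u escape.

```php
<?php
// Hypothetical userland sketch of the two checks discussed above. It is not
// the php-src parser; the proposal is for json_decode to raise JSON errors.

/**
 * Returns true if a (valid) JSON text contains a \uXXXX escape that is an
 * unpaired UTF-16 surrogate, e.g. "\ud834" or "\uDEAD" on its own.
 */
function has_unpaired_surrogate(string $json): bool
{
    $len = strlen($json);
    $pendingHigh = false; // last escape was a high surrogate awaiting its low half

    for ($i = 0; $i < $len; $i++) {
        if ($json[$i] !== '\\') {
            if ($pendingHigh) {
                return true; // high surrogate not followed by a \u escape
            }
            continue;
        }
        if (($json[$i + 1] ?? '') !== 'u') {
            if ($pendingHigh) {
                return true; // high surrogate followed by \n, \", \\ etc.
            }
            $i++; // skip the escaped character of a simple escape
            continue;
        }
        $cp = hexdec(substr($json, $i + 2, 4));
        $i += 5; // together with the loop's $i++ this skips the whole "\uXXXX"

        if ($cp >= 0xD800 && $cp <= 0xDBFF) {        // high surrogate
            if ($pendingHigh) {
                return true; // two high surrogates in a row
            }
            $pendingHigh = true;
        } elseif ($cp >= 0xDC00 && $cp <= 0xDFFF) {  // low surrogate
            if (!$pendingHigh) {
                return true; // low surrogate without a preceding high one
            }
            $pendingHigh = false;
        } elseif ($pendingHigh) {
            return true; // high surrogate followed by a non-surrogate escape
        }
    }
    return $pendingHigh; // a high surrogate at the very end is unpaired too
}

/**
 * Returns the first key (at any depth) of an associatively decoded value
 * that starts with "\0", i.e. would be a mangled property name, or null.
 */
function find_mangled_key($value): ?string
{
    if (!is_array($value)) {
        return null;
    }
    foreach ($value as $key => $item) {
        if (is_string($key) && $key !== '' && $key[0] === "\0") {
            return $key;
        }
        $found = find_mangled_key($item);
        if ($found !== null) {
            return $found;
        }
    }
    return null;
}
```

For example, has_unpaired_surrogate('"\ud834"') returns true, while '"\ud834\udd1e"' (a complete pair encoding U+1D11E) returns false. find_mangled_key(json_decode('{"\u0000": 1}', true)) returns "\0", because decoding into an array does not run into the property-name restriction that objects have.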

I have been thinking about this a bit more, and I would like to make this an error by default rather than introduce a new option for it. The RFC actually calls that behavior unpredictable and allows raising an error, so it's not against the RFC. It also makes sense because other parsers (e.g. Python 2 and 3) do the same. I can't imagine anyone relying on \uDEAD producing invalid Unicode. I think that there are many more users who actually expect json_decode to always produce valid Unicode, which is not the case at the moment. So it really does not make sense to keep this behavior in PHP 7. If there are no objections, I will create a PR next week.

Cheers

Jakub