Re: [PHP-DEV] JSON unicode escape issue and new constants

Jakub Zelenka Mon, 01 Jun 2015 11:09:04 -0700

Hi Yasuo,

On Mon, Jun 1, 2015 at 1:10 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:
>
>
> Any invalid chars as variable/property name should be handled as invalid.
>
> Valid variable name:  '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
> http://php.net/manual/en/language.variables.basics.php
>
> This violates the JSON spec, but if users want to allow invalid names, it
> should be an option rather than the default, IMO.
>


> [yohgaki@dev ~]$ php
> <?php
> $o = new StdClass;
> $o->{123} = 11;
>
> var_dump($o);
> ?>
>
> class stdClass#1 (1) {
>   public $123 =>
>   int(11)
> }
> [yohgaki@dev ~]$ php
> <?php
> $o = new StdClass;
> $o->123;
>
> var_dump($o);
> ?>
>
> PHP Parse error:  syntax error, unexpected '123' (T_LNUMBER), expecting
> identifier (T_STRING) or variable (T_VARIABLE) or '{' or '$' in - on line 3
>
>
As your example shows, these names are not invalid; you just need to
enclose them in braces ( $o->{"123"} ). This is basic PHP, and the JSON
parser should not have to worry about users who don't know that.


>
> Since JSON string must be UTF-8/16/32, any invalid UTF sequence
> could be treated as invalid.
>
> 8.1.  Character Encoding
>
>    JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
>    encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
>    interoperable in the sense that they will be read successfully by the
>    maximum number of implementations; there are many implementations
>    that cannot successfully read texts in other encodings (such as
>    UTF-16 and UTF-32).
>
>    Implementations MUST NOT add a byte order mark to the beginning of a
>    JSON text.  In the interests of interoperability, implementations
>    that parse JSON texts MAY ignore the presence of a byte order mark
>    rather than treating it as an error.
> https://tools.ietf.org/html/rfc7159#section-8.1
>
> I would prefer to treat a BOM as an invalid sequence and raise an error/return NULL.
>
>
The PHP JSON parser accepts only UTF-8, and this is already validated
correctly, so I don't see any issue here either.

>
> JSON_ERROR_UTF16 would be better defined as JSON_ERROR_UTF as
> JSON accepts valid UTF sequence.
>

The thing is that we already have the JSON_ERROR_UTF8 error, which is raised
when the input byte string is not valid UTF-8. JSON_ERROR_UTF16 was meant to
distinguish the surrogate escape case from that one. I'm open to other ideas,
but I'm not sure about JSON_ERROR_UTF, as it could be confused with
JSON_ERROR_UTF8.
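
To illustrate the distinction, here is a rough sketch. It assumes a PHP
version in which the JSON_ERROR_UTF16 constant is defined (PHP 7.0 or later);
the variable names are made up for the example:

```php
<?php
// Raw byte 0xFF is not valid UTF-8, so the input string itself is rejected.
$r1 = json_decode("\"\xff\"");
$e1 = json_last_error();

// This input is plain ASCII; only the \ud800 escape (a lone high
// surrogate) is malformed. Single quotes keep the backslash literal.
$r2 = json_decode('"\ud800"');
$e2 = json_last_error();

var_dump($e1 === JSON_ERROR_UTF8);  // bool(true)
var_dump($e2 === JSON_ERROR_UTF16); // bool(true)
```

Both calls return NULL; the error code is the only thing that tells the
caller whether the raw bytes or a \uXXXX escape was at fault.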


> It's also better to reject any invalid UTF sequence, not only
> Unicode-escaped (\uXXXX) strings. If it does not validate Unicode
> sequences, I would add the validation.
>

A single (unpaired) surrogate escape is actually the only case that can
result in an invalid Unicode string.
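
A small sketch of why, again assuming PHP 7.0 or later. The helper
is_valid_utf8() is hypothetical and relies on PCRE's /u modifier to check
well-formedness:

```php
<?php
// Hypothetical helper: true when $s is well-formed UTF-8.
function is_valid_utf8($s)
{
    return preg_match('//u', $s) === 1;
}

$bmp  = json_decode('"\u00e9"');       // BMP code point U+00E9
$pair = json_decode('"\ud83d\ude00"'); // surrogate pair for U+1F600
$lone = json_decode('"\ude00"');       // low surrogate with no partner

var_dump(is_valid_utf8($bmp));  // bool(true)
var_dump(is_valid_utf8($pair)); // bool(true)
var_dump($lone);                // NULL
```

Ordinary escapes and complete surrogate pairs map to valid UTF-8, so the
unpaired surrogate is the one escape that has to be rejected.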


> JSON does not forbid object property names that begin with digits. I'm not
> sure how this is currently handled, but it should result in an error such
> as NULL, IMO.
>

As noted above: see http://3v4l.org/sJo8p


> Since OWASP started advocating Unicode escapes for all names and values in
> JSON, I would like to have the ability to encode all chars as \uXXXX by
> default, i.e. escape all \r, \n, a, b, c, 0, 1, 2, etc. as \uXXXX by
> default, with disabling \uXXXX encoding available as an option.
>

I think we are a bit late for such a change, as it is a fairly big one and
also a BC break, which would require an RFC.

>
> BTW, any progress on disabling automatic float conversion for float-like
> values? This is mandatory, IMHO.
>

The RFC ( https://wiki.php.net/rfc/json_numeric_as_string ) is under
discussion:
https://www.mail-archive.com/internals@lists.php.net/msg78683.html

Cheers

Jakub
