Hi Yasuo,

On Mon, Jun 1, 2015 at 1:10 AM, Yasuo Ohgaki <yohg...@ohgaki.net> wrote:

> Any invalid chars as variable/property name should be handled as invalid.
>
> Valid variable name: '[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*'
> http://php.net/manual/en/language.variables.basics.php
>
> This violates JSON spec, but if user would like to allow invalid names. It
> should be an option rather than the default. IMO.
>
> [yohgaki@dev ~]$ php
> <?php
> $o = new StdClass;
> $o->{123} = 11;
>
> var_dump($o);
> ?>
>
> class stdClass#1 (1) {
>   public $123 =>
>   int(11)
> }
> [yohgaki@dev ~]$ php
> <?php
> $o = new StdClass;
> $o->123;
>
> var_dump($o);
> ?>
>
> PHP Parse error: syntax error, unexpected '123' (T_LNUMBER), expecting
> identifier (T_STRING) or variable (T_VARIABLE) or '{' or '$' in - on line 3
>

As you showed in your example, these names are not invalid; you just need to
enclose them ( $o->{"123"} ). This is basic PHP, and the JSON parser should
not have to worry about users who don't know that.

> Since JSON string must be UTF-8/16/32, any invalid UTF sequence
> could be treated as invalid.
>
> 8.1. Character Encoding
>
> JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default
> encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
> interoperable in the sense that they will be read successfully by the
> maximum number of implementations; there are many implementations
> that cannot successfully read texts in other encodings (such as
> UTF-16 and UTF-32).
>
> Implementations MUST NOT add a byte order mark to the beginning of a
> JSON text. In the interests of interoperability, implementations
> that parse JSON texts MAY ignore the presence of a byte order mark
> rather than treating it as an error.
> https://tools.ietf.org/html/rfc7159#section-8.1
>
> I prefer BOM as invalid sequence and raising error/return NULL.
>

The PHP JSON parser accepts only UTF-8, and this is already validated
correctly, so I don't see any issue here either.

> JSON_ERROR_UTF16 would be better defined as JSON_ERROR_UTF as
> JSON accepts valid UTF sequence.
>

The thing is that we already have the JSON_ERROR_UTF8 error, which is raised
when the input binary string is invalid. JSON_ERROR_UTF16 was meant to
distinguish the two errors. I'm open to other ideas, but I'm not sure about
JSON_ERROR_UTF, as it might be confused with JSON_ERROR_UTF8.

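To make the points above concrete, here is a minimal sketch, assuming PHP 7.0 or later (where JSON_ERROR_UTF16 is defined). The input strings and variable names are made up for illustration; only json_decode(), json_last_error() and the two error constants come from PHP itself.

```php
<?php
// Sketch only; assumes PHP 7.0+ (JSON_ERROR_UTF16 is not defined earlier).

// 1. A key that is not a valid PHP identifier still decodes without error.
//    It is reached with brace syntax; writing $o->123 would be a parse error.
$o = json_decode('{"123": 11}');
var_dump($o->{"123"});

// 2. Invalid raw bytes in the input (a stray 0xB1 continuation byte here)
//    are reported as JSON_ERROR_UTF8.
$badBytes = json_decode("\"\xB1\x31\"");
$badBytesError = json_last_error();
var_dump($badBytesError === JSON_ERROR_UTF8);

// 3. An unpaired surrogate inside a \uXXXX escape is the case that
//    JSON_ERROR_UTF16 is meant to report. Single quotes keep the
//    backslash literal, so the parser sees the escape itself.
$loneSurrogate = json_decode('"\ud834"');
$loneSurrogateError = json_last_error();
var_dump($loneSurrogateError === JSON_ERROR_UTF16);

// 4. A leading UTF-8 byte order mark is not skipped; decoding fails.
$withBom = json_decode("\xEF\xBB\xBF" . '{"a": 1}');
var_dump($withBom === null);
```

In both failing string cases json_decode() returns NULL, so json_last_error() is the only way to tell which of the two problems was hit.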
> It's also better to reject any invalid UTF sequence, not limited to
> Unicode escaped (\uXXXX) string. If it does not validate Unicode
> sequence, I would add the validation.
>

A single (unpaired) surrogate is actually the only case where a \uXXXX escape
can result in an invalid Unicode string.

> JSON does not forbid object property begins with digits. I'm not sure how
> currently handled, but it should result in error like NULL. IMO.
>

As noted above: see http://3v4l.org/sJo8p

> Since OWASP starts advocating Unicode escape for all names and values in
> JSON, I would like to have ability to encode all chars as \uXXXX by
> default.
> i.e. Escape all \r, \n, a, b, c, 0, 1, 2, etc as \uXXXX by default,
> disable \uXXXX encoding as an option.
>

I think we are a bit late for such a change, as it is a fairly big one and
also a BC break, which would require an RFC.

> BTW, any progress on disabling automatic float conversion against float
> like values? This is mandatory, IMHO.
>

The RFC ( https://wiki.php.net/rfc/json_numeric_as_string ) is under
discussion:
https://www.mail-archive.com/internals@lists.php.net/msg78683.html

Cheers

Jakub