two comments on JSON, ECMA-404, 1st edition / October 2013

Patel-Schneider, Peter Mon, 09 Dec 2013 21:01:33 -0800

1/ According to ECMA-404, 1st edition / October 2013, a JSON text is a sequence
of Unicode code points. The code points that can appear in a JSON text
include all code points except the control characters (the text says U+0000
to U+001F, but the syntax diagram just says "control character", which in
Unicode 6.3 also includes U+007F to U+009F). Therefore, the code
point sequence <0022, DEAD, 0022> is a valid JSON text.


However, this code point sequence cannot be represented in UTF-8, UTF-16, or
UTF-32. U+DEAD is a surrogate code point, so the sequence is not a sequence
of Unicode scalar values, and the Unicode encoding forms are only defined on
Unicode scalar values.
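To make point 1 concrete, here is a small illustration using Python's standard `json` module. This is one implementation, not ECMA-404 itself. It is a convenient model because a Python `str` is a sequence of arbitrary code points, lone surrogates included. The helper name `failing_encodings` is hypothetical and exists only for this sketch.

```python
import json

# The code point sequence <0022, DEAD, 0022>: a quotation mark, the lone
# surrogate U+DEAD, and another quotation mark. ("\u0022" is just '"'.)
text = "\u0022\udead\u0022"
assert [ord(c) for c in text] == [0x22, 0xDEAD, 0x22]

# Python's json module (an implementation, not the spec) accepts it as a
# JSON string whose value is the single surrogate code point.
value = json.loads(text)
assert value == "\udead"


def failing_encodings(s):
    """Hypothetical helper: list the Unicode encoding forms that reject s."""
    failed = []
    for codec in ("utf-8", "utf-16", "utf-32"):
        try:
            s.encode(codec)
        except UnicodeEncodeError:
            # Surrogate code points are not Unicode scalar values,
            # so every encoding form refuses them.
            failed.append(codec)
    return failed


print(failing_encodings(text))
```

All three encoders reject the text, even though a code-point-level parser accepts it. This is the gap the message describes.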


2/ The unescaping of strings in JSON is ill-defined, as there are quoted
JSON strings that are the escaped version of two different sequences of
Unicode code points. For example, both <D834, DD1E> and <1D11E> can be
represented as "\uD834\uDD1E".
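The ambiguity in point 2 can also be observed with Python's `json` module, again used only as an illustrative implementation. With the default `ensure_ascii=True`, both code point sequences serialize to the same escaped JSON string. When parsing, the module has to pick one reading, and it combines the pair into the single scalar value.

```python
import json

# Two distinct code point sequences from the message.
two_surrogates = "\ud834\udd1e"  # <D834, DD1E>: two surrogate code points
one_scalar = "\U0001D11E"        # <1D11E>: MUSICAL SYMBOL G CLEF

assert len(two_surrogates) == 2 and len(one_scalar) == 1
assert two_surrogates != one_scalar

# Escaping: both produce the identical JSON text (lowercase hex in Python).
escaped_a = json.dumps(two_surrogates)
escaped_b = json.dumps(one_scalar)
assert escaped_a == escaped_b == '"\\ud834\\udd1e"'

# Unescaping can return only one of the two originals; this implementation
# treats the escape pair as a surrogate pair and yields U+1D11E.
decoded = json.loads(escaped_a)
assert decoded == one_scalar
assert decoded != two_surrogates
```

The round trip is therefore lossy for `two_surrogates`. A different implementation could make the opposite choice, which is why a well-defined unescaping rule matters.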


Both of these appear to be bugs that should be fixed.

Peter F. Patel-Schneider

_______________________________________________
es-discuss mailing list
es-discuss@mozilla.org
https://mail.mozilla.org/listinfo/es-discuss
