On 06/06/2013 12:53 PM, Robert Haas wrote:
On Wed, Jun 5, 2013 at 10:46 AM, Andrew Dunstan <and...@dunslane.net> wrote:
In 9.2, the JSON parser didn't check the validity of the use of Unicode
escapes other than that it required 4 hex digits to follow '\u'. In 9.3,
that is still the case. However, the JSON accessor functions and operators
also try to turn JSON strings into text in the server encoding, and this
includes de-escaping \u sequences. This works fine except when there is a
pair of sequences representing a UTF-16 type surrogate pair, something that
is explicitly permitted in the JSON spec.

The attached patch is an attempt to remedy that, and a surrogate pair is
turned into the correct code point before converting it to whatever the
server encoding is.
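
For illustration only, the surrogate-pair arithmetic described above can be sketched as below. The helper names are hypothetical and are not taken from the attached patch; the sketch assumes the two \u escapes have already been parsed into integers and does not cover the conversion to the server encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not the names used in the patch. */

/* First half of a UTF-16 surrogate pair: U+D800..U+DBFF. */
static bool is_high_surrogate(uint32_t c)
{
    return c >= 0xD800 && c <= 0xDBFF;
}

/* Second half of a UTF-16 surrogate pair: U+DC00..U+DFFF. */
static bool is_low_surrogate(uint32_t c)
{
    return c >= 0xDC00 && c <= 0xDFFF;
}

/*
 * Combine a high/low surrogate pair into one code point in the range
 * U+10000..U+10FFFF. Each half contributes 10 bits.
 */
static uint32_t surrogate_pair_to_codepoint(uint32_t hi, uint32_t lo)
{
    assert(is_high_surrogate(hi) && is_low_surrogate(lo));
    return 0x10000 + ((hi & 0x3FF) << 10) + (lo & 0x3FF);
}
```

With these helpers, the JSON string "\ud83d\ude00" would yield the single code point U+1F600, which is then what gets converted to the server encoding.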

Note that this would mean we can still put JSON with incorrect use of
surrogates into the database, as now (9.2 and later), and they will cause
almost all the accessor functions to raise an error, as now (9.3). All this
does is allow JSON that uses surrogates correctly not to fail when applying
the accessor functions and operators. That's a possible violation of POLA,
and at least worthy of a note in the docs, but I'm not sure what else we can
do now - adding this check to the input lexer would possibly cause restores
to fail, which users might not thank us for.
I think the approach you've proposed here is a good one.




I did that, but it's evident from the buildfarm that there's more work to do. The problem is that we do the de-escaping as we lex the JSON to construct the lookahead token, and at that stage we don't know whether or not the de-escaped value is really going to be needed. That means we can cause errors to be raised in far too many places. It's failing on this line:

   converted = pg_any_to_server(utf8str, utf8len, PG_UTF8);

even though the operator in use ("->") doesn't even use the de-escaped value.

The real solution is going to be to delay the de-escaping of the string until it is known to be wanted. That's unfortunately going to be a bit invasive, but I can't see a better solution. I'll work on it ASAP. Getting it to work well without a small API change might be pretty hard, though.
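
To make the idea concrete, here is a rough standalone sketch of delayed de-escaping. It is not the planned patch: LazyString, lazy_string_init and lazy_string_text are hypothetical names, and the sketch assumes a server encoding that can only represent ASCII, so any \u escape above 0x7F fails in place of the real pg_any_to_server() call. The point is only that the failure happens when the text is requested, not when the token is lexed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token: remembers the escaped input, de-escapes on demand. */
typedef struct
{
    const char *raw;      /* string body as written, escapes intact */
    size_t      raw_len;
    char       *text;     /* de-escaped copy, NULL until requested */
} LazyString;

/* Lexing only records where the string is; nothing can fail here. */
static LazyString lazy_string_init(const char *raw, size_t raw_len)
{
    LazyString s = { raw, raw_len, NULL };
    return s;
}

/* Parse exactly four hex digits; returns -1 if any is not a hex digit. */
static int hex4(const char *p)
{
    int v = 0;
    for (int i = 0; i < 4; i++)
    {
        char c = p[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

/*
 * De-escape on first use and cache the result. Returns NULL on failure.
 * Assumption for this sketch: only ASCII is representable, so \u escapes
 * above 0x7F (and \u0000) are treated as conversion errors.
 */
static const char *lazy_string_text(LazyString *s)
{
    if (s->text != NULL)
        return s->text;

    char *out = malloc(s->raw_len + 1);   /* output is never longer than input */
    if (out == NULL)
        return NULL;
    size_t n = 0;

    for (size_t i = 0; i < s->raw_len; i++)
    {
        char c = s->raw[i];
        if (c != '\\') { out[n++] = c; continue; }
        if (++i >= s->raw_len) { free(out); return NULL; }
        switch (s->raw[i])
        {
            case 'b': out[n++] = '\b'; break;
            case 'f': out[n++] = '\f'; break;
            case 'n': out[n++] = '\n'; break;
            case 'r': out[n++] = '\r'; break;
            case 't': out[n++] = '\t'; break;
            case '"': case '\\': case '/': out[n++] = s->raw[i]; break;
            case 'u':
            {
                int cp = (i + 4 < s->raw_len) ? hex4(s->raw + i + 1) : -1;
                if (cp <= 0 || cp > 0x7F) { free(out); return NULL; }
                out[n++] = (char) cp;
                i += 4;                   /* skip the four hex digits */
                break;
            }
            default: free(out); return NULL;
        }
    }
    out[n] = '\0';
    s->text = out;
    return s->text;
}
```

An operator like "->" that only navigates the structure would never call lazy_string_text(), so a string such as "a\u00e9b" would pass through it untouched; only a caller that actually wants the text would see the error.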

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
