Re: [HACKERS] JSON for PG 9.2

Andrew Dunstan Sat, 14 Jan 2012 16:14:54 -0800


On 01/14/2012 06:11 PM, Joey Adams wrote:

On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan<[email protected]>  wrote:

Second, what should we do when the database encoding isn't UTF8? I'm
inclined to emit a \unnnn escape for any non-ASCII character (assuming it
has a unicode code point - are there any code points in the non-unicode
encodings that don't have unicode equivalents?). The alternative would be to
fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
to deal with JSON should be using UTF8 anyway, but we still have to deal
with these things. What about SQL_ASCII? If there's a non-ASCII sequence
there we really have no way of telling what it should be. There at least I
think we should probably error out.
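
A rough sketch of that idea, in Python rather than the backend's C so it stays short: decode from the server encoding, emit a \unnnn escape for every non-ASCII character, and error out for SQL_ASCII data that has high-bit bytes. The function name escape_non_ascii and the use of Python codec names for the server encoding are assumptions made for illustration; none of this is PostgreSQL code.

```python
def escape_non_ascii(value: bytes, server_encoding: str) -> str:
    """Return a JSON string literal that contains only ASCII characters."""
    if server_encoding.upper() == "SQL_ASCII":
        # SQL_ASCII carries no encoding information, so a high-bit byte
        # cannot be mapped to a code point: error out.
        if any(b >= 0x80 for b in value):
            raise ValueError("non-ASCII byte in SQL_ASCII data")
        text = value.decode("ascii")
    else:
        # Raises UnicodeDecodeError if a byte sequence has no Unicode mapping.
        text = value.decode(server_encoding)

    out = ['"']
    for ch in text:
        cp = ord(ch)
        if ch in '"\\':
            out.append("\\" + ch)
        elif cp < 0x20 or cp > 0x7E:
            if cp > 0xFFFF:
                # Outside the BMP: JSON needs a UTF-16 surrogate pair.
                cp -= 0x10000
                out.append("\\u%04x\\u%04x"
                           % (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)))
            else:
                out.append("\\u%04x" % cp)
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)
```

The output is plain ASCII, so it survives any ASCII-compatible client encoding. The cost is that comparison has to treat the escape and the raw character as equal.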

I don't think there is a satisfying solution to this problem.  Things
working against us:

  * Some server encodings support characters that don't map to Unicode
characters (e.g. unused slots in Windows-1252).  Thus, converting to
UTF-8 and back is lossy in general.

  * We want a normalized representation for comparison.  This will
involve a mixture of server and Unicode characters, unless the
encoding is UTF-8.

  * We can't efficiently convert individual characters to and from
Unicode with the current API.

  * What do we do about \u0000 ?  TEXT datums cannot contain NUL characters.
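
Two of the points above can be illustrated with a small Python sketch. It relies on Python's own cp1252 codec, which treats the unused Windows-1252 slots as undefined; PostgreSQL's conversion tables may handle them differently, so this shows the general problem rather than backend behaviour. The helper name has_unicode_mapping is hypothetical.

```python
import json


def has_unicode_mapping(byte_value: int, codec: str = "cp1252") -> bool:
    """Report whether a single byte decodes to a Unicode character."""
    try:
        bytes([byte_value]).decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# Bytes in the 0x80-0x9F range that Python's cp1252 codec cannot decode.
unmapped = [b for b in range(0x80, 0xA0) if not has_unicode_mapping(b)]

# A \u0000 escape is valid JSON, but it decodes to a NUL character,
# which a TEXT datum cannot hold.
decoded = json.loads('"a\\u0000b"')
contains_nul = "\x00" in decoded
```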

I'd say just ban Unicode escapes and non-ASCII characters unless the
server encoding is UTF-8, and ban all \u0000 escapes.  It's easy, and
whatever we support later will be a superset of this.
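
As a sketch of how small that rule is, here is a Python version of the check. It assumes the input has already been validated as JSON, so backslashes only occur inside strings. The name check_json_text and the boolean server_is_utf8 parameter are made up for illustration.

```python
def check_json_text(json_text: str, server_is_utf8: bool) -> None:
    """Raise ValueError if the text breaks the proposed restrictions."""
    i = 0
    n = len(json_text)
    while i < n:
        ch = json_text[i]
        if ch == "\\" and i + 1 < n:
            if json_text[i + 1] == "u":
                if json_text[i + 2:i + 6] == "0000":
                    # Banned everywhere: TEXT cannot hold a NUL.
                    raise ValueError("\\u0000 is not allowed")
                if not server_is_utf8:
                    raise ValueError(
                        "Unicode escapes require a UTF-8 server encoding")
            # Skip the backslash and the character it escapes, so that
            # an escaped backslash followed by "u0000" is not misread.
            i += 2
            continue
        if ord(ch) > 0x7F and not server_is_utf8:
            raise ValueError(
                "non-ASCII characters require a UTF-8 server encoding")
        i += 1
```

Anything this check accepts would still be accepted by a later, more permissive implementation, which is the superset property mentioned above.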

Strategies for handling this situation have been discussed in prior
emails.  This is where things got stuck last time.

Well, from where I'm coming from, nuls are not a problem. But
escape_json() is currently totally encoding-unaware. It produces \unnnn
escapes for low ascii characters, and just passes through characters
with the high bit set. That's possibly OK for EXPLAIN output - we really
don't want EXPLAIN failing. But maybe we should ban JSON output for
EXPLAIN if the encoding isn't UTF8.
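
To make the consequence concrete, here is a toy Python model of the behaviour described above. It is not the real escape_json() source, only an approximation of "escape low ascii, pass high-bit bytes through"; the name escape_json_bytes is hypothetical.

```python
def escape_json_bytes(raw: bytes) -> bytes:
    """Encoding-unaware escaping: only ASCII control bytes are escaped."""
    out = bytearray(b'"')
    for b in raw:
        if b in (0x22, 0x5C):        # double quote or backslash
            out += b"\\" + bytes([b])
        elif b < 0x20:               # low ASCII control character
            out += b"\\u%04x" % b
        else:                        # includes every byte with the high bit set
            out.append(b)
    out += b'"'
    return bytes(out)


# LATIN1 input: the 0xE9 byte is copied through unchanged, so the result
# is only valid JSON for a reader that assumes the same server encoding.
latin1_result = escape_json_bytes("caf\u00e9".encode("latin-1"))
```

A consumer that assumes UTF-8 would reject latin1_result, which is the case that matters for EXPLAIN output.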

Another question in my mind is what to do when the client encoding isn't
UTF8.

None of these is an insurmountable problem, ISTM - we just need to make
some decisions.


cheers

andrew

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
