Also, should I forbid the escape \u0000 (in all database encodings)?

Pros:

 * If \u0000 is forbidden, and the server encoding is UTF-8, then
every JSON-wrapped string will be convertible to TEXT.

 * It will be consistent with the way PostgreSQL already handles text
(text values cannot contain null characters), and with the decision to
use database-encoded JSON strings.

 * Some applications choke on strings with null characters.  For
example, in some web browsers but not others, if you pass
"Hello\u0000world" to document.write() or assign it to a DOM object's
innerHTML, it will be truncated to "Hello".  Banning \u0000 lets users
catch such rogue strings early.

 * It's a little easier to represent internally.
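
To make the proposed check concrete, here is a minimal sketch in Python
(not the actual C implementation).  The names contains_nul and check_json
are hypothetical.  It decodes the JSON first and then looks for U+0000 in
every string, so an escaped backslash followed by "u0000" is not rejected
by mistake.

```python
import json


def contains_nul(value):
    """Return True if any string (key or value) in decoded JSON holds U+0000."""
    if isinstance(value, str):
        return "\x00" in value
    if isinstance(value, list):
        return any(contains_nul(item) for item in value)
    if isinstance(value, dict):
        return any(contains_nul(k) or contains_nul(v) for k, v in value.items())
    return False  # numbers, booleans and null cannot contain U+0000


def check_json(text):
    """Hypothetical validator: parse, then reject strings containing U+0000."""
    # json.loads already rejects raw (unescaped) control characters inside
    # strings, so any U+0000 found afterwards came from a \u0000 escape.
    value = json.loads(text)
    if contains_nul(value):
        raise ValueError("\\u0000 is not allowed in JSON strings")
    return value
```

With this sketch, check_json('"Hello\\u0000world"') raises ValueError,
while ordinary documents pass through unchanged.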

Cons:

 * It means the JSON type will accept only a subset of the JSON
described in RFC 4627.  However, the RFC does say "An implementation may
set limits on the length and character contents of strings", so we can
arguably get away with banning \u0000 while being law-abiding citizens.

 * Being able to store U+0000–U+00FF means users can use JSON strings
to hold binary data by treating it as Latin-1.  Banning \u0000 would
rule that out for any data containing a zero byte.
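
As an illustration of that last point, here is a small Python sketch of
the Latin-1 trick.  The function names are made up for this example.
Each byte is mapped to the code point with the same number, so a zero
byte ends up as the \u0000 escape in the JSON text.

```python
import json


def bytes_to_json_string(data: bytes) -> str:
    # Latin-1 maps each byte 0x00-0xFF to the code point with the same number.
    return json.dumps(data.decode("latin-1"))


def json_string_to_bytes(text: str) -> bytes:
    # encode() raises UnicodeEncodeError if any code point is above U+00FF.
    return json.loads(text).encode("latin-1")


payload = bytes([0x00, 0x41, 0xFF])
encoded = bytes_to_json_string(payload)   # the zero byte becomes \u0000
decoded = json_string_to_bytes(encoded)   # round-trips to the original bytes
```

If \u0000 were forbidden, this round trip would fail for any payload
containing a zero byte, which is most real binary data.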

- Joey

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
