Re: [HACKERS] Duplicate JSON Object Keys

Andrew Dunstan Wed, 13 Mar 2013 10:46:35 -0700


On 03/13/2013 12:51 PM, Gavin Flower wrote:

On 14/03/13 02:02, Andrew Dunstan wrote:
On 03/13/2013 08:17 AM, Robert Haas wrote:
On Fri, Mar 8, 2013 at 4:42 PM, Andrew Dunstan <[email protected]> wrote:
So my order of preference for the options would be:
1. Have the JSON type collapse objects so the last instance of a key wins
and is actually stored

2. Throw an error when a JSON type has duplicate keys

3. Have the accessors find the last instance of a key and return that
value

4. Let things remain as they are now

On second thought, I don't like 4 at all. It means that the JSON type
thinks a value is valid while the accessor does not. They contradict one
another.
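
To make the difference between options 1 and 3 concrete, here is a minimal sketch in Python rather than PostgreSQL. The helper names `keep_all_pairs`, `get_field_last_wins` and `collapse` are hypothetical, not part of any patch, and the sketch assumes the top-level value is an object. Option 1 rewrites what is stored. Option 3 stores the text verbatim and resolves duplicates only inside the accessor.

```python
import json
from typing import Any


def keep_all_pairs(pairs: list[tuple[str, Any]]) -> list[tuple[str, Any]]:
    # Return the raw (key, value) pairs in lexical order, duplicates included.
    # This hook also runs for nested objects, so in this sketch nested objects
    # come back as lists of pairs too.
    return pairs


def get_field_last_wins(doc: str, key: str) -> Any:
    """Option 3: leave the stored text untouched; the accessor returns the
    value of the last occurrence of `key` in the top-level object."""
    pairs = json.loads(doc, object_pairs_hook=keep_all_pairs)
    for k, v in reversed(pairs):
        if k == key:
            return v
    raise KeyError(key)


def collapse(doc: str) -> str:
    """Option 1: normalize on input so only the last instance is stored.
    json.loads already lets a later duplicate key override an earlier one."""
    return json.dumps(json.loads(doc))


stored = '{"a": 1, "b": 2, "a": 3}'
print(get_field_last_wins(stored, "a"))  # 3, while `stored` still has both "a" keys
print(collapse(stored))                  # {"a": 3, "b": 2}
```

A reader of `"a"` sees 3 under both options. They differ in whether the stored text still contains the earlier `"a": 1`.
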
You can forget 1. We are not going to have the parser collapse anything. Either the JSON it gets is valid or it's not. But the parser isn't going to
try to MAKE it valid.
Why not? Because it's the wrong thing to do, or because it would be slower?
What I think is tricky here is that there's more than one way to
conceptualize what the JSON data type really is.  Is it a key-value
store of sorts, or just a way to store text values that meet certain
minimalist syntactic criteria?  I had imagined it as the latter, in
which case normalization isn't sensible.  But if you think of it the
first way, then normalization is not only sensible, but almost
obligatory.  For example, we don't feel bad about this:

rhaas=# select '1e1'::numeric;
  numeric
---------
       10
(1 row)

I think Andrew and I had envisioned this as basically a text data type
that enforces some syntax checking on its input, hence the current
design.  But I'm not sure that's the ONLY sensible design.
I think we've moved on from this point, because a) other implementations allow duplicate keys, b) it's trivially easy to make Postgres generate such json, and c) there is some dispute about exactly what the spec mandates.
I'll be posting a revised patch shortly that doesn't error out but simply uses the value for the later key lexically.
cheers

andrew
How about adding a new function with '_strict' appended to the existing name, with an extra parameter 'coalesce' (or other names, if considered more appropriate)?
That way slower, more stringent functionality can be added where required, and the existing function need not be changed.
If coalesce = true,
then: the last duplicate is used
else: an error is returned when the new key is a duplicate.
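
The proposed behaviour can be sketched in Python as follows. This is an illustration only, not a PostgreSQL function. The name `json_parse_strict` and the exception class `DuplicateKeyError` are hypothetical. Only the `coalesce` parameter and its two outcomes come from the proposal above. The hook runs at every nesting level, so nested duplicates are caught as well.

```python
import json
from typing import Any


class DuplicateKeyError(ValueError):
    """Raised when an object contains the same key more than once."""


def json_parse_strict(text: str, coalesce: bool) -> Any:
    """Hypothetical '_strict' variant.

    coalesce=True  -> the last duplicate is used (later value overrides earlier).
    coalesce=False -> a duplicate key raises DuplicateKeyError.
    """
    def hook(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
        obj: dict[str, Any] = {}
        for key, value in pairs:
            if key in obj and not coalesce:
                raise DuplicateKeyError(f"duplicate key: {key!r}")
            obj[key] = value  # a later assignment overrides the earlier one
        return obj

    return json.loads(text, object_pairs_hook=hook)
```

For example, `json_parse_strict('{"a": 1, "a": 2}', coalesce=True)` yields `{"a": 2}`, while the same call with `coalesce=False` raises `DuplicateKeyError`.
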

For good or ill, we now already have a json type that will accept strings with duplicate keys, and generator functions which can now generate such strings. If someone wants functions to enforce a stricter validity check (e.g. via a check constraint on a domain), or to convert json to a canonical version which strips out prior keys of the same name and their associated values, then these should be relatively simple to implement given the parser API in the current patch. But they aren't part of the current patch, and I think it's way too late to be adding such things. I have been persuaded by arguments made upthread that the best thing to do is exactly what other well known json-accepting implementations do (e.g. V8), which is to accept json with duplicate keys and to treat the later key/value as overriding the former key/value. If I'd done that from the start nobody would now be talking about this at all.
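
A minimal sketch of the canonicalizing conversion mentioned above follows. It is written in Python rather than against the patch's parser API, which is not shown here. The names `strip_prior_duplicates` and `canonicalize` are hypothetical. Each earlier key is removed together with its value, so the later pair survives at its own lexical position, at every nesting level.

```python
import json
from typing import Any


def strip_prior_duplicates(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """Keep only the last occurrence of each key, at that occurrence's position."""
    obj: dict[str, Any] = {}
    for key, value in pairs:
        obj.pop(key, None)  # drop the earlier key and its value entirely
        obj[key] = value    # re-insert so the later pair keeps its own position
    return obj


def canonicalize(text: str) -> str:
    """Rewrite JSON text with prior duplicate keys removed (hypothetical helper)."""
    parsed = json.loads(text, object_pairs_hook=strip_prior_duplicates)
    return json.dumps(parsed, separators=(",", ":"))
```

Here `canonicalize('{"a": 1, "b": 2, "a": 3}')` returns `'{"b":2,"a":3}'`. Plain `json.loads` applies the same later-wins rule but keeps the key in its first position, giving `{"a": 3, "b": 2}`. Which position a canonical form should use is a design choice this sketch makes arbitrarily.
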


cheers

andrew


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
