On 04/09/2014 01:18 AM, Andrew Dunstan wrote:

On 04/08/2014 05:57 PM, Peter Geoghegan wrote:
On Tue, Apr 8, 2014 at 2:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
Well, let me see if I understand the situation correctly:

* jsonb_ops supports more operators

* jsonb_hash_ops produces smaller, better-performing indexes

* jsonb_ops falls over on inputs with wide field values, but
jsonb_hash_ops does not
There might be some compelling cases for indexing existence rather
than containment, since the recheck flag isn't set there, but in
general this summary seems sound. I would say that broadly, existence
is a less useful operator than containment, and so jsonb_hash_ops is
broadly more compelling. I didn't propose changing the default due to
concerns about the POLA, but I'm happy to be told that those concerns
were out of proportion to the practical benefits of a different
default.

I tend to agree with Tom that POLA will be more violated by the default
ops class not being able to index some values.

Yeah.

<rant>

Both of the operator classes are actually much less flexible than I'd like. Firstly, they index everything. In many cases, that's not what you want, so you end up with much larger indexes than necessary. Secondly, jsonb_ops indexes all values separately from the keys. That makes the index pretty much useless for a query on, say, WHERE json @> '{"needs_processing":true}', if all the rows also contain a key-value pair "active":true. Thirdly, inequality operators are not supported; you can't search for rows with (the json-syntax equivalent of) "price < 12.3". Fourthly, sometimes you would want to include the "path" to an entry in the key, sometimes not.

If I understood correctly the way jsonb_hash_ops works, the limitation compared to jsonb_ops is that it cannot be used for foo ? 'bar' type queries. And the reason for that limitation is that it hashes the whole path to the key; the naked values are not indexes separately. But why not? jsonb_ops does - why is that decision related to whether you hash or not? Or it could index both. Sure, it would be wasteful when you don't need to support foo ? 'bar', but the point is that it should be up to the DBA to decide, based on his needs.

As the code stands, you don't have a choice on any of those things. The decisions have been made by us, PostgreSQL developers. The only choice you have is between jsonb_ops and jsonb_hash_ops, with a strange combination of tradeoffs in both. Sure, they're still useful, if not optimal, for a wide-range of applications. For more complicated cases, you will have to resort to expression indexes. It bugs me greatly that the underlying indexam could do all those things, we're just not exposing the capability.

ISTM we need a way to parameterize opclasses, so that when you create the index, you specify the above things.

</rant>

The ship has cleatly sailed to add parameterized opclasses to 9.4, but let's keep it in mind when we decide on the defaults.

In the absence of parameterizable opclasses, it would be much more flexible to have opclasses that index, keys, values, key-value pairs and paths separately, instead of the current json_ops and json_hash_ops opclasses which index all of those in the same index. That way, if you only e.g. ever query on the existence of a key, you'd only need to index the keys.

I don't understand how we ended up with the current dichotomy of json_ops and json_hash_ops...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to