On 03/16/2014 04:10 AM, Peter Geoghegan wrote:
On Thu, Mar 13, 2014 at 2:00 PM, Andrew Dunstan <and...@dunslane.net> wrote:
I'll be travelling a good bit of tomorrow (Friday), but I hope Peter has
finished by the time I am back on deck late tomorrow and that I am able to
commit this on Saturday.
I asked Andrew to hold off on committing this today. It was agreed
that we weren't quite ready, because there were one or two remaining
bugs (since fixed), but also because I felt that it would be useful to
first hear the opinions of more people before proceeding. I think that
we're not that far from having something committed. Obviously I hope
to get this into 9.4, and attach a lot of strategic importance to
having the feature, which is why I made a large effort to help land
it.

Attached patch has a number of notable revisions. Throughout, it has
been possible for anyone to follow our progress here:
https://github.com/feodor/postgres/commits/jsonb_and_hstore

* In general, the file jsonb_support.c (renamed to jsonb_util.c) is
vastly better commented, and has a much clearer structure. This was
not something I did much with in the previous revision, and so it has
been a definite focus of this one.

* Hashing is refactored to not use CRC32 anymore. I felt this was a
questionable method of hashing, both within jsonb_hash(), as well as
the jsonb_hash_ops GIN operator class.

* Dead code elimination.

* I got around to fixing the memory leaks in B-Tree support function one.

* Andrew added hstore_to_jsonb, hstore_to_jsonb_loose functions and a
cast. One goal of this effort is to preserve a parallel set of
facilities for the json and jsonb types, and that includes
hstore-related features.

* A fix from Alexander for the jsonb_hash_ops @> operator issue I
complained about during the last submission was merged.

* There is no longer any GiST opclass. That just leaves B-Tree, hash,
GIN (default) and GIN jsonb_hash_ops opclasses.

My outstanding concerns are:

* Have we got things right with GIN indexing, containment semantics,
etc? See my remarks in the patch, by grepping "contain" within
jsonb_util.c. Is the GIN text storage serialization format appropriate
and correct?

* General design concerns. By far the largest source of these is the
file jsonb_util.c.

* Is the on-disk format that we propose to tie Postgres to as good as
it could be?




I've been working through all the changes and fixes that Peter and others have made, and they look pretty good to me. There are a few mostly cosmetic changes I want to make, but nothing that would be worth holding up committing this for. I'm fairly keen to get this committed, get some buildfarm coverage and get more people playing with it and testing it.

Like Peter, I would like to see more comments from people on the GIN support, especially.

The one outstanding significant question of substance I have is this: given the commit 5 days ago of provision for triConsistent functions for GIN opclasses, should we be adding these to the two GIN opclasses we are providing, and what should they look like? Again, this isn't an issue that I think needs to hold up committing what we have now.
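
To make the triConsistent question concrete, here is a rough sketch of the ternary logic such a function might apply. It is not code from the patch. It assumes a query strategy that requires every extracted key to be present, and an opclass whose index matches must always be rechecked against the heap tuple. The names Tri and tri_consistent_all_keys are hypothetical; only the three-valued FALSE/TRUE/MAYBE result mirrors what GIN expects.

```python
from enum import Enum


class Tri(Enum):
    """Three-valued result, mirroring GIN's FALSE / TRUE / MAYBE."""
    FALSE = 0
    TRUE = 1
    MAYBE = 2


def tri_consistent_all_keys(check, recheck_needed=True):
    """Hypothetical ternary consistency check for an 'all keys required' query.

    check          -- one Tri value per query key (is the key present?)
    recheck_needed -- assumption: the index match is lossy, so a definite
                      TRUE can never be reported from the index alone
    """
    # One key known to be absent rules the item out, whatever the rest say.
    if any(c is Tri.FALSE for c in check):
        return Tri.FALSE
    # No key is known absent. If any is unknown, or the opclass is lossy,
    # the best available answer is MAYBE.
    if recheck_needed or any(c is Tri.MAYBE for c in check):
        return Tri.MAYBE
    return Tri.TRUE
```

The early FALSE is where the benefit would come from: an item can be rejected as soon as one required key is known to be missing, even while other keys are still undetermined.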

Regarding Peter's last question, if we're not satisfied with the on-disk format proposed it would mean throwing the whole effort out and starting again. The only thing I have thought of as an alternative would be to store the structure and values separately rather than with values inline with the structure. That way you could have a hash of values more or less, which would eliminate redundancy of storage of things like object field names. But such a structure might well involve at least as much computational overhead as the current structure. And nobody's been saying all along "hold on, we can do better than this." So I'm pretty inclined to go with what we have.
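
As a toy illustration of the alternative layout described above (not a proposal for the actual on-disk format), the sketch below keeps a table of distinct field names apart from the document structure, which refers to names by position. The functions encode and decode and the "obj"/"arr"/"val" node tags are invented for this example, and only field names are deduplicated here, not scalar values.

```python
def encode(doc):
    """Split a parsed JSON document into (names, structure).

    names     -- each distinct object field name, stored exactly once
    structure -- nested nodes that refer to field names by index
    """
    names = []
    index = {}  # field name -> position in names

    def intern(name):
        if name not in index:
            index[name] = len(names)
            names.append(name)
        return index[name]

    def walk(node):
        if isinstance(node, dict):
            return {"obj": [(intern(k), walk(v)) for k, v in node.items()]}
        if isinstance(node, list):
            return {"arr": [walk(v) for v in node]}
        return {"val": node}

    return names, walk(doc)


def decode(names, node):
    """Rebuild the document; every field access pays one extra lookup."""
    if "obj" in node:
        return {names[i]: decode(names, v) for i, v in node["obj"]}
    if "arr" in node:
        return [decode(names, v) for v in node["arr"]]
    return node["val"]
```

For an array of a thousand objects sharing the same two field names, the names table would hold two entries rather than two thousand, but every field access goes through the table, which is the kind of computational overhead weighed above.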

cheers

andrew





