Josh Berkus <j...@agliodbs.com> writes:
> On 08/14/2014 07:24 PM, Tom Lane wrote:
>> We can certainly reduce that.  The question was whether it would be
>> worth the effort to try.  At this point, with three different test
>> data sets having shown clear space savings, I think it is worth the
>> effort.  I'll poke into it tomorrow or over the weekend, unless
>> somebody beats me to it.
> Note that I specifically created that data set to be a worst case:
> many top-level keys, no nesting, and small values.  However, I don't
> think it's an unrealistic worst case.

> Interestingly, even on the unpatched, 1GB table case, the *index* on
> the JSONB is only 60MB.  Which shows just how terrific the improvement
> in GIN index size/performance is.

I've been poking at this, and I think the main explanation for your
result is that with more JSONB documents being subject to compression,
we're spending more time in pglz_decompress.  There's no free lunch in
that department: if you want compressed storage, it's gonna cost ya to
decompress.  The only way I can get decompression and TOAST access to
not dominate the profile on cases of this size is to ALTER COLUMN SET
STORAGE PLAIN.

However, when I do that, I do see my test patch running about 25%
slower overall than HEAD on an "explain analyze select jfield -> 'key'
from table" type of query against 200-key documents with narrow fields
(see the attached perl script that generates the test data).

It seems difficult to improve much on that for this test case.  I put
some logic into findJsonbValueFromContainer to calculate the offset
sums just once, rather than once per binary-search iteration, but that
only improved matters by 5% at best.  I still think it'd be worth
modifying the JsonbIterator code to avoid repetitive offset
calculations, but that's not too relevant to this test case.

Having said all that, I think this test is something of a contrived
worst case.  More realistic cases are likely to have many fewer keys
(so that the speed of the binary-search loop is less of an issue), or
else to have total document sizes large enough that inline PLAIN
storage isn't an option, meaning that detoast+decompression costs will
dominate.
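For concreteness, here is a minimal, self-contained C sketch of the
offset-sum idea described above: compute the entry offsets once, rather
than re-summing the length deltas on every binary-search probe.  The
container layout, the names (FakeContainer, entry_offset, find_key),
and the length-then-bytes key ordering are assumptions invented for
this sketch; it is not the actual findJsonbValueFromContainer code.

/*
 * Hypothetical sketch only -- not the PostgreSQL jsonb code.  Assume a
 * container whose keys are packed back to back and described by an
 * array of per-entry lengths (so the absolute offset of entry i is the
 * sum of lengths 0 .. i-1), with keys kept sorted by length, then by
 * bytes.
 */
#include <stdlib.h>
#include <string.h>

typedef struct FakeContainer
{
    const char *data;       /* packed key bytes */
    const int  *lens;       /* length of each key */
    int         nkeys;      /* number of keys */
} FakeContainer;

/* Naive variant: recompute the offset on every probe, O(nkeys) each. */
int
entry_offset(const FakeContainer *c, int i)
{
    int         off = 0;

    for (int j = 0; j < i; j++)
        off += c->lens[j];
    return off;
}

/*
 * Variant corresponding to the change described above: compute the
 * offset sums once, so each binary-search probe is a cheap array
 * lookup.  Returns the entry index, or -1 if the key is not present.
 * (Error handling is omitted for brevity.)
 */
int
find_key(const FakeContainer *c, const char *key, int keylen)
{
    int        *offsets = malloc(sizeof(int) * (c->nkeys + 1));
    int         lo = 0, hi = c->nkeys - 1, result = -1;

    offsets[0] = 0;
    for (int i = 0; i < c->nkeys; i++)
        offsets[i + 1] = offsets[i] + c->lens[i];

    while (lo <= hi)
    {
        int         mid = lo + (hi - lo) / 2;
        int         cmp = c->lens[mid] - keylen;    /* shorter keys first */

        if (cmp == 0)
            cmp = memcmp(c->data + offsets[mid], key, keylen);

        if (cmp == 0)
        {
            result = mid;
            break;
        }
        else if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid - 1;
    }

    free(offsets);
    return result;
}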
			regards, tom lane

#! /usr/bin/perl

for (my $i = 0; $i < 100000; $i++) {
    print "{";
    for (my $k = 1; $k <= 200; $k++) {
        print ", " if $k > 1;
        printf "\"k%d\": %d", $k, int(rand(10));
    }
    print "}\n";
}