On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas <robertmh...@gmail.com> wrote:
> I think that's a good point.

I think that there may be something to be said for the current layout.
Having adjacent keys and values could take better advantage of CPU
cache characteristics. I've heard of approaches to improving B-Tree
locality that forced keys and values to be adjacent on individual
B-Tree pages [1], for example. I've heard of this more than once. And
FWIW, I believe, based on earlier research of user requirements in
this area, that very large jsonb datums are not considered all that
compelling. Document database systems have considerable limitations
here.

> On the general topic, I don't think it's reasonable to imagine that
> we're going to come up with a single heuristic that works well for
> every kind of input data. What pglz is doing - assuming that if the
> beginning of the data is incompressible then the rest probably is too
> - is fundamentally reasonable, notwithstanding the fact that it
> doesn't happen to work out well for JSONB. We might be able to tinker
> with that general strategy in some way that seems to fix this case and
> doesn't appear to break others, but there's some risk in that, and
> there's no obvious reason in my mind why PGLZ should be required to fly
> blind. So I think it would be a better idea to arrange some method by
> which JSONB (and perhaps other data types) can provide compression
> hints to pglz.

If there is to be any effort to make jsonb a more effective target for
compression, I imagine that that would have to target redundancy
between JSON documents. With idiomatic usage, we can expect plenty of
it.

[1] http://www.vldb.org/conf/1999/P7.pdf , "We also forced each key
and child pointer to be adjacent to each other physically"

--
Peter Geoghegan
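
To make the point about redundancy between documents concrete, here is
a minimal sketch. It is only an illustration under stated assumptions:
it uses Python's zlib with a preset dictionary as a stand-in (this is
not pglz, and nothing like it is proposed in this thread), and the
document shape and key names are made up.

```python
import json
import zlib

# Hypothetical documents that share the same keys, as idiomatic usage
# of a JSON column tends to produce. Only user_id differs.
docs = [
    json.dumps({
        "user_id": i,
        "account_status": "active",
        "preferred_language": "en",
        "signup_source": "web",
    }).encode("utf-8")
    for i in range(1000, 1005)
]

# Assumption: one representative document serves as the shared
# dictionary. A real scheme would need to build and store this somewhere.
shared_dict = docs[0]


def compress(data, zdict=None):
    """Deflate one document, optionally priming zlib with a preset dictionary."""
    if zdict is None:
        c = zlib.compressobj(level=9)
    else:
        c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def decompress(blob, zdict=None):
    """Inverse of compress(); the same dictionary must be supplied."""
    if zdict is None:
        d = zlib.decompressobj()
    else:
        d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()


if __name__ == "__main__":
    for doc in docs[1:]:
        alone = compress(doc)
        primed = compress(doc, shared_dict)
        print(len(doc), len(alone), len(primed))
```

Each document on its own is too short to contain much internal
repetition, so compressing it in isolation gains little. Most of the
available redundancy is in the keys that every document repeats, which
the compressor can only exploit if it has seen another document first.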