Andrew Dunstan <and...@dunslane.net> writes:
> On 08/07/2014 11:17 PM, Tom Lane wrote:
>> I looked into the issue reported in bug #11109.  The problem appears to be
>> that jsonb's on-disk format is designed in such a way that the leading
>> portion of any JSON array or object will be fairly incompressible, because
>> it consists mostly of a strictly-increasing series of integer offsets.
> Ouch.

> Back when this structure was first presented at pgCon 2013, I wondered
> if we shouldn't extract the strings into a dictionary, because of key
> repetition, and convinced myself that this shouldn't be necessary
> because in significant cases TOAST would take care of it.

That's not really the issue here, I think.  The problem is that a
relatively minor aspect of the representation, namely the choice to store
a series of offsets rather than a series of lengths, produces
nonrepetitive data even when the original input is repetitive.

> Maybe we should have pglz_compress() look at the *last* 1024 bytes if it
> can't find anything worth compressing in the first, for values larger
> than a certain size.

Not possible with anything like the current implementation, since it's
just an on-the-fly status check, not a trial compression.

> It's worth noting that this is a fairly pathological case. AIUI the
> example you constructed has an array with 100k string elements. I don't
> think that's typical. So I suspect that unless I've misunderstood the
> statement of the problem we're going to find that almost all the jsonb
> we will be storing is still compressible.

Actually, the 100K-string example I constructed *did* compress.  Larry's
example that's not compressing is only about 12kB.  AFAICS, the threshold
for trouble is in the vicinity of 256 array or object entries (resulting
in a 1kB JEntry array).  That doesn't seem especially high.  There is a
probabilistic component as to whether the early-exit case will actually
fire, since any chance hash collision will probably result in some 3-byte
offset prefix getting compressed.  But the fact that a beta tester tripped
over this doesn't leave me with a warm feeling about the odds that it
won't happen much in the field.

			regards, tom lane
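
A minimal sketch of the offsets-versus-lengths point, under stated assumptions: each header entry is taken to be a plain 4-byte little-endian integer (so 256 entries make the 1kB mentioned above), every element is given the same hypothetical length of 5 bytes, and zlib stands in for pglz purely as a readily available compressor. The real JEntry layout and pglz's matching rules are not modelled here; all names in the snippet are illustrative.

```python
import struct
import zlib

ENTRY_SIZE = 4        # assumed size of one header entry, in bytes
NUM_ENTRIES = 256     # 256 * 4 bytes = 1024 bytes of header
ELEMENT_LEN = 5       # hypothetical: every array element is 5 bytes long


def pack_words(values):
    """Pack integers as consecutive 32-bit little-endian words."""
    return b"".join(struct.pack("<I", v) for v in values)


def build_headers(num_entries=NUM_ENTRIES, element_len=ELEMENT_LEN):
    """Return (offsets_header, lengths_header) for identical-length elements."""
    lengths = [element_len] * num_entries
    # Cumulative end offsets: strictly increasing, so no word ever repeats.
    offsets = []
    total = 0
    for n in lengths:
        total += n
        offsets.append(total)
    return pack_words(offsets), pack_words(lengths)


def distinct_words(buf):
    """Count the distinct 4-byte words in buf."""
    return len({buf[i:i + ENTRY_SIZE] for i in range(0, len(buf), ENTRY_SIZE)})


if __name__ == "__main__":
    offsets_hdr, lengths_hdr = build_headers()
    for name, hdr in (("offsets", offsets_hdr), ("lengths", lengths_hdr)):
        # zlib is only a stand-in compressor here, not pglz.
        print(name, len(hdr), "bytes,",
              distinct_words(hdr), "distinct words,",
              len(zlib.compress(hdr)), "bytes after zlib")
```

With perfectly repetitive input, the lengths header is one word repeated 256 times, while the offsets header contains 256 different words. That is the sense in which storing offsets produces nonrepetitive data from repetitive input, and it all happens within the first 1kB of the value.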