On Tue, Sep 16, 2014 at 4:20 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus <j...@agliodbs.com> wrote:
> >>> Well, I can only judge from the use cases I personally have, none of
> >>> which involve more than 100 keys at any level for most rows. So far
> >>> I've seen some people argue hypothetical use cases involving hundreds
> >>> of keys per level, but nobody who *actually* has such a use case.
> >>
> >> I already told you that I did, and that it was the one and only app I
> >> had written for JSONB.
> >
> > Ah, ok, I thought yours was a test case. Did you check how it performed
> > on the two patches at all? My tests with 185 keys didn't show any
> > difference, including for a "last key" case.
>
> No, I didn't test it. But I think Heikki's test results pretty much
> tell us everything there is to see here. This isn't really that
> complicated; I've read a few papers on index compression over the
> years and they seem to often use techniques that have the same general
> flavor as what Heikki did here, adding complexity in the data format
> to gain other advantages. So I don't think we should be put off.

I second this reasoning. Even though I ran a couple of very realistic
test cases that favor all-lengths, I do feel that the hybrid approach
would be better, as it covers all the bases. To put things in
perspective, Tom's latest patch isn't much simpler either. Since it
would still be a breaking change, we should also consider changing the
layout to key-key-key-value-value-value, as it seems to pay off.

> Basically, I think that if we make a decision to use Tom's patch
> rather than Heikki's patch, we're deciding that the initial decision,
> by the folks who wrote the original jsonb code, to make array access
> less than O(n) was misguided. While that could be true, I'd prefer to
> bet that those folks knew what they were doing.
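To make the trade-off concrete, here's a toy Python sketch; this is NOT the actual jsonb JEntry encoding, just an illustration of why an all-lengths layout compresses well but pays O(i) per element lookup, while stored offsets give O(1):

```python
# Toy model of the two layouts under discussion; not the real on-disk
# format, just the lookup cost difference between lengths and offsets.

def locate_with_lengths(lengths, i):
    """All-lengths layout: stored lengths are small and repetitive, so
    they compress well, but finding element i requires summing every
    preceding length -- O(i) work per access."""
    offset = 0
    for k in range(i):
        offset += lengths[k]
    return offset

def locate_with_offsets(offsets, i):
    """All-offsets layout: each entry stores its cumulative end offset,
    so element i is found in O(1), but the ever-growing offsets rarely
    repeat and give pglz-style compression little to work with."""
    return 0 if i == 0 else offsets[i - 1]

lengths = [5, 3, 8, 2, 7]       # per-element lengths
offsets = [5, 8, 16, 18, 25]    # equivalent cumulative end offsets

# Both layouts locate the same start position for element 3:
assert locate_with_lengths(lengths, 3) == 16
assert locate_with_offsets(offsets, 3) == 16
```

As I understand the hybrid patch, it stores a real offset every N entries and plain lengths in between, so compression stays close to all-lengths while a lookup scans at most N-1 lengths.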
> The only reason we're even considering changing it is that the array
> of lengths doesn't compress well, and we've got an approach that fixes
> that problem while preserving the advantages of fast lookup. We should
> have a darn fine reason to say no to that approach, and "it didn't
> benefit my particular use case" is not it.
>
> In practice, I'm not very surprised that the impact doesn't seem too
> bad when you're running SQL queries from the client. There's so much
> other overhead, for de-TOASTing and client communication and even just
> planner and executor costs, that this gets lost in the noise. But
> think about a PL/pgsql procedure, say, where somebody might loop over
> all of the elements in an array. If those operations go from O(1) to
> O(n), then the loop goes from O(n) to O(n^2). I will bet you a
> beverage of your choice that somebody will find that behavior within a
> year of release and be dismayed by it.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
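The O(n) -> O(n^2) point above can be sketched with a toy cost model in Python (hypothetical scan counts, not measured PL/pgsql behavior):

```python
# Toy cost model: total elements scanned when a loop fetches each of
# the n array elements exactly once. Not a benchmark, just arithmetic.

def scans_per_fetch_lengths(i):
    # all-lengths layout: fetching element i walks i preceding lengths
    return i

def scans_per_fetch_offsets(i):
    # offset-based layout: direct O(1) lookup
    return 1

n = 1000
total_lengths = sum(scans_per_fetch_lengths(i) for i in range(n))
total_offsets = sum(scans_per_fetch_offsets(i) for i in range(n))

assert total_lengths == n * (n - 1) // 2   # 499500 -- grows as n^2
assert total_offsets == n                  # 1000   -- grows as n
```

At n = 1000 the all-lengths loop already does roughly 500x the work; doubling n quadruples that gap, which is exactly the dismay Robert predicts.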