On Tue, Sep 16, 2014 at 4:20 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Sep 16, 2014 at 1:11 PM, Josh Berkus <j...@agliodbs.com> wrote:
> >>> Well, I can only judge from the use cases I personally have, none of
> >>> which involve more than 100 keys at any level for most rows.  So far
> >>> I've seen some people argue hypothetical use cases involving hundreds of
> >>> keys per level, but nobody who *actually* has such a use case.
> >>
> >> I already told you that I did, and that it was the one and only app I
> >> had written for JSONB.
> >
> > Ah, ok, I thought yours was a test case.  Did you check how it performed
> > on the two patches at all?  My tests with 185 keys didn't show any
> > difference, including for a "last key" case.
>
> No, I didn't test it.  But I think Heikki's test results pretty much
> tell us everything there is to see here.  This isn't really that
> complicated; I've read a few papers on index compression over the
> years and they seem to often use techniques that have the same general
> flavor as what Heikki did here, adding complexity in the data format
> to gain other advantages.  So I don't think we should be put off.
>

I second this reasoning. Even though I ran a couple of very realistic test
cases whose results support all-lengths, I do feel that the hybrid approach
would be better, as it covers all the bases. To put things in perspective,
Tom's latest patch isn't much simpler either.

Since it would still be a breaking change, we should also consider changing
the layout to key-key-key-value-value-value, as it seems to pay off.
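Something like the following is what I have in mind; item_t and find_key are
made-up names, not from a patch, but the sketch shows why grouping the keys
up front helps: a key search scans a contiguous region of entries and never
touches value data until a key matches.

#include <stdint.h>
#include <string.h>

typedef struct
{
    const char *data;           /* start of this item's bytes */
    uint32_t    len;            /* its length */
} item_t;

/*
 * For an object with npairs pairs, entries[0..npairs-1] describe the keys
 * and entries[npairs..2*npairs-1] the values.  Returns the entry index of
 * the matching value, or -1 if the key is absent.
 */
static int
find_key(const item_t *entries, int npairs,
         const char *key, uint32_t keylen)
{
    for (int i = 0; i < npairs; i++)        /* keys are contiguous */
        if (entries[i].len == keylen &&
            memcmp(entries[i].data, key, keylen) == 0)
            return npairs + i;              /* the paired value */
    return -1;
}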


> Basically, I think that if we make a decision to use Tom's patch
> rather than Heikki's patch, we're deciding that the initial decision,
> by the folks who wrote the original jsonb code, to make array access
> less than O(n) was misguided.  While that could be true, I'd prefer to
> bet that those folks knew what they were doing.  The only reason
> we're even considering changing it is that the array of offsets
> doesn't compress well, and we've got an approach that fixes that
> problem while preserving the advantages of fast lookup.  We should
> have a darn fine reason to say no to that approach, and "it didn't
> benefit my particular use case" is not it.
>
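Right, and the compression side of that argument is easy to see with
equal-sized items. This toy example (illustrative only) builds both
representations: the lengths repeat the same word over and over, easy prey
for pglz, while the offsets are strictly increasing and never repeat, so the
compressor finds nothing to match.

#include <stdint.h>

static void
build_entries(uint32_t n, uint32_t itemsize,
              uint32_t *lengths, uint32_t *offsets)
{
    for (uint32_t i = 0; i < n; i++)
    {
        lengths[i] = itemsize;              /* 4, 4, 4, ...  */
        offsets[i] = (i + 1) * itemsize;    /* 4, 8, 12, ... */
    }
}
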
> In practice, I'm not very surprised that the impact doesn't seem too
> bad when you're running SQL queries from the client.  There's so much
> other overhead, for de-TOASTing and client communication and even just
> planner and executor costs, that this gets lost in the noise.  But
> think about a PL/pgsql procedure, say, where somebody might loop over
> all of the elements in an array.  If those operations go from O(1) to
> O(n), then the loop goes from O(n) to O(n^2).  I will bet you a
> beverage of your choice that somebody will find that behavior within a
> year of release and be dismayed by it.
>
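To make that scenario concrete, here is a toy model of the all-lengths worst
case, with made-up names: finding element i means summing i lengths, so
visiting every element of an n-element array costs 0 + 1 + ... + (n-1) =
O(n^2) additions, where an offset-based format would make each access O(1)
and the whole loop O(n).

#include <stdint.h>

static uint32_t
element_offset(const uint32_t *lengths, int i)
{
    uint32_t off = 0;

    for (int j = 0; j < i; j++)             /* O(i) per access */
        off += lengths[j];
    return off;
}

static void
visit_all(const uint32_t *lengths, int n)
{
    for (int i = 0; i < n; i++)
        (void) element_offset(lengths, i);  /* O(n^2) in total */
}
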
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
