On Tue, Sep 9, 2014 at 2:00 PM, Robert Haas <robertmh...@gmail.com> wrote: > Boiled down, what you're saying is that you might have a set that > contains lots of duplicates in general, but not very many where the > abbreviated-keys also match. Sure, that's true.
Abbreviated keys are not used in the case where we do a (fully) opportunistic memcmp(), without really having any idea of whether or not it'll work out. Abbreviated keys aren't really relevant to that case, except perhaps in that we know we'll have statistics available for leading attributes, which will make the case less frequent in practice. > But you might also > not have that case, so I don't see where that gets us; the same > worst-case test case Heikki developed the last time we relitigated > this point is still relevant here. Well, I think that there should definitely be a distinction made between abbreviated and non-abbreviated cases; you could frequently have almost 100% certainty that each of those optimistic memcmp()s will work out with abbreviated keys. Low cardinality sets are very common. I'm not sure what your position on that is. My proposal to treat both of those cases (abbreviated with a cost model/cardinality statistics; non-abbreviated without) the same is based on different arguments for each case. > In order to know how much we're > giving up in that case, we need the exact number I asked you to > provide in my previous email: the ratio of the cost of strcoll() to > the cost of memcmp(). > > I see that you haven't chosen to provide that information in any of > your four responses. Well, it's kind of difficult to give that number in a vacuum. I showed a case that had a large majority of opportunistic memcmp()s go to waste, while a small number were useful, which still put us ahead. I can look at Heikki's case with this again if you think that'll help. Heikki said that his case was all about wasted strxfrm()s, which are surely much more expensive than wasted memcmp()s, particularly when you consider temporal locality (we needed that memory to be stored in a cacheline for the immediately subsequent operation anyway, should the memcmp() thing not work out - the simple ratio that you're interested in may be elusive). In case I haven't been clear enough on this point, I re-emphasize that I do accept that for something like the non-abbreviated case, the opportunistic memcmp() thing must virtually be free if we're to proceed, since it is purely opportunistic. If it can be demonstrated that that isn't the case (and if that cannot be fixed by limiting it to < CACHE_LINE_SIZE), clearly we should not proceed with opportunistic (non-abbreviated) memcmp()s. In fact, I think I'm holding it to a higher standard than you are - I believe that it had better be virtually free. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers