Re: [HACKERS] [PERFORM] Hash Anti Join performance degradation

Robert Haas Tue, 31 May 2011 20:14:38 -0700

On Tue, May 31, 2011 at 9:11 PM, Cédric Villemain
<cedric.villemain.deb...@gmail.com> wrote:
> Oh damned, I am currently with an eeepc, I'll need 2 days to bench that :-D
> I'll see tomorow.


LOL.

With respect to the root of the issue (why does the anti-join take so
long?), my first thought was that perhaps the OP was very unlucky and
had a lot of values that hashed to the same bucket.  But that doesn't
appear to be the case.  There are 120 different box_id values in the
message table, and running hashint8(box_id) % 16384 (which I think is
the right calculation: 4096 buckets * 4 batches) yields 120 different
values.

I think that part of the problem here is that the planner has no
particularly efficient way of executing a non-equijoin.  Each probe
finds the appropriate hash bucket and must then probe the entire
chain, all of which pass the hash condition and some of which fail the
join qual.  So if there are n_1 instances of value v_1, n_2 instances
of value v_2, etc. then the total effort is proportional to n_1^2 +
n_2^2 + ...

But that doesn't really explain the problem because removing the last
few rows only changes that sum by a small amount.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PERFORM] Hash Anti Join performance degradation

Reply via email to