Re: [HACKERS] WIP: bloom filter in Hash Joins with batches

Tomas Vondra Thu, 17 Dec 2015 08:02:02 -0800

On 12/17/2015 11:44 AM, Simon Riggs wrote:


> My understanding is that the bloom filter would be ineffective in any of
> these cases
> * Hash table is too small


Yes, although it depends what you mean by "too small".

Essentially, if we can do with a single batch, then it's cheaper to do a
single lookup in the hash table instead of multiple lookups in the bloom
filter. The bloom filter might still win if it fits into L3 cache, but
that seems rather unlikely.

> * Bloom filter too large


Too large with respect to what?

One obvious problem is that the bloom filter is built for all batches at
once, i.e. for all tuples, so it may be so big that it won't fit into
work_mem (or it may take a significant part of it). Currently the filter
is not accounted for in work_mem, but that'll need to change.
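
To make the accounting concern concrete, here is a rough sketch of a size
check. It is not code from the patch: the function names, the 10 bits per
tuple (the textbook figure for roughly a 1% false-positive rate with an
optimal number of hash functions) and the cap expressed as a percentage of
work_mem are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes needed by a bloom filter covering ntuples keys at bits_per_tuple
 * bits each, rounded up to whole bytes. (Hypothetical helper.)
 */
static uint64_t
bloom_size_bytes(uint64_t ntuples, uint32_t bits_per_tuple)
{
    assert(bits_per_tuple > 0);
    return (ntuples * bits_per_tuple + 7) / 8;
}

/*
 * Assumed policy: the filter may use at most max_percent of work_mem.
 * Both the policy and the parameter names are illustrative only.
 */
static bool
bloom_fits_work_mem(uint64_t ntuples, uint32_t bits_per_tuple,
                    uint64_t work_mem_bytes, uint32_t max_percent)
{
    assert(max_percent <= 100);
    return bloom_size_bytes(ntuples, bits_per_tuple) * 100
           <= work_mem_bytes * max_percent;
}
```

With these assumptions, a filter over 100M inner tuples needs about 125MB,
far more than a 4MB work_mem, while 1M tuples need about 1.25MB and fit
comfortably within a quarter of a 64MB work_mem.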

> * Bloom selectivity > 50% - perhaps that can be applied dynamically,
> so stop using it if it becomes ineffective

Yes. I think doing some preliminary selectivity estimation should not be
difficult - that's pretty much what calc_joinrel_size_estimate() already
does.

Doing that dynamically is also possible, but quite tricky. Imagine, for
example, that the outer relation is sorted - in that case we may get long
sequences of the same value (hash), and all of them will either have a
match in the inner relation, or not have a match. That may easily skew
the counters used for disabling the bloom filter dynamically.
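
The skew can be illustrated with a small sketch of one possible counter
scheme. Again, this is not the patch's code: the struct, the function
names, the window of 1000 lookups and the "more than 50% passed" rule are
hypothetical choices for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOOM_WINDOW 1000       /* hypothetical: lookups per decision */

typedef struct BloomStats
{
    uint64_t    probes;         /* lookups seen in the current window */
    uint64_t    passed;         /* lookups the filter did not reject */
    bool        enabled;        /* is the filter still being consulted? */
} BloomStats;

static void
bloom_stats_init(BloomStats *s)
{
    assert(s != NULL);
    s->probes = 0;
    s->passed = 0;
    s->enabled = true;
}

/*
 * Record one filter lookup. At the end of each window, disable the filter
 * if more than half of the lookups passed it. Returns the current state.
 */
static bool
bloom_stats_update(BloomStats *s, bool passed)
{
    assert(s != NULL);
    if (!s->enabled)
        return false;

    s->probes++;
    if (passed)
        s->passed++;

    if (s->probes == BLOOM_WINDOW)
    {
        if (s->passed * 2 > s->probes)
            s->enabled = false;
        s->probes = 0;
        s->passed = 0;
    }
    return s->enabled;
}
```

Feed this 10000 outer tuples of which only 10% have a match. If the
matching tuples are spread evenly, every window sees about 10% passing and
the filter stays enabled. If the input is sorted so that the 1000 matching
tuples come first, the very first window sees 100% passing and the filter
is switched off, although it would have rejected the remaining 90%.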


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

