Re: [HACKERS] Bloom filters for hash joins (was: A design for amcheck heapam verification)

Peter Geoghegan Wed, 20 Sep 2017 14:17:55 -0700

On Tue, Sep 19, 2017 at 1:25 PM, Tomas Vondra
<[email protected]> wrote:
> On 09/19/2017 06:03 PM, Peter Geoghegan wrote:
>> I believe that parallelism makes the use of Bloom filter a lot more
>> compelling, too. Obviously this is something that wasn't taken into
>> consideration in 2015.
>>
>
> I haven't thought about it from that point of view. Can you elaborate
> why that would be the case? Sorry if this was explained earlier in this
> thread (I don't see it in the history, though).


Well, IPC and the locking needed to protect shared state are
potentially a big bottleneck for parallel hash join. I think that
Bloom filters were first used in distributed databases in the 1980s,
where they could save a network round trip, and this case is a little
like that: a cheap local check can rule out a tuple before any shared
state is touched. That's why my guess is that Bloom filtering will be
more valuable when parallelism is used.
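
To make the argument above concrete, here is a minimal sketch in Python
(not PostgreSQL code). The BloomFilter class, its sizes, its SHA-256
based hashing, and the probe() helper are all assumptions made for
illustration. A plain dict stands in for the shared hash table, and the
sketch only counts how many probes would have had to touch it.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; size and hash scheme are illustrative assumptions."""

    def __init__(self, nbits=1024, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive nhashes bit positions by salting a SHA-256 digest.
        for i in range(self.nhashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.nbits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def probe(outer_keys, bloom, shared_table):
    """Return the matches and the number of shared-table lookups performed."""
    matches, lookups = [], 0
    for key in outer_keys:
        if not bloom.might_contain(key):
            continue  # ruled out locally, no shared-state access needed
        lookups += 1  # only now would a worker touch the shared hash table
        if key in shared_table:
            matches.append((key, shared_table[key]))
    return matches, lookups


# Build side: 10 inner keys. Probe side: 100 outer keys.
inner = {k: f"row{k}" for k in range(0, 100, 10)}
bloom = BloomFilter()
for k in inner:
    bloom.add(k)

matches, lookups = probe(range(100), bloom, inner)
print(len(matches), "matches using", lookups, "shared-table lookups")
```

The filter never produces false negatives, so every real match still
reaches the hash table. The saving comes from the outer keys that are
rejected before any shared lookup happens.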

I think that right-deep hash joins make this really compelling, if and
when they allow you to build multiple Bloom filters that can be
combined from multiple right-deep hash table builds. I think you can
do fancy things like reduce the amount of I/O against a star schema
fact table considerably. You can use one Bloom filter (built against
some dimension table) to drive a bitmap index scan on a fact table
index, and then another Bloom filter (built against some other
dimension table) to drive another bitmap index scan. The executor then
does a bitmap AND to combine the two for a bitmap heap scan on the
fact table.
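
The star schema idea above can be sketched as follows, again in Python
and purely as an illustration. The table layout, the column names
(date_id, store_id), and the helper functions are hypothetical. The
loop in fact_bitmap() is a stand-in for a bitmap index scan: a real
executor would walk a fact table index rather than every row.

```python
import hashlib

NBITS = 1024  # filter size, an arbitrary choice for this sketch


def bloom_positions(key, nhashes=3):
    # Derive bit positions by salting a SHA-256 digest (illustrative scheme).
    for i in range(nhashes):
        digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
        yield int.from_bytes(digest[:8], "big") % NBITS


def build_bloom(keys):
    bits = 0
    for key in keys:
        for pos in bloom_positions(key):
            bits |= 1 << pos
    return bits


def might_contain(bits, key):
    return all((bits >> pos) & 1 for pos in bloom_positions(key))


def fact_bitmap(fact_rows, column, bloom_bits):
    """One bit per fact row whose key passes the Bloom filter."""
    bitmap = 0
    for rowno, row in enumerate(fact_rows):
        if might_contain(bloom_bits, row[column]):
            bitmap |= 1 << rowno
    return bitmap


# Hypothetical fact table referencing two dimension tables.
fact = [{"date_id": d, "store_id": s} for d in range(20) for s in range(20)]
wanted_dates = {3, 7}   # keys surviving the filter on one dimension
wanted_stores = {5}     # keys surviving the filter on the other dimension

date_bloom = build_bloom(wanted_dates)
store_bloom = build_bloom(wanted_stores)

# Bitmap AND of the two per-dimension bitmaps gives the candidate rows.
candidates = (fact_bitmap(fact, "date_id", date_bloom)
              & fact_bitmap(fact, "store_id", store_bloom))

# Only candidate rows are fetched; the join itself rechecks each one,
# since a Bloom filter can return false positives.
result = [row for rowno, row in enumerate(fact)
          if (candidates >> rowno) & 1
          and row["date_id"] in wanted_dates
          and row["store_id"] in wanted_stores]
print(bin(candidates).count("1"), "candidate rows out of", len(fact))
```

The recheck step matters: the combined bitmap only narrows down which
fact rows need to be read, and the actual join condition still decides
which of them qualify.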

(Maybe this technique doesn't necessarily use a Bloom filter; it could
be some other type of bitmap.)

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
