On Tue, Sep 19, 2017 at 1:25 PM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > On 09/19/2017 06:03 PM, Peter Geoghegan wrote: >> I believe that parallelism makes the use of Bloom filter a lot more >> compelling, too. Obviously this is something that wasn't taken into >> consideration in 2015. >> > > I haven't thought about it from that point of view. Can you elaborate > why that would be the case? Sorry if this was explained earlier in this > thread (I don't see it in the history, though).
Well, IPC and locking shared state to protect the state's structure is potentially a big bottleneck for parallel hash join. I think that Bloom filters were first used in distributed databases in the 1980s, where a network round trip could be saved, which this is a little like. That's why my guess is that Bloom filtering will be more valuable when parallelism is used. I think that right deep hash joins make this really compelling, if and when they allow you to build multiple Bloom filters that can be combined from multiple right deep hash table builds. I think you can do fancy things like reduce the amount of I/O against a star schema fact table considerably. You can use one Bloom filter (built against some dimension table) to drive a bitmap index scan on a fact table index, and then another Bloom filter (built against some other dimension table) to drive another bitmap index scan. The executor then does a bitmap AND to combine the two for a bitmap heap scan on the fact table. (Maybe this technique doesn't necessarily use a Bloom filter; it could be some other type of bitmap.) -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers