Re: hashjoins vs. Bloom filters (yet again)

Tomas Vondra Wed, 03 Jun 2026 03:08:09 -0700

On 6/3/26 11:20, Oleg Bartunov wrote:
> ...
>
> Bloom filters have two rather different roles here.
> 
> For a local Hash Join optimization, Bloom does not require any
> particular physical ordering of the heap. It can be useful simply when
> the join is selective enough, or when batching/spilling makes failed
> probes expensive: the Bloom filter rejects many outer tuples before a
> full hash-table probe or before writing them to temporary batches.
>


Right. Adding a filter within a hash join is certainly less ambitious,
and the possible benefits are smaller.

> But once we talk about pushing a runtime filter down to the scan/storage
> layer, the physical preconditions become crucial. To get more than a
> cheap per-row check, the scan must have something coarse-grained to
> skip: partitions, row groups, chunks, block ranges, dictionaries, min/
> max metadata, BRIN-like summaries, etc. Without that, the filter is
> still correct, but the benefit is mostly CPU/probe reduction rather than
> avoiding data production.
> 

Maybe, but there's also ongoing work on adding batches to the executor,
in which case we'd eliminate "row groups" even when using a filter in
the scope of a hashjoin operator. Of course, the tuples will flow all
the way up to that operator.

> So for me the most interesting part of this thread is not Bloom itself,
> but the architectural idea: pushing runtime knowledge down to the scan
> node, against the normal direction of data flow. The build side of a
> join produces compact knowledge about admissible keys, and lower layers
> may use it before rows are materialized and sent upward.
> 
> I saw this in my own experiments with zone/chunk-oriented storage for
> Postgres: static predicates could prune zones nicely, but joins were the
> hard case because the useful filtering knowledge was produced above the
> scan. A runtime semi-join filter pushed from the Hash Join build side
> into the scan could turn join-derived knowledge into scan-level pruning.
> 
> For example:
> 
>   SELECT sum(e.cost)
>   FROM events e
>   JOIN accounts a ON e.account_id = a.id <http://a.id>
>   WHERE a.region = 'NP'; -- Nepal
> 
> The events scan does not know which account_id values are EU accounts.
> That knowledge is produced above it, on the build side of the join. A
> runtime semi-join filter pushed from the Hash Join build side down into
> the events scan could let the scan reject impossible account_id values
> before producing tuples.
> 

Yes. This is known as "predicate transfer" in academic papers.

> For a plain heap scan this may mostly save hash probes. But with zone/
> chunk-oriented storage, where chunks have dictionaries, min/max
> metadata, Bloom summaries, or tenant ranges, the same runtime filter can
> skip whole chunks. That is the part I find most interesting: turning
> join-derived knowledge into scan-level pruning, against the normal
> direction of data flow.
> 
> Bloom is just one carrier for that knowledge. The real feature is a
> pluggable runtime-filter mechanism that heap, CustomScan, FDW, columnar/
> table AMs, partitioned storage, or chunk/cold storage can consume at the
> level they understand.
> 
> This may be a topic for a separate thread, because it quickly becomes
> less about Hash Join Bloom filters and more about runtime knowledge
> pushdown into storage.
> 

Right, there's a general concept of a "filter", and Bloom filters are
just one example of that. And maybe we could build other types of
filters more suitable for the scan. But I think it'll still be tied to a
hash join, because what other nodes / joins can build the filter?


regards

-- 
Tomas Vondra

Re: hashjoins vs. Bloom filters (yet again)

Reply via email to