Re: hashjoins vs. Bloom filters (yet again)

Ben Mejia Tue, 23 Jun 2026 15:36:28 -0700



On 5/29/26 5:55 PM, Tomas Vondra wrote:

old patches
-----------

Those old patches tried to do a fairly small thing during a hash join,
and that's building a Bloom filter on the inner relation (the one that
gets hashed), and then use that filter before probing the hash table.

The benefits come from Bloom filters being (fairly) cheap, and a
negative answer (hash is not in the filter) may allows us to skip a much
more expensive operation.

The old threads patches focused especially at two hash join cases:

(a) A very selective join, i.e. a significant fraction of outer tuples
does not have a match in the hash table.

(b) A selective hash join forced to do batching because the hash table
is too large, and thus forced to spill outer tuples to temporary files.

For (a), the benefit comes from Bloom filters being much cheaper to
probe than a hash table. The exact cost depends on the implementation,
sizes, etc. We're in the ballpark of 50 vs. 500 cycles, maybe. But if
the filter discards 90% of tuples, it can be a big win.

For (b), the filter (for all the batches at once) allows us to discard
some of the outer tuples without writing them to temporary files. Which
is way more expensive than probing a hash table.

As it happens, I've been exploring the use of a bitmap filter for the

same two cases you mention. This has some relevance to the issues youmention in your post about sizing, false-positive rate, etc.

Instead of a Bloom filter, I chose to use a bitmap filter, with one bitper bucket on the build side. As the inner table is built, I set a bitin the bitmap filter for every occupied bucket. If a bucket is empty,there are no matching hashes and those hash values can be skipped whereappropriate. The advantages of this bitmap over a Bloom filter are:


 - sizing is pre-determined by nbuckets
 - small bitmaps (4k for 32k buckets)
 - cheaper - nominal cost to set/check bits

A well-chosen Bloom filter will be more discriminating, but the bitmaphas the same no-false-negatives guarantee and costs much less space andtime to build.


I implemented both of your cases:

Drop-before-spill: (Case b)

Build per-batch bitmaps during inner partition pass and drop tuples thatdon't have a bit set. Saves I/O on tuples that will never match. Thisonly works for inner and semi joins.



Single-Batch probe: (Case a)

Only pays off in high-miss-rate joins and a bucket array larger thanL2/L3 cache. This case has a higher penalty for hash table lookup thanthe in-cache bitmap check. This case works in multi-batch, but the I/Ocost dominates and there is no gain.

I put runtime guards on both of these; I sampled the drop rate over awindow and disable the filter for the rest of the pass if the rate fallsbelow a threshold. (~5% for case b; ~25% for case a)


The benchmarks are encouraging:

For case a, I was able to see a best-case improvement of ~15% forcarefully chosen data (dependent on L2/L3 cache size).


For case b, I tested 3 cases with sparse, average and dense probe hits:

    sparse probe (~95% miss):           +18% to +36%
    avg probe    (~37% miss):            +9%  to +13%
    dense probe  (FK-like, ~0% miss):    flat, within noise

(This was on a 8-core x86-64, L1 32KB/core, L2 4MB/core, L3 32MB, 31 GBRAM. PostgreSQL 19devel, serial hash join,max_parallel_workers_per_gather = 0, across work_mem = 1-8MB)


Happy to share the patch and full benchmark data if useful.

-Ben Mejia

Re: hashjoins vs. Bloom filters (yet again)

Reply via email to