Re: Adjusting hash join memory limit to handle batch explosion

Ben Mejia Wed, 24 Jun 2026 16:37:38 -0700



On 12/31/24 3:06 PM, Tomas Vondra wrote:

Hi,

I've been once again reminded of the batch explosion issue in hashjoin,

...


It's much worse when there's a batch that is not "divisible", i.e.
adding more batches does not split it roughly in half. This can happen
due to hash collisions (in the part that determines the batch),
duplicate values that didn't make it into MCV (and thus the skew
optimization does not kick in).

This is fairly rare, but when it happens it can easily lead to batch
explosion, i.e. rapidly increasing the number of batches. We add
batches, but the batch does not split, so we promptly hit the limit
again, triggering another increase. It often stops only when we exhaust
the 32-bit hash space, ending with 100s of thousands of batches.

...

* It actually helps with the "indivisible batch" case - it relaxes the
limit, so there's a chance the batch eventually fits and we stop adding
more and more batches. With spill files that's not the case - we still
keep the original limit, and we end up with the batch explosion (but
then we handle it much more efficiently).

The "indivisible batch" case is when all tuples end up in the samebatch. This is directly analogous to having all the tuples in one hashbucket. In Postgres hash joins we're simply using different bits of thetuple's hash value to select batch and bucket.

This can be addressed by the relax-the-limit approach Tomas describes inpatch (1) of <[email protected]>, committedfor PG18 as a1b4f28.


But this can only do so much. In fact there's the comment in nodeHash.c:

/*
 * If we dumped out either all or none of the tuples in the table, disable
 * further expansion of nbatch.  This situation implies that we have
 * enough tuples of identical hashvalues to overflow spaceAllowed.
 * Increasing nbatch will not fix it since there's no way to subdivide the
 * group any more finely. We have to just gut it out and hope the server
 * has enough RAM.
 */

I tried the "loop over the inner side" as did Melanie in<caakru_yswm7gc_b2nbgwfpe6wuhdolfc1lbz786duzacpud...@mail.gmail.com>.I got correct results but the runtime was worse than the gut-it-outapproach.

There is another thing to try: a secondary hash. A secondary hash splitsthe batch when the tuples have distinct keys that collide in the hash(the common case). It can't split a batch that is one repeated key:equal keys hash equally. There we still fall back to the gut-it-outpath. The point is that today we treat both cases identically, when onlythe duplicate-key case is truly unsplittable.

(See Raghu Ramakrishnan and Johannes Gehrke. Database ManagementSystems. 3rd edition. McGraw-Hill, 2003. ISBN 0-07-246563-8. Chapter 14,"Evaluation of Relational Operators". Page 465)

See <caalr2u8wufx0d+xjuf_12sgdz29ztf_pwttb2aczl9en+_4...@mail.gmail.com>for more information.


Ben Mejia

Re: Adjusting hash join memory limit to handle batch explosion

Reply via email to