Hi hackers!

There is a weird error, rarely reproduced with SQLancer: `ERROR: invalid DSA memory alloc request size 1140850688`:

 Finalize Aggregate  (cost=114075075706156875776.00..114075075706156875776.00 rows=1 width=8)
   ->  Gather  (cost=114075075706156875776.00..114075075706156875776.00 rows=4 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=114075075706156875776.00..114075075706156875776.00 rows=1 width=8)
               ->  Parallel Hash Join  (cost=6510542055.80..97778619693677723648.00 rows=6518582404991658491904 width=0)
                     Hash Cond: ((t2_1.c1 * t2_1.c1) = t4_1.c1)
                     ->  Nested Loop  (cost=0.00..57152791114.20 rows=2539498720000 width=32)
                           ->  Parallel Seq Scan on t2 t2_1  (cost=0.00..12.20 rows=220 width=32)
                           ->  Nested Loop  (cost=0.00..144353654.10 rows=11543176000 width=0)
                                 ->  Nested Loop  (cost=0.00..63915.85 rows=5107600 width=0)
                                       ->  Seq Scan on t5  (cost=0.00..32.60 rows=2260 width=0)
                                       ->  Materialize  (cost=0.00..43.90 rows=2260 width=0)
                                             ->  Seq Scan on t0  (cost=0.00..32.60 rows=2260 width=0)
                                 ->  Materialize  (cost=0.00..43.90 rows=2260 width=0)
                                       ->  Seq Scan on t0 t0_1  (cost=0.00..32.60 rows=2260 width=0)
                     ->  Parallel Hash  (cost=4028895759.80..4028895759.80 rows=128343727600 width=32)
                           ->  Nested Loop  (cost=0.00..4028895759.80 rows=128343727600 width=32)
                                 ->  Nested Loop  (cost=0.00..3569757.80 rows=145845145 width=0)
                                       ->  Nested Loop Left Join  (cost=0.00..7536.20 rows=64533 width=0)
                                             Join Filter: (upper((t2.c1 + t2.c1)))::boolean
                                             ->  Parallel Seq Scan on t2  (cost=0.00..12.20 rows=220 width=32)
                                             ->  Seq Scan on t4  (cost=0.00..18.80 rows=880 width=0)
                                       ->  Seq Scan on t5 t5_1  (cost=0.00..32.60 rows=2260 width=0)
                                 ->  Seq Scan on t4 t4_1  (cost=0.00..18.80 rows=880 width=32)



The problem is reproduced only with `max_parallel_workers_per_gather=4`.
The error is produced by `dsa_allocate`, most likely in nodeHash.c (fortunately there are not so many calls of dsa_allocate). Certainly there are checks which should prevent allocation of more than MaxAllocSize:

    max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));

But it looks like they are not everywhere.
In particular, I suspect these places:

nodeHash.c:1668:

                /* Double the size of the bucket array. */
                pstate->nbuckets *= 2;
                size = pstate->nbuckets * sizeof(dsa_pointer_atomic);
                hashtable->batches[0].shared->size += size / 2;
                dsa_free(hashtable->area, hashtable->batches[0].shared->buckets);
                hashtable->batches[0].shared->buckets =
                    dsa_allocate(hashtable->area, size);

nodeHash.c:3290:

    batch->buckets =
        dsa_allocate(hashtable->area, sizeof(dsa_pointer_atomic) * nbuckets);

In the first case the number of buckets is doubled; in the second case it may have already been doubled in ExecChooseHashTableSize:

        /*
         * It's better to use half the batches, so do that and adjust the
         * nbucket in the opposite direction, and double the allowance.
         */
        nbatch /= 2;
        nbuckets *= 2;

Maybe I am missing something, but do we have any protection here from overflow?

One more notice: as you can see, the estimated number of rows in this case is 6518582404991658491904 (it doesn't fit in 64 bits). It is not difficult to create a query with an even larger estimated number of tuples, for example:

create table t1(pk1 integer, sk1 integer);
create table t2(pk2 integer, sk2 integer);
create table t3(pk3 integer, sk3 integer);
create table t4(pk4 integer, sk4 integer);
create table t5(pk5 integer, sk5 integer);
create table t6(pk6 integer, sk6 integer);
insert into t1 values (generate_series(1,1000000), 0);
insert into t2 values (generate_series(1,1000000), 0);
insert into t3 values (generate_series(1,1000000), 0);
insert into t4 values (generate_series(1,1000000), 0);
insert into t5 values (generate_series(1,1000000), 0);
insert into t6 values (generate_series(1,1000000), 0);
analyze t1;
analyze t2;
analyze t3;
analyze t4;
analyze t5;
analyze t6;
explain select count(*) from t1 join t2 on sk1=sk2 join t3 on sk2=sk3 join t4 on sk3=sk4  join t5 on sk4=sk5 join t6 on sk5=sk6;
                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=1 width=8)
   ->  Gather  (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=2 width=8)
         Workers Planned: 4
         ->  Partial Aggregate  (cost=2994791666679459822091698054365184.00..2994791666679459822091698054365184.00 rows=1 width=8)
               ->  Parallel Hash Join  (cost=9195963541715053182976.00..1953125000012793188098301096886272.00 rows=416666666666666672044102856701116416 width=0)
                     Hash Cond: (t1.sk1 = t3.sk3)
                     ->  Merge Join  (cost=180939.82..6250190523.16 rows=416666666667 width=8)
                           Merge Cond: (t1.sk1 = t2.sk2)
                           ->  Sort  (cost=53182.48..54224.15 rows=416667 width=4)
                                 Sort Key: t1.sk1
                                 ->  Parallel Seq Scan on t1  (cost=0.00..8591.67 rows=416667 width=4)
                           ->  Materialize  (cost=127757.34..132757.34 rows=1000000 width=4)
                                 ->  Sort  (cost=127757.34..130257.34 rows=1000000 width=4)
                                       Sort Key: t2.sk2
                                       ->  Seq Scan on t2  (cost=0.00..14425.00 rows=1000000 width=4)
                     ->  Parallel Hash  (cost=1953125000038385975296.00..1953125000038385975296.00 rows=416666666666666659676160 width=16)
                           ->  Parallel Hash Join  (cost=23086308963.32..1953125000038385975296.00 rows=416666666666666659676160 width=16)
                                 Hash Cond: (t3.sk3 = t5.sk5)
                                 ->  Merge Join  (cost=180939.82..6250190523.16 rows=416666666667 width=8)
                                       Merge Cond: (t3.sk3 = t4.sk4)
                                       ->  Sort  (cost=53182.48..54224.15 rows=416667 width=4)
                                             Sort Key: t3.sk3
                                             ->  Parallel Seq Scan on t3  (cost=0.00..8591.67 rows=416667 width=4)
                                       ->  Materialize  (cost=127757.34..132757.34 rows=1000000 width=4)
                                             ->  Sort  (cost=127757.34..130257.34 rows=1000000 width=4)
                                                   Sort Key: t4.sk4
                                                   ->  Seq Scan on t4  (cost=0.00..14425.00 rows=1000000 width=4)
                                 ->  Parallel Hash  (cost=6250190523.16..6250190523.16 rows=416666666667 width=8)
                                       ->  Merge Join  (cost=180939.82..6250190523.16 rows=416666666667 width=8)
                                             Merge Cond: (t5.sk5 = t6.sk6)
                                             ->  Sort  (cost=53182.48..54224.15 rows=416667 width=4)
                                                   Sort Key: t5.sk5
                                                   ->  Parallel Seq Scan on t5  (cost=0.00..8591.67 rows=416667 width=4)
                                             ->  Materialize  (cost=127757.34..132757.34 rows=1000000 width=4)
                                                   ->  Sort  (cost=127757.34..130257.34 rows=1000000 width=4)
                                                         Sort Key: t6.sk6
                                                         ->  Seq Scan on t6  (cost=0.00..14425.00 rows=1000000 width=4)

Please notice that this query doesn't cause the alloc error.
But I wonder if we should limit the maximal number of estimated rows, because it seems clear that no system can process 10^25 rows in any reasonable amount of time. So if we have such an estimate, then either there is some error in the estimation and fortunately the query finishes much faster, or it does not finish at all. But from the resource allocation point of view there is no sense in trying to allocate resources for 10^25 rows.
