Hi,

On 2026-02-19 18:16:57 +0200, Ants Aasma wrote:
> > So it's a parallel aggregate? Partial + Finalize? I wonder if that might
> > be "correlating" the data in a way that makes it more likely to hit
> > SH_GROW_MAX_MOVE. But if that was the case, wouldn't we see this issue
> > more often?
> 
> Interestingly the plan doesn't have partial and final on those hash agg nodes:
> 
>                      ->  HashAggregate  (cost=142400.87..142800.87 rows=40000 width=16) (actual time=7978.262..9591.682 rows=3698243 loops=1)
>                            Group Key: "*SELECT* 2_4".vehicle_id, "*SELECT* 2_4".day
>                            Batches: 21  Memory Usage: 65593kB  Disk Usage: 118256kB
>                            ->  Gather  (cost=133600.87..142000.87 rows=80000 width=16) (actual time=1898.473..4772.296 rows=3698243 loops=1)
>                                  Workers Planned: 2
>                                  Workers Launched: 2
>                                  ->  HashAggregate  (cost=132600.87..133000.87 rows=40000 width=16) (actual time=1586.697..2040.368 rows=1232748 loops=3)
>                                        Group Key: "*SELECT* 2_4".vehicle_id, "*SELECT* 2_4".day
>                                        Batches: 1  Memory Usage: 5137kB
>                                        Worker 0:  Batches: 5  Memory Usage: 79921kB  Disk Usage: 40024kB
>                                        Worker 1:  Batches: 5  Memory Usage: 81969kB  Disk Usage: 36112kB
> 
> There are timescale tables involved in the plan, so I think timescale
> might be behind that.

Hm, so timescale creates a plan that we would not?


> There is this comment above the simplehash growing logic:

> * To avoid negative consequences from overly imbalanced
> * hashtables, grow the hashtable if collisions would require
> * us to move a lot of entries.  The most likely cause of such
> * imbalance is filling a (currently) small table, from a
> * currently big one, in hash-table order.
> 
> The problem disappears if I have a breakpoint on tuplehash_grow, so
> apparently triggering the problem requires that the lower hashtable
> scans interleave in a particular manner to trigger the excess growth
> of the upper node.
> 
> I'm wondering if some way to decorrelate the hashtables would help.
> For example a hashtable specific (pseudo)random salt.

We do try to add a hash-IV that's different for each worker:

        /*
         * If parallelism is in use, even if the leader backend is performing the
         * scan itself, we don't want to create the hashtable exactly the same way
         * in all workers. As hashtables are iterated over in keyspace-order,
         * doing so in all processes in the same way is likely to lead to
         * "unbalanced" hashtables when the table size initially is
         * underestimated.
         */
        if (use_variable_hash_iv)
                hash_iv = murmurhash32(ParallelWorkerNumber);


I don't remember enough of how the parallel aggregate stuff works. Perhaps the
issue is that the leader is also building a hashtable and it's being inserted
into the post-gather hashtable, using the same IV?

In which case parallel_leader_participation=off should make a difference.

Greetings,

Andres Freund
