On 10/31/24 08:16, David Rowley wrote:
On Tue, 29 Oct 2024 at 22:47, David Rowley <[email protected]> wrote:I've attached an updated patch with a few other fixes. Whilr checking this tonight, noticed that master does not use SubPlanState.tab_eq_funcs for anything. I resisted removing that in this patch. Perhaps a follow-on patch can remove that. I suspect it's not been used for a long time now, but I didn't do the archaeology work to find out.3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a rebased patch.
Thanks for sharing this.I still need to dive deeply into the code. But I have one annoying user case where the user complained about a 4x SQL server speedup in comparison to Postgres, and I guess it is a good benchmark for your code. This query is remarkable because of high grouping computation load. Of course, I can't provide the user's data, but I have prepared a synthetic test to reproduce the case (see attachment). Comparing the master with and without your patch, the first, I see is more extensive usage of memory (see complete explains in the attachment):
Current master: --------------- Partial HashAggregate (cost=54492.60..55588.03 rows=19917 width=889) (actual time=20621.028..20642.664 rows=10176 loops=9) Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5 Batches: 1 Memory Usage: 74513kB Patched: -------- Partial HashAggregate (cost=54699.91..55799.69 rows=19996 width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9) Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5 Batches: 261 Memory Usage: 527905kB Disk Usage: 4832656kBI wonder what causes memory consumption, but it is hard to decide on the patch's positive outcome for now.
-- regards, Andrei Lepikhov
/*
-- Master
Finalize HashAggregate (cost=88853.57..89949.01 rows=19917 width=889) (actual
time=20960.694..20994.725 rows=40000 loops=1)
Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
Batches: 1 Memory Usage: 280081kB
-> Gather (cost=55492.60..72521.63 rows=159336 width=889) (actual
time=20640.902..20680.058 rows=91580 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Partial HashAggregate (cost=54492.60..55588.03 rows=19917
width=889) (actual time=20621.028..20642.664 rows=10176 loops=9)
Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
Batches: 1 Memory Usage: 74513kB
Worker 0: Batches: 1 Memory Usage: 66329kB
Worker 1: Batches: 1 Memory Usage: 58137kB
Worker 2: Batches: 1 Memory Usage: 58137kB
Worker 3: Batches: 1 Memory Usage: 66329kB
Worker 4: Batches: 1 Memory Usage: 58137kB
Worker 5: Batches: 1 Memory Usage: 74521kB
Worker 6: Batches: 1 Memory Usage: 58137kB
Worker 7: Batches: 1 Memory Usage: 66329kB
-> Parallel Hash Join (cost=4921.00..23223.33 rows=305066
width=879) (actual time=667.304..5047.261 rows=17361113 loops=9)
Hash Cond: ((t1.x1 = t2.x1) AND (t1.x2 = t2.x2) AND (t1.x3
= t2.x3) AND (t1.x4 = t2.x4))
-> Parallel Seq Scan on t1 (cost=0.00..4671.00
rows=12500 width=868) (actual time=0.009..7.437 rows=11111 loops=9)
-> Parallel Hash (cost=4671.00..4671.00 rows=12500
width=319) (actual time=666.929..666.930 rows=11111 loops=9)
Buckets: 131072 Batches: 1 Memory Usage: 35776kB
-> Parallel Seq Scan on t2 (cost=0.00..4671.00
rows=12500 width=319) (actual time=560.944..562.943 rows=11111 loops=9)
Planning Time: 1.619 ms
JIT:
Functions: 149
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 64.711 ms (Deform 13.706 ms), Inlining 846.944 ms,
Optimization 2602.412 ms, Emission 1599.170 ms, Total 5113.237 ms
Execution Time: 21030.501 ms
(29 rows)
*/
/*
-- Master + patch
Finalize HashAggregate (cost=89193.21..90292.99 rows=19996 width=889) (actual
time=469100.662..469136.040 rows=40000 loops=1)
Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
Batches: 1 Memory Usage: 8963609kB
-> Gather (cost=55699.91..72796.49 rows=159968 width=889) (actual
time=58325.864..202369.963 rows=92724640 loops=1)
Workers Planned: 8
Workers Launched: 8
-> Partial HashAggregate (cost=54699.91..55799.69 rows=19996
width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9)
Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
Batches: 261 Memory Usage: 527905kB Disk Usage: 4832656kB
Worker 0: Batches: 1 Memory Usage: 247321kB
Worker 1: Batches: 261 Memory Usage: 527417kB Disk Usage:
3850624kB
Worker 2: Batches: 261 Memory Usage: 527905kB Disk Usage:
4304528kB
Worker 3: Batches: 325 Memory Usage: 527417kB Disk Usage:
4315120kB
Worker 4: Batches: 261 Memory Usage: 527905kB Disk Usage:
4403160kB
Worker 5: Batches: 261 Memory Usage: 527417kB Disk Usage:
4770368kB
Worker 6: Batches: 389 Memory Usage: 527417kB Disk Usage:
4797952kB
Worker 7: Batches: 229 Memory Usage: 527905kB Disk Usage:
4116552kB
-> Parallel Hash Join (cost=4921.00..23411.07 rows=305257
width=879) (actual time=755.402..6741.900 rows=17361113 loops=9)
Hash Cond: ((t2.x1 = t1.x1) AND (t2.x2 = t1.x2) AND (t2.x3
= t1.x3) AND (t2.x4 = t1.x4))
-> Parallel Seq Scan on t2 (cost=0.00..4671.00
rows=12500 width=319) (actual time=0.016..14.229 rows=11111 loops=9)
-> Parallel Hash (cost=4671.00..4671.00 rows=12500
width=868) (actual time=755.027..755.029 rows=11111 loops=9)
Buckets: 131072 Batches: 1 Memory Usage: 37536kB
-> Parallel Seq Scan on t1 (cost=0.00..4671.00
rows=12500 width=868) (actual time=0.006..1.634 rows=11111 loops=9)
Planning Time: 5.485 ms
JIT:
Functions: 174
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 79.608 ms (Deform 33.590 ms), Inlining 1037.222 ms,
Optimization 3900.018 ms, Emission 2594.133 ms, Total 7610.981 ms
Execution Time: 470295.518 ms
*/
synth.sql
Description: application/sql
