I did a little more perf testing with this. I'm seeing the same benefit with the query you posted. But can we find a case where it's not beneficial? If I understand correctly, when the input slot is a virtual slot, it's cheaper to copy it to another virtual slot than to form a minimal tuple. Like in your test case. What if the input is a minimial tuple?

On master:

postgres=# set enable_hashagg=off;
SET
postgres=# explain analyze select distinct g::text, 'a', 'b', 'c','d', 'e','f','g','h' from generate_series(1, 5000000) g;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2630852.42..2655852.42 rows=200 width=288) (actual time=4525.212..6576.992 rows=5000000 loops=1) -> Sort (cost=2630852.42..2643352.42 rows=5000000 width=288) (actual time=4525.211..5960.967 rows=5000000 loops=1)
         Sort Key: ((g)::text)
         Sort Method: external merge  Disk: 165296kB
-> Function Scan on generate_series g (cost=0.00..75000.00 rows=5000000 width=288) (actual time=518.914..1194.702 rows=5000000 loops=1)
 Planning Time: 0.036 ms
 JIT:
   Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.242 ms (Deform 0.035 ms), Inlining 63.457 ms, Optimization 29.764 ms, Emission 20.592 ms, Total 114.056 ms
 Execution Time: 6766.399 ms
(11 rows)


With the patch:

postgres=# set enable_hashagg=off;
SET
postgres=# explain analyze select distinct g::text, 'a', 'b', 'c','d', 'e','f','g','h' from generate_series(1, 5000000) g;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=2630852.42..2655852.42 rows=200 width=288) (actual time=4563.639..7362.467 rows=5000000 loops=1) -> Sort (cost=2630852.42..2643352.42 rows=5000000 width=288) (actual time=4563.637..6069.000 rows=5000000 loops=1)
         Sort Key: ((g)::text)
         Sort Method: external merge  Disk: 165296kB
-> Function Scan on generate_series g (cost=0.00..75000.00 rows=5000000 width=288) (actual time=528.060..1191.483 rows=5000000 loops=1)
 Planning Time: 0.720 ms
 JIT:
   Functions: 5
Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 0.406 ms (Deform 0.065 ms), Inlining 68.385 ms, Optimization 21.656 ms, Emission 21.033 ms, Total 111.480 ms
 Execution Time: 7585.306 ms
(11 rows)


So not a win in this case. Could you peek at the outer slot type, and use the same kind of slot for the Unique's result? Or some more complicated logic, like use a virtual slot if all the values are pass-by-val? I'd also like to keep this simple, though...

Would this kind of optimization make sense elsewhere?

--
Heikki Linnakangas
Neon (https://neon.tech)



Reply via email to