On Mon, Jul 15, 2024 at 8:15 AM Richard Guo <guofengli...@gmail.com> wrote:
> > We can see that with the 0001 patch, this query runs ~3 times faster, > which is no surprise because there are 3 instances of the same > subquery in the targetlist. I am not sure if that's the right thing to do. I am using a slightly elaborate version of the tests in your patch #select v, grouping(v) gv, grouping((select t1.v from gstest5 t2 where id = t1.id)) gs,grouping((select t1.v from gstest5 t2 where id = t1.id)) gs2, (select t1.v from gstest5 t2 where id = t1.id) as s, case when grouping(v) = 0 then v else null end as cv, case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0 then (select t1.v from gstest5 t2 where id = t1.id) else null end as cs from gstest5 t1 group by grouping sets(v, s) ; v | gv | gs | gs2 | s | cv | cs ---+----+----+-----+---+----+---- 3 | 0 | 1 | 1 | | 3 | 5 | 0 | 1 | 1 | | 5 | 4 | 0 | 1 | 1 | | 4 | 2 | 0 | 1 | 1 | | 2 | 1 | 0 | 1 | 1 | | 1 | | 1 | 0 | 0 | 2 | | 2 | 1 | 0 | 0 | 5 | | 5 | 1 | 0 | 0 | 4 | | 4 | 1 | 0 | 0 | 3 | | 3 | 1 | 0 | 0 | 1 | | 1 (10 rows) #explain verbose select v, grouping(v) gv, grouping((select t1.v from gstest5 t2 where id = t1.id)) gs,grouping((select t1.v from gstest5 t2 w here id = t1.id)) gs2, (select t1.v from gstest5 t2 where id = t1.id) as s, case when grouping(v) = 0 then v else null end as cv, case when grouping((select t1.v from gstest5 t2 where id = t1.id)) = 0 then (select t1.v from gstest5 t2 where id = t1.id) else null end as cs from gstest5 t1 group by grouping sets(v, s) ; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------- HashAggregate (cost=18508.10..58790.10 rows=2460 width=28) Output: t1.v, GROUPING(t1.v), GROUPING((SubPlan 2)), GROUPING((SubPlan 3)), ((SubPlan 1)), CASE WHEN (GROUPING(t1.v) = 0) THEN t1.v ELSE NULL::integer END , CASE WHEN (GROUPING((SubPlan 4)) = 0) THEN ((SubPlan 1)) ELSE NULL::integer END Hash Key: t1.v Hash Key: (SubPlan 1) -> Seq Scan on pg_temp.gstest5 t1 (cost=0.00..18502.45 rows=2260 width=12) Output: t1.v, (SubPlan 1), t1.id SubPlan 1 -> Index Only Scan using gstest5_pkey on pg_temp.gstest5 t2 (cost=0.15..8.17 rows=1 width=4) Output: t1.v Index Cond: (t2.id = t1.id) The result looks as expected but the plan isn't consistent with what happens without grouping set #select v, (select t1.v from gstest5 t2 where id = t1.id) as s, (select t1.v from gstest5 t2 where id = t1.id) as s2, case when t1.v < 3 then (select t1.v from gstest5 t2 where id = t1.id) else null end as cs from gstest5 t1 order by case when t1.v < 3 then (select t1.v from gstest5 t2 where id = t1.id) else null end ; v | s | s2 | cs ---+---+----+---- 1 | 1 | 1 | 1 2 | 2 | 2 | 2 3 | 3 | 3 | 4 | 4 | 4 | 5 | 5 | 5 | (5 rows) postgres@92841=#explain verbose select v, (select t1.v from gstest5 t2 where id = t1.id) as s, (select t1.v from gstest5 t2 where id = t1.id) as s2, case when t1.v < 3 then (select t1.v from gstest5 t2 where id = t1.id) else null end as cs from gstest5 t1 order by case when t1.v < 3 then (select t1.v from gstest5 t2 where id = t1.id) else null end ; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Sort (cost=55573.71..55579.36 rows=2260 width=16) Output: t1.v, ((SubPlan 1)), ((SubPlan 2)), (CASE WHEN (t1.v < 3) THEN (SubPlan 3) ELSE NULL::integer END) Sort Key: (CASE WHEN (t1.v < 3) THEN (SubPlan 3) ELSE NULL::integer END) -> Seq Scan on pg_temp.gstest5 t1 (cost=0.00..55447.80 rows=2260 width=16) Output: t1.v, (SubPlan 1), (SubPlan 2), CASE WHEN (t1.v < 3) THEN (SubPlan 3) ELSE NULL::integer END SubPlan 1 -> Index Only Scan using gstest5_pkey on pg_temp.gstest5 t2 (cost=0.15..8.17 rows=1 width=4) Output: t1.v Index Cond: (t2.id = t1.id) SubPlan 2 -> Index Only Scan using gstest5_pkey on pg_temp.gstest5 t2_1 (cost=0.15..8.17 rows=1 width=4) Output: t1.v Index Cond: (t2_1.id = t1.id) SubPlan 3 -> Index Only Scan using gstest5_pkey on pg_temp.gstest5 t2_2 (cost=0.15..8.17 rows=1 width=4) Output: t1.v Index Cond: (t2_2.id = t1.id) (17 rows) Notice that every instance of that subquery has its own subplan in this case. Why should the grouping set be different and have the same subplan for two instances of the subquery? And if so, why not all of the instances have the same subplan? Since a subquery is a volatile expression, each of its instances should be evaluated separately. If the expressions in ORDER BY, GROUPING and GROUP BY are the same as an expression in the targetlist, subqueries in those expressions won't need a subplan of their own. If they are not part of targetlist, they will be added to the targetlist as resjunk columns and thus form separate instances of subquery thus adding more subplans. -- Best Wishes, Ashutosh Bapat