Re: Broken EXPLAIN output for SubPlan in MERGE

2024-03-21 Thread Alvaro Herrera
On 2024-Mar-12, Dean Rasheed wrote:

> While playing around with EXPLAIN and SubPlans, I noticed that there's
> a bug in how this is handled for MERGE. [...]

> However, that isn't working properly for MERGE because the inner_plan
> and inner_tlist of the corresponding deparse_namespace aren't set
> correctly. Actually the inner_tlist is correct, but the inner_plan is
> set to the ModifyTable node, whereas it needs to be the outer child
> node -- in a MERGE, any references to the source relation will be
> INNER_VAR references to the targetlist of the join node immediately
> under the ModifyTable node.

Hmm, interesting, thanks for fixing it (commit 33e729c5148c).  I remember
wondering whether the nodes ought to be set differently, and now I have to
admit that this

if (((ModifyTable *) plan)->operation == CMD_MERGE)
dpns->inner_plan = outerPlan(plan);

is very funny-looking.  But I didn't come up with any examples where it
mattered.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)




Broken EXPLAIN output for SubPlan in MERGE

2024-03-12 Thread Dean Rasheed
While playing around with EXPLAIN and SubPlans, I noticed that there's
a bug in how this is handled for MERGE. For example:

drop table if exists src, tgt, ref;
create table src (a int, b text);
create table tgt (a int, b text);
create table ref (a int);

explain (verbose, costs off)
merge into tgt t
  using (select (select r.a from ref r where r.a = s.a) a, b from src s) s
  on t.a = s.a
  when not matched then insert values (s.a, s.b);

QUERY PLAN
---
 Merge on public.tgt t
   ->  Merge Left Join
 Output: t.ctid, s.a, s.b, s.ctid
 Merge Cond: (((SubPlan 1)) = t.a)
 ->  Sort
   Output: s.a, s.b, s.ctid, ((SubPlan 1))
   Sort Key: ((SubPlan 1))
   ->  Seq Scan on public.src s
 Output: s.a, s.b, s.ctid, (SubPlan 1)
 SubPlan 1
   ->  Seq Scan on public.ref r
 Output: r.a
 Filter: (r.a = s.a)
 ->  Sort
   Output: t.ctid, t.a
   Sort Key: t.a
   ->  Seq Scan on public.tgt t
 Output: t.ctid, t.a
   SubPlan 2
 ->  Seq Scan on public.ref r_1
   Output: r_1.a
   Filter: (r_1.a = t.ctid)

The final filter condition "(r_1.a = t.ctid)" is incorrect, and should
be "(r_1.a = s.a)".

What's happening is that the right hand side of that filter expression
is an input Param node which get_parameter() tries to display by
calling find_param_referent() and then drilling down through the
ancestor node (the ModifyTable node) to try to find the real name of
the variable (s.a).

However, that isn't working properly for MERGE because the inner_plan
and inner_tlist of the corresponding deparse_namespace aren't set
correctly. Actually the inner_tlist is correct, but the inner_plan is
set to the ModifyTable node, whereas it needs to be the outer child
node -- in a MERGE, any references to the source relation will be
INNER_VAR references to the targetlist of the join node immediately
under the ModifyTable node.

So I think we want to do something like the attached.

Regards,
Dean
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
new file mode 100644
index 2a1ee69..2231752
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -4988,8 +4988,11 @@ set_deparse_plan(deparse_namespace *dpns
 	 * For a WorkTableScan, locate the parent RecursiveUnion plan node and use
 	 * that as INNER referent.
 	 *
-	 * For MERGE, make the inner tlist point to the merge source tlist, which
-	 * is same as the targetlist that the ModifyTable's source plan provides.
+	 * For MERGE, pretend the ModifyTable's source plan (its outer plan) is
+	 * INNER referent.  This is the join from the target relation to the data
+	 * source, and all INNER_VAR Vars in other parts of the query refer to its
+	 * targetlist.
+	 *
 	 * For ON CONFLICT .. UPDATE we just need the inner tlist to point to the
 	 * excluded expression's tlist. (Similar to the SubqueryScan we don't want
 	 * to reuse OUTER, it's used for RETURNING in some modify table cases,
@@ -5004,17 +5007,17 @@ set_deparse_plan(deparse_namespace *dpns
 		dpns->inner_plan = find_recursive_union(dpns,
 (WorkTableScan *) plan);
 	else if (IsA(plan, ModifyTable))
-		dpns->inner_plan = plan;
-	else
-		dpns->inner_plan = innerPlan(plan);
-
-	if (IsA(plan, ModifyTable))
 	{
 		if (((ModifyTable *) plan)->operation == CMD_MERGE)
-			dpns->inner_tlist = dpns->outer_tlist;
+			dpns->inner_plan = outerPlan(plan);
 		else
-			dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist;
+			dpns->inner_plan = plan;
 	}
+	else
+		dpns->inner_plan = innerPlan(plan);
+
+	if (IsA(plan, ModifyTable) && ((ModifyTable *) plan)->operation == CMD_INSERT)
+		dpns->inner_tlist = ((ModifyTable *) plan)->exclRelTlist;
 	else if (dpns->inner_plan)
 		dpns->inner_tlist = dpns->inner_plan->targetlist;
 	else
diff --git a/src/test/regress/expected/merge.out b/src/test/regress/expected/merge.out
new file mode 100644
index e3ebf46..1a6f6ad
--- a/src/test/regress/expected/merge.out
+++ b/src/test/regress/expected/merge.out
@@ -1473,6 +1473,56 @@ WHEN MATCHED AND t.a < 10 THEN
 
 DROP TABLE ex_msource, ex_mtarget;
 DROP FUNCTION explain_merge(text);
+-- EXPLAIN SubPlans and InitPlans
+CREATE TABLE src (a int, b int, c int, d int);
+CREATE TABLE tgt (a int, b int, c int, d int);
+CREATE TABLE ref (ab int, cd int);
+EXPLAIN (verbose, costs off)
+MERGE INTO tgt t
+USING (SELECT *, (SELECT count(*) FROM ref r
+   WHERE r.ab = s.a + s.b
+ AND r.cd = s.c - s.d) cnt
+ FROM src s) s
+ON t.a = s.a AND t.b < s.cnt
+WHEN MATCHED AND t.c > s.cnt THEN
+  UPDATE SET (b, c) = (SELECT s.b, s.cnt);
+ QUERY PLAN