Bug #17759: GENERATED columns not computed during MERGE

Dean Rasheed Sun, 29 Jan 2023 01:58:04 -0800

I spent a little time investigating bug #17759 [1] in more detail.
Initially, I thought that it had been fixed by 3f7836ff65, but it
turns out that's not the case.


[1] 
https://www.postgresql.org/message-id/17759-e76d9bece1b5421c%40postgresql.org

The immediate cause of the bug was that, before 3f7836ff65, the set of
generated columns to be updated depended on extraUpdatedCols from the
target RTE, and for MERGE, this was not being populated. 3f7836ff65
appeared to fix that (it fixes the test case in the bug report) by no
longer relying on rte->extraUpdatedCols, but unfortunately there's a
little more to it than that.

Since 3f7836ff65, ExecInitModifyTable() calls
ExecInitStoredGenerated() if the command is an INSERT or an UPDATE,
but not if it's a MERGE. This means that the generated column info
doesn't get built until later (when a merge action actually executes
for the first time). If the first merge action to execute is an
UPDATE, and no updated columns require generated columns to be
recomputed, then ExecInitStoredGenerated() will skip those generated
columns and not generate ri_GeneratedExprs / ri_extraUpdatedCols info
for them. That's a problem, however, since the MERGE might also
contain an INSERT that gets executed later, for which it isn't safe to
skip any of the generated columns. Here's a simple reproducer:

CREATE TABLE t (
  id int PRIMARY key,
  val int,
  str text,
  upper_str text GENERATED ALWAYS AS (upper(str)) STORED
);

INSERT INTO t VALUES (1, 10, 'orig');

MERGE INTO t
  USING (VALUES (1, 100), (2, 200)) v(id, val) ON t.id = v.id
  WHEN MATCHED THEN UPDATE SET val = v.val
  WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.val, 'new');

SELECT * FROM t;

 id | val | str  | upper_str
----+-----+------+-----------
  1 | 100 | orig | ORIG
  2 | 200 | new  |
(2 rows)


So we need to ensure that ExecInitModifyTable() calls
ExecInitStoredGenerated() for MERGE. Passing CMD_MERGE to
ExecInitStoredGenerated() is good enough, since anything other than
CMD_UPDATE causes it to not skip any generated columns. That could be
improved by examining the merge action list (it would be OK to skip
generated columns as long as the MERGE didn't contain an INSERT
action), but I don't think it's worth the extra effort / risk.

So I think we need the attached in HEAD and v15.

Regards,
Dean

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
new file mode 100644
index f419c47..1ac6517
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4141,12 +4141,12 @@ ExecInitModifyTable(ModifyTable *node, E
 		}
 
 		/*
-		 * For INSERT and UPDATE, prepare to evaluate any generated columns.
+		 * For INSERT/UPDATE/MERGE, prepare to evaluate any generated columns.
 		 * We must do this now, even if we never insert or update any rows,
 		 * because we have to fill resultRelInfo->ri_extraUpdatedCols for
 		 * possible use by the trigger machinery.
 		 */
-		if (operation == CMD_INSERT || operation == CMD_UPDATE)
+		if (operation == CMD_INSERT || operation == CMD_UPDATE || operation == CMD_MERGE)
 			ExecInitStoredGenerated(resultRelInfo, estate, operation);
 	}
 
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
new file mode 100644
index 9867748..11940c8
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -192,6 +192,26 @@ SELECT * FROM gtest1 ORDER BY a;
  3 | 6
 (1 row)
 
+-- test MERGE
+CREATE TABLE gtestm (
+  id int PRIMARY KEY,
+  f1 int,
+  f2 int,
+  f3 int GENERATED ALWAYS AS (f1 * 2) STORED,
+  f4 int GENERATED ALWAYS AS (f2 * 2) STORED
+);
+INSERT INTO gtestm VALUES (1, 5, 100);
+MERGE INTO gtestm t USING (VALUES (1, 10), (2, 20)) v(id, f1) ON t.id = v.id
+  WHEN MATCHED THEN UPDATE SET f1 = v.f1
+  WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.f1, 200);
+SELECT * FROM gtestm ORDER BY id;
+ id | f1 | f2  | f3 | f4  
+----+----+-----+----+-----
+  1 | 10 | 100 | 20 | 200
+  2 | 20 | 200 | 40 | 400
+(2 rows)
+
+DROP TABLE gtestm;
 -- views
 CREATE VIEW gtest1v AS SELECT * FROM gtest1;
 SELECT * FROM gtest1v;
diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql
new file mode 100644
index 92d373b..87ec842
--- a/src/test/regress/sql/generated.sql
+++ b/src/test/regress/sql/generated.sql
@@ -81,6 +81,21 @@ SELECT * FROM gtest1 ORDER BY a;
 DELETE FROM gtest1 WHERE b = 2;
 SELECT * FROM gtest1 ORDER BY a;
 
+-- test MERGE
+CREATE TABLE gtestm (
+  id int PRIMARY KEY,
+  f1 int,
+  f2 int,
+  f3 int GENERATED ALWAYS AS (f1 * 2) STORED,
+  f4 int GENERATED ALWAYS AS (f2 * 2) STORED
+);
+INSERT INTO gtestm VALUES (1, 5, 100);
+MERGE INTO gtestm t USING (VALUES (1, 10), (2, 20)) v(id, f1) ON t.id = v.id
+  WHEN MATCHED THEN UPDATE SET f1 = v.f1
+  WHEN NOT MATCHED THEN INSERT VALUES (v.id, v.f1, 200);
+SELECT * FROM gtestm ORDER BY id;
+DROP TABLE gtestm;
+
 -- views
 CREATE VIEW gtest1v AS SELECT * FROM gtest1;
 SELECT * FROM gtest1v;

Bug #17759: GENERATED columns not computed during MERGE

Reply via email to