[HACKERS] MERGE Specification

Simon Riggs Mon, 21 Apr 2008 13:07:43 -0700

The following two files specify the behaviour of the MERGE statement and
how it will work in the world of PostgreSQL. In places, this supercedes
my recent commentary on MERGE, particularly with regard to triggers.


Neither of these files is intended for CVS. 

The HTML file was generated from SGML source, though the latter is not
included here for clarity.

The test file shows how I expect a successful test run to look when a
regression test is executed with a working version of final MERGE patch
applied. It has behavioural comments in it also, to make it slightly
more readable.

If anybody has any questions, ask them now please, before I begin
detailed implementation.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

--
-- MERGE
--
CREATE TABLE target (id integer, balance integer);
CREATE TABLE source (id integer, balance integer);
INSERT INTO target VALUES (1, 10);
INSERT INTO target VALUES (2, 20);
INSERT INTO target VALUES (3, 30);
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
(3 rows)

--
-- initial tests
--
-- empty source means 0 rows touched
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	UPDATE SET balance = t.balance + s.balance
;
-- insert some source rows to work from
INSERT INTO source VALUES (2, 5);
INSERT INTO source VALUES (3, 20);
INSERT INTO source VALUES (4, 40);
SELECT * FROM source;
 id | balance 
----+---------
  2 |       5
  3 |      20
  4 |      40
(3 rows)

-- do a simple equivalent of an UPDATE join
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	UPDATE SET balance = t.balance + s.balance
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      25
  3 |      50
(3 rows)

ROLLBACK;
-- do a simple equivalent of an INSERT SELECT
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
  4 |      40
(4 rows)

ROLLBACK;
-- now the classic UPSERT
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	UPDATE SET balance = t.balance + s.balance
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      25
  3 |      50
  4 |      40
(4 rows)

ROLLBACK;
--
-- Non-standard functionality
-- 
-- do a simple equivalent of a DELETE join
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED THEN
	DELETE
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
(1 row)

ROLLBACK;
-- now the classic UPSERT, with a DELETE
-- the Standard doesn't allow the DELETE clause for some reason,
-- though other implementations do
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 10 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN MATCHED THEN
	DELETE
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  3 |      50
  4 |      40
(3 rows)

ROLLBACK;
-- Prepare the test data to generate multiple matching rows for a single target
INSERT INTO source VALUES (3, 5);
SELECT * FROM source ORDER BY id, balance;
 id | balance 
----+---------
  2 |       5
  3 |       5
  3 |      20
  4 |      40
(4 rows)

-- we now have a duplicate key in source, so when we join to
-- target we will generate 2 matching rows, not one
-- In the following statement row id=3 will be both updated
-- and deleted by this statement and so will cause a run-time error
-- when the second change to that row is detected
-- This next SQL statement
--  fails according to standard
--  fails in PostgreSQL implementation
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 10 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN MATCHED THEN
	DELETE
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
ERROR:  multiple actions on single target row
 
ROLLBACK;

-- This next SQL statement
--  fails according to standard
--  suceeds in PostgreSQL implementation by simply ignoring the second
--  matching row since it activates no WHEN clause
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 10 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
ROLLBACK;
-- Now lets prepare the test data to generate 2 non-matching rows
DELETE FROM source WHERE id = 3 AND balance = 5;
INSERT INTO source VALUES (4, 5);
SELECT * FROM source;
 id | balance 
----+---------
  2 |       5
  3 |      20
  4 |       5
  4 |      40
(4 rows)

-- This next SQL statement
--  suceeds according to standard (yes, it is inconsistent)
--  suceeds in PostgreSQL implementation, though could easily fail if
--  there was an appropriate unique constraint
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
  4 |       5
  4 |      40
(5 rows)

ROLLBACK;
-- This next SQL statement works, but since there is no WHEN clause that
-- applies to non-matching rows, SQL standard requires us to generate
-- rows with DEFAULT VALUES for all columns, which is why we support the
-- syntax DO NOTHING (similar to the way Rules work) in addition
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED AND s.balance > 100 THEN
	INSERT VALUES (s.id, s.balance)
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
    |
    |
(5 rows)

ROLLBACK;
-- This next SQL statement suceeds, but does nothing since there are
-- only non-matching rows that do not activate a WHEN clause, so we
-- provide syntax to just ignore them, rather than allowing data quality
-- problems
BEGIN;
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN NOT MATCHED AND s.balance > 100 THEN
	INSERT VALUES (s.id, s.balance)
WHEN NOT MATCHED
	DO NOTHING
;
SELECT * FROM target;
 id | balance 
----+---------
  1 |      10
  2 |      20
  3 |      30
(3 rows)

ROLLBACK;
--
-- Weirdness
--
-- MERGE statement containing WHEN clauses that are never executable
-- NOT an error under the standard
MERGE into target t
USING (select * from source) AS s
ON t.id = s.id
WHEN MATCHED AND s.balance > 0 THEN
	UPDATE SET balance = t.balance + s.balance
WHEN MATCHED THEN
	DELETE
WHEN NOT MATCHED THEN
	INSERT VALUES (s.id, s.balance)
WHEN NOT MATCHED THEN /* never executed because of order of WHEN clauses */
	INSERT VALUES (s.id, s.balance + 10)
WHEN MATCHED THEN /* never executed because of order of WHEN clauses */
	UPDATE SET balance = t.balance + s.balance
;

Title: MERGE

PostgreSQL 8.4devel Documentation
Prev	Fast Backward		Fast Forward	Next

MERGE

Name

MERGE -- update, insert or delete rows of a table based upon source data

Synopsis

MERGE INTO table [ [ AS ] alias ]
USING source-query
ON join_condition
[when_clause [...]]

where when_clause is

{ WHEN MATCHED [ AND condition ] THEN { merge_update | DELETE }
  WHEN NOT MATCHED [ AND condition ] THEN { merge_insert | DO NOTHING } }

where merge_update is

UPDATE SET { column = { _expression_ | DEFAULT } |
           ( column [, ...] ) = ( { _expression_ | DEFAULT } [, ...] ) } [, ...]

and merge_insert is

INSERT [( column [, ...] )] { VALUES ( { _expression_ | DEFAULT } [, ...] ) | DEFAULT VALUES }

Description

MERGE performs at most one action on rows from the target table, driven by the rows from the source query. This provides a way to specify a single SQL statement that can conditionally UPDATE or INSERT rows, a task that would otherwise require multiple procedural language statements.

First, the MERGE command performs a left outer join from source query to target table, producing zero or more merged rows. For each merged row, WHEN clauses are evaluated in the specified order until one of them is activated. The corresponding action is then applied and processing continues for the next row.

MERGE actions have the same effect as regular UPDATE, INSERT, or DELETE commands of the same names, though the syntax is slightly different.

If no WHEN clause activates then an implicit action of INSERT DEFAULT VALUES is performed for that row. If that implicit action is not desirable an explicit action of DO NOTHING may be specified instead.

MERGE will only affect rows only in the specified table.

There is no RETURNING clause with MERGE.

There is no MERGE privilege. You must have the UPDATE privilege on the table if you specify an update action, the INSERT privilege if you specify an insert action and/or the DELETE privilege if you wish to delete. You will also require the SELECT privilege to any table whose values are read in the expressions or condition.

Parameters

table: The name (optionally schema-qualified) of the table to merge into.
alias: A substitute name for the target table. When an alias is provided, it completely hides the actual name of the table. For example, given MERGE foo AS f, the remainder of the MERGE statement must refer to this table as f not foo.
source-query: A query (SELECT statement or VALUES statement) that supplies the rows to be merged into the target table. Refer to the SELECT statement or VALUES statement for a description of the syntax.
join_condition: join_condition is an _expression_ resulting in a value of type boolean (similar to a WHERE clause) that specifies which rows in the join are considered to match. You should ensure that the join produces at most one output row for each row to be modified. An attempt to modify any row of the target table more than once will result in an error. This behaviour requires the user to take greater care in using MERGE, though is required explicitly by the SQL Standard.
condition: An _expression_ that returns a value of type boolean. If this _expression_ returns true then the WHEN clause will be activated and the corresponding action will occur for that row.
merge_update: The specification of an UPDATE action. Do not include the table name, as you would normally do with an UPDATE command. For example, UPDATE tab SET col = 1 is invalid. Also, do not include a WHERE clause, since only the current can be updated. For example, UPDATE SET col = 1 WHERE key = 57 is invalid.
merge_insert: The specification of an INSERT action. Do not include the table name, as you would normally do with an INSERT command. For example, INSERT INTO tab VALUES (1, 50) is invalid.
column: The name of a column in table. The column name can be qualified with a subfield name or array subscript, if needed. Do not include the table's name in the specification of a target column — for example, UPDATE SET tab.col = 1 is invalid.
_expression_: An _expression_ to assign to the column. The _expression_ can use the old values of this and other columns in the table.
DEFAULT: Set the column to its default value (which will be NULL if no specific default _expression_ has been assigned to it).

Outputs

On successful completion, a MERGE command returns a command tag of the form

MERGE total-count

The total-count is the number of rows changed (either updated, inserted or deleted). If total-count is 0, no rows were changed (this is not considered an error).

The number of rows updated, inserted or deleted is not available as part of the command tag. An optional NOTIFY message can be generated to present this information, if desired.

NOTIFY:  34 rows processed: 11 updated, 5 deleted, 15 inserted, 3 default inserts, 0 no action

Notes

What essentially happens is that the target table is left outer-joined to the tables mentioned in the source-query, and each output row of the join may then activate at most one when-clause. The row will be matched only once per statement, so the status of MATCHED or NOT MATCHED cannot change once testing of WHEN clauses has begun. MERGE will not invoke Rules.

The following steps take place during the execution of MERGE.

Perform any BEFORE STATEMENT triggers for actions specified, whether or not they actually occur.
Perform left outer join from source to target table. Then for each row:
1. Evaluate whether each row is MATCHED or NOT MATCHED.
2. Test each WHEN condition in the order specified until one activates. Identify the action and its event type.
3. Perform any BEFORE ROW triggers that fire for the action's event type.
4. Apply the action specified.
5. Perform any AFTER ROW triggers that fire for the action's event type.
Perform any AFTER STATEMENT triggers for actions specified, whether or not they actually occur.

In summary, statement triggers for an event type (say, INSERT) will be fired whenever we specify an action of that kind. Row-level triggers will fire only for event type activated. So a MERGE might fire statement triggers for both UPDATE and INSERT, even though only UPDATE row triggers were fired.

Examples

Attempt to insert a new stock item along with the quantity of stock. If the item already exists, instead update the stock count of the existing item.

MERGE INTO wines w
USING (VALUES('Chateau Lafite 2003', '24')) v
ON v.column1 = w.winename
WHEN NOT MATCHED 
  INSERT VALUES(v.column1, v.column2)
WHEN MATCHED
  UPDATE SET stock = stock + v.column2;

Perform maintenance on CustomerAccounts based upon new Transactions. The following statement will fail if any accounts have had more than one transaction

MERGE CustomerAccount CA

USING (SELECT CustomerId, TransactionValue, 
       FROM Transactions
       WHERE TransactionId > 35345678) AS T

ON T.CustomerId = CA.CustomerId

WHEN MATCHED 
  UPDATE SET Balance = Balance - TransactionValue

WHEN NOT MATCHED
  INSERT (CustomerId, Balance)
  VALUES (T.CustomerId, T.TransactionValue)
;

so the right way to do this is to pre-aggregate the data

MERGE CustomerAccount CA

USING (SELECT CustomerId, Sum(TransactionValue) As TransactionSum
       FROM Transactions
       WHERE TransactionId > 35345678
       GROUP BY CustomerId) AS T

ON T.CustomerId = CA.CustomerId

WHEN MATCHED 
  UPDATE SET Balance = Balance - TransactionSum

WHEN NOT MATCHED
  INSERT (CustomerId, Balance)
  VALUES (T.CustomerId, T.TransactionSum)
;

Compatibility

This command conforms to the SQL standard, except that the DELETE and DO NOTHING actions are PostgreSQL extensions.

According to the standard, the column-list syntax for an UPDATE action should allow a list of columns to be assigned from a single row-valued _expression_. This is not currently implemented — the source must be a list of independent expressions.

Prev	Home	Next
LOCK	Up	MOVE

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers