Re: [HACKERS] MERGE Specification

Heikki Linnakangas Sun, 15 Aug 2010 00:57:31 -0700

On 15/08/10 10:31, Boxuan Zhai wrote:

On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas<
heikki.linnakan...@enterprisedb.com>  wrote:

Thanks. I went through this, fixing all the spurious end-of-line whitespace
first with "git apply --whitespace=fix", and then manually fixing comment
and whitespace style, typos, and doing some other small comment editing.
Resulting patch attached. It is also available at my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch
'mergestmt'. Please base any further patch versions on this patch, so that
we don't need to redo this cleanup.

Thanks for the cleanup. The codes looks better now. My problem is that I
have done some more modifications after merge_v102. Is there any way to
apply the cleanup patch without erasing my new codes?

Yeah, there's multiple ways. You can create a patch of what you havenow, and run interdiff against your previous patch that I workedagainst. That gives you a diff of changes since the last patch you sentto the list. You can then apply that patch to the codetree from my gitrepository.

Or you can just generate a new patch of what you have as usual, and I'llincorporate the changes to the cleaned-up patch.

I'll continue reviewing this sometime next week, but here's few
miscellaneous issues for starters;

* Explain output of actions needs some work:

  Merge  (cost=246.37..272.96 rows=1770 width=38)
    ACTION: UPDATE WHEN NOT MATCHED
    ACTION: INSERT WHEN NOT MATCHED
    ->   Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)


Should print out more details of the action, like for normal
updates/inserts/deletes.



Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more
details than what we have here except the costs, rows and width, which is
print out at the MERGE command line.

For example:

Explain
update buy set volume = volume + 1;
                           QUERY PLAN
--------------------------------------------------------------
  Update  (cost=0.00..36.75 rows=2140 width=14)
    ->   Seq Scan on buy  (cost=0.00..36.75 rows=2140 width=14)
(2 rows)

For the explanation of an UPDATE command, we only have a Update title
followed by the costs, then, after the arrow ->  there comes the plan tree.
In a MERGE command, no action has its real private plan. They all share the
main plan.  Thus, the costs for a merge command is that of the main plan.
What is useful for a merge action is only the target list and quals. So my
design is to show the merge command costs in first line. Then print out the
actions and their qualifications in a list, followed by the main plan tree.


It's more more interesting with more complex statement:

postgres=# explain UPDATE target SET id = (SELECT COUNT(*) FROMgenerate_series(1,10));QUERY PLAN

--------------------------------------------------------------------------------------
 Update  (cost=12.51..52.52 rows=2400 width=6)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=12.50..12.51 rows=1 width=0)

-> Function Scan on generate_series (cost=0.00..10.00rows=1000 width=0)

   ->  Seq Scan on target  (cost=0.00..40.00 rows=2400 width=6)
(5 rows)

Is there any other thing you suggest to print out for each action?


It should match the output of a normal Update/Insert as closely as possible.

* Is the behavior now SQL-compliant? We had long discussions about the
default action being insert or do nothing or raise error, but I never got a
clear picture of what the SQL spec says and whether we're compliant.

* What does the "one tuple is error" notice mean? We'll have to do
something about that.. I don't think we've properly thought through the
meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
only a few dozen lines to put back when we know what to do about it.

I find that we have not reached an agreement on MERGE's syntax yet.
Personally, I support Simon's idea, that is the default action should be
RAISE ERROR. However, I am no sure what RAISE ERROR should do when we
get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is
error"); for this situation.

I leave this for further extension when a more specific design for RAISE
ERROR is available.

Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do
you want me to chage the default action back to DO NOTHING? Or any other
suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE"
clause as the end of the action list which forces the user to specify the
default action for merge).


Whatever the spec says is what we should do.

* Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it
be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?


I need one flag in these statement to differentiate them from normal
InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the
process of these two kinds of statements in the transfromStmt() function. I
define a set of alias nodes which have different nodetags. This can make
the code simpler.

Ok. You might also consider just adding a "isMergeAction" boolean toInsertStmt/UpdateStmt/DeleteStmt instead, or if the difference betweenthe regular statements and merge actions grow big, create a completelyseparate node struct for them.

* Do you need the out-function for DeleteStmt? Why not for UpdateStmt and
InsertStmt?


Ah, I add this function because I want to have a look of the content of
DeleteStmt. I did this long ago, just after I started the project. If you
don't want this function, I will remove it.

Ok.

* Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers
every time, should set flags at plan time to mark what kind of actions there
is in the statement.


First of all, I need to say that the functions of ExecBSMergeTriggers() and
ExecASMergeTriggers() are for per-statement triggers, and are invoked only
once in the whole process of a MERGE command. Setting flags at plan time can
save the scanning. I may add it to next patch.  But don't expect it save
much execution time.

Yeah, I'm not so much concerned about performance, it just seems like itmight make the code slightly simpler.

Or should we defer firing the statement triggers until we hit the first
matching row and execute the action for the first time?


No, we cannot do this. These are Per-statement triggers. They should be
fired before/after the main plan is executed. The statement triggers are
fired even no tuple is really matched with the actions.

Ok.

* This crashes:

postgres=# CREATE TABLE target AS (SELECT 1 AS id);
SELECT 1
postgres=# MERGE into target t
USING (select 1 AS id) AS s

ON t.id = s.id
WHEN MATCHED THEN
        UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


Oh, a really serious bug. I have not considered this case (user series
generator in actions) before. I will see what I can do for it.

It's not specific to generate_series() but subqueries in general thatare the problem.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MERGE Specification

Reply via email to