[HACKERS] new patch of MERGE (merge_204) a question about duplicated ctid
Dear Greg, I have updated the MERGE patch for two main problems. 1. Support of system attributes in MERGE action expressions. In old version of MERGE, I use the top join as the only RTE in var name space, during the transformation process in parser. It contains all the common attributes of source and target table but not the system attributes, such as oid. Thus, if user specified system attributes in MERGE actions, error pops. In current version, the var name space is forced to be the target table RTE and source table RTE. So system attributes will be correctly transformed. 2. fix the problem of dropped column. In planner, we need to map the Var nodes in MERGE actions to its corresponding TE in the main plan's TL. I used to do this under the assumption that all the TE are ordered by their attribute no (the value of filed Var.varattno). However, this is not the case when the table contain dropped attributes. The dropped attribute will take one attribute number but not appear in TL. Now a new mapping algorithm is forged, which can avoid this bug. Please help me to review this new patch. Thank you! PS: In the new pgsql 9.1, I find that the junk attribute ctid is added twice, for UPDATE and DELETE commands. In planner, the function preprocess_targetlist() will invoke a sub function expand_targetlist() which will add missed TE for INSERT and UPDATE commands. And for UPDATE and DELETE, it will also create a TE of ctid (a junk attr) which is used in executor. This is the process in old pgsql versions. However, in pgsql 9.1, while the above is kept. A new function in rewriter, that is rewriteTargetListUD() does the same thing. So for a plain UPDATE/DELTE command, it will have two ctid junk attr in its final TL. Is there any particular reason for this duplicated ctid?? This will not cause much problem, normally. However, in MERGE command, the TL of MERGE actions should contain no junk attributes. So, I add blocking codes in these two parts, to avoid the change of TL of MERGE actions. I don't know whether this will cause any problem. Regards, Yours Boxan merge204.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] a new problem in MERGE
Dear Greg, How are you? As we talked last week, there is a bug about the origslot field assertion. TRAP: FailedAssertion(!(epqstate-origslot != ((void *)0)), File: execMain.c, Line: 1732) I checked my codes and found the reason of this problem. In the original process of ExecModifyTable() function, there is a line of code EvalPlanQualSetSlot(node-mt_epqstate, planSlot); It is specially used to init the epqstate-origslot field with the tuple slot returned by the underlying plan. However, in our implementation of MERGE, this part is missed in ExecMerge() function. At the beginning, I just tried to form a new slot in ExecMerge() and use it as the epqstate-origslot for the merge actions' execution functions. This part is simple. But, during this process, I found a more serious problem: our MERGE currently does not support the use of system attributes in the action conditions and target lists. For example, if we raise queries like the following, there will pops some ERROR message. (table target has oid attribute) MERGE INTO target t USING source s ON t.id = s.id WHEN MATCHED AND* t.oid 10* THEN update SET val = t.val + s.add; or MERGE INTO target t USING source s ON t.id = s.id WHEN MATCHED AND* t.xmin = 1000* THEN update SET val = t.val + s.add; The reason for the above problem is the missing of system attribute in the result slot returned by the main plan of MERGE. In our implementation, the Main Pan of MERGE just generate a tuple slot contain all the direct vars from source table and target table. The expressions in MERGE action conditions and target lists are then calculated based on the result slot of main plan. Currently, all the system attributes (except the ctid of target table) are not included in main plan. So, the evaluation of action condition and target entries will fail if they involve some system attributes. If we want to solve this problem, we need to extend the target list of main plan before execution (in analyzer or planner), make sure it contains all the attributes appeared in any of the MERGE actions. This may lead to adjustments to many part of the process of MERGE. For example, within the planner, I created a function specially for mapping the Var nodes in actions to attributes in the result slot. This function should be rewritten for this new case. I have plan to fix the above two bugs together. (in fact, I have already started coding in merge_v202 edition). My question is how should I make my update be consistent with yours. Is it possible for you to give me an edition that I can work on? Thanks! Yours Boxuan
Re: [HACKERS] ask for review of MERGE
This did seem to find a bug in the implementation after running for a while: TRAP: FailedAssertion(!(epqstate-origslot != ((void *)0)), File: execMain.c, Line: 1732) Line number there is relative to what you can see at http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c and that's a null pointer check at the beginning of EvalPlanQualFetchRowMarks. Haven't explored why this happened or how repeatable this is, but since Boxuan was looking for some bugs to chase I wanted to deliver him one to chew on. I am not sure how this bug comes. I will try to find out the reason for it.
Re: [HACKERS] ask for review of MERGE
Hi, I considered the empty source table situation again. I think it is correct to upsert nothing in this case. Back to the original logic of MERGE command, it is main purpose is to add the supplementary data from the source table into the target table. Thus, an empty source table means no input data is available, so no upsert is needed in target table. Regards, Boxuan
Re: [HACKERS] ask for review of MERGE
On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas robertmh...@gmail.com wrote: I think that MERGE is supposed to trigger one rule for each row in the source data. So: On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith g...@2ndquadrant.com wrote: MERGE INTO Stock t USING (SELECT * FROM Stock WHERE item_id=10) AS s ON s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (10,1) ; This works fine, and updates the matching row: item_id | balance -+- 20 |1900 10 |2201 Here you have one row of source data, and you got one action (the WHEN MATCHED case). But if I give it a key that doesn't exist instead: MERGE INTO Stock t USING (SELECT * FROM Stock WHERE item_id=30) AS s ON s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (30,1) ; This doesn't execute the NOT MATCHED case and INSERT the way I expected it to. It just gives back MERGE 0. Here you have no rows of source data (the USING (SELECT ...) doesn't return anything, since no rows exist) so nothing happens. Yes. The MERGE process is based on a left join between the source table and target table. Since here the source table is empty, no join is carried, and thus no MERGE action is taken. But, is it correct logically? I mean, should we insert some rows in the above example rather than do nothing? Since I wasn't sure if the whole subquery in the USING clause case was really implemented fully, I then tried to do this with something more like the working regression test examples. I expected this to do the same thing as the first example: MERGE INTO Stock t USING Stock s ON s.item_id=10 AND s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (10,1) ; But it gives back this: ERROR: duplicate key value violates unique constraint stock_item_id_key DETAIL: Key (item_id)=(10) already exists. Here you have two rows of source data. The ON clause represents the join condition. The item_id=10 row matches - so you get an update, presumably, though we can't see that as things turn out - and the item_id=20 row doesn't match - so you try to insert (10, 1), which is a duplicate key, thus the error. Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of things to generate that. But it doesn't work any better if you give it an example that doesn't exist: MERGE INTO Stock t USING Stock s ON s.item_id=30 AND s.item_id=t.item_id WHEN MATCHED THEN UPDATE SET balance=s.balance + 1 WHEN NOT MATCHED THEN INSERT VALUES (30,1) ; ERROR: duplicate key value violates unique constraint stock_item_id_key DETAIL: Key (item_id)=(30) already exists. In this case neither row of the source data matches the join condition (s.item_id=30 might as well be constant FALSE as far as the test data is concerned) so you attempt to execute the NOT MATCHED side twice. So this one also looks correct to me. Yes, that is what happened in the above two examples. The other thing I noticed that may take some work to sort out is that I haven't had any luck getting MERGE to execute from within a plpgsql function. I was hoping I could use this to update the pgbench tables: Good catch. Considering the size of this patch, I have no problem leaving this to the eventual committer to fix, or to a subsequent commit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] ask for review of MERGE
On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith g...@2ndquadrant.com wrote: One compiler warning I noticed that needs to get resolved: src/backend/commands/explain.c: explain.c: In function ‘ExplainMergeActions’: explain.c:1528: warning: comparison of distinct pointer types lacks a cast That is complaining about this section: if (mt_planstate-operation != CMD_MERGE || mt_planstate-mt_mergeActPstates == NIL) return; So presumably that comparison with NIL needs a cast. Once I get more familiar with the code I'll fix that myself if Boxuan doesn't offer a suggestion first. Possibly NULL was meant instead of NIL. NIL is specifically for a List. Yes, it should be NULL instead of NIL. At first, I designed this filed as a List. But I changed it into an array in the last editions. This is why I have an unmatched assignment here. Sorry for that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] gSoC add MERGE command new patch -- merge_v104
On Wed, Aug 25, 2010 at 4:56 AM, Andres Freund and...@anarazel.de wrote: On Tue, Aug 24, 2010 at 11:02:41PM +0300, Heikki Linnakangas wrote: On 24/08/10 16:35, Boxuan Zhai wrote: Hi, I finished the MERGE on inheritance tables. Now comes the merge_v201 Oh, great! That means that all the known issues are fixed now, and all that's left is fixing any issues raised in review. I've added this to the September commitfest, but I hope I'll find some time to look at this before that. I welcome anyone else to review this too! I have to ask one question: On a short review of the discussion and the patch I didn't find anything about the concurrency issues involved (at least nodeModifyTable.c didnt show any). Whats the plan to go forward at that subject? I think the patch needs to lock tables exclusively (the pg level, not access exclusive) as long as there is no additional handling... Thanks for the work Boxuan! The concurrency issues are not involved. I don't know much about this part. I think we need more discussion on it. Andres PS: The patch reintroduces some whitespace damage...
Re: [HACKERS] MERGE Specification
On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11/08/10 11:11, Boxuan Zhai wrote: The new patch is done. I named it as merge_v102. (1 means it is the non-inheritance merge command, 02 means this is the second time of fixing reported bugs) Thanks. I went through this, fixing all the spurious end-of-line whitespace first with git apply --whitespace=fix, and then manually fixing comment and whitespace style, typos, and doing some other small comment editing. Resulting patch attached. It is also available at my git repository at git://git.postgresql.org/git/users/heikki/postgres.git, branch 'mergestmt'. Please base any further patch versions on this patch, so that we don't need to redo this cleanup. Thanks for the cleanup. The codes looks better now. My problem is that I have done some more modifications after merge_v102. Is there any way to apply the cleanup patch without erasing my new codes? I'll continue reviewing this sometime next week, but here's few miscellaneous issues for starters; * Explain output of actions needs some work: Merge (cost=246.37..272.96 rows=1770 width=38) ACTION: UPDATE WHEN NOT MATCHED ACTION: INSERT WHEN NOT MATCHED - Merge Left Join (cost=246.37..272.96 rows=1770 width=38) Should print out more details of the action, like for normal updates/inserts/deletes. Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more details than what we have here except the costs, rows and width, which is print out at the MERGE command line. For example: Explain update buy set volume = volume + 1; QUERY PLAN -- Update (cost=0.00..36.75 rows=2140 width=14) - Seq Scan on buy (cost=0.00..36.75 rows=2140 width=14) (2 rows) For the explanation of an UPDATE command, we only have a Update title followed by the costs, then, after the arrow - there comes the plan tree. In a MERGE command, no action has its real private plan. They all share the main plan. Thus, the costs for a merge command is that of the main plan. What is useful for a merge action is only the target list and quals. So my design is to show the merge command costs in first line. Then print out the actions and their qualifications in a list, followed by the main plan tree. Is there any other thing you suggest to print out for each action? And all uppercase doesn't fit the surrounding style. This will be fixed. * Is the behavior now SQL-compliant? We had long discussions about the default action being insert or do nothing or raise error, but I never got a clear picture of what the SQL spec says and whether we're compliant. * What does the one tuple is error notice mean? We'll have to do something about that.. I don't think we've properly thought through the meaning of RAISE ERROR. Let's cut it out from the patch until then. It's only a few dozen lines to put back when we know what to do about it. I find that we have not reached an agreement on MERGE's syntax yet. Personally, I support Simon's idea, that is the default action should be RAISE ERROR. However, I am no sure what RAISE ERROR should do when we get a missing tuple. Here I just use a simple elog(NOTICE, a tuple is error); for this situation. I leave this for further extension when a more specific design for RAISE ERROR is available. Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do you want me to chage the default action back to DO NOTHING? Or any other suggetions? (In fact, my personal thinking is to add a non-omissible ELSE clause as the end of the action list which forces the user to specify the default action for merge). * Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly? I need one flag in these statement to differentiate them from normal InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the process of these two kinds of statements in the transfromStmt() function. I define a set of alias nodes which have different nodetags. This can make the code simpler. * Do you need the out-function for DeleteStmt? Why not for UpdateStmt and InsertStmt? Ah, I add this function because I want to have a look of the content of DeleteStmt. I did this long ago, just after I started the project. If you don't want this function, I will remove it. * I wonder if it would be simpler to check the fact that you can only have UPDATE/DELETE as a WHEN MATCHED action and only INSERT as a WHEN NOT MATCHED action directly in the grammar? We can do this, but, I think it is better to keep it as it is now. * Regarding this speculation in the MergeStmt grammar rule: /*although we have only one USING table, we still make it a list, maybe in future we will allow multiple USING tables.*/ I wonder what
Re: [HACKERS] MERGE command for inheritance
On Fri, Aug 13, 2010 at 2:33 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 13/08/10 09:27, Boxuan Zhai wrote: I have renewed the merge.sql and merge.out in regress. Please have a look. Thanks. Did you change the way DO INSTEAD rules are handled already? http://archives.postgresql.org/pgsql-hackers/2010-08/msg00151.php Yes. This mistake has been corrected a few editions ago. Now, the actions replaced by INSTEAD rules will still catch tuples but do nothing for them. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] MERGE command for inheritance
On Thu, Aug 12, 2010 at 12:49 AM, Greg Smith g...@2ndquadrant.com wrote: Tom Lane wrote: Do we really think this is anywhere near committable now? There's a relatively objective standard for the first thing needed for commit--does it work?--in the form of the regression tests Simon put together before development. I just tried the latest merge_v102.patch (regression diff attached) to see how that's going. There are still a couple of errors in there. It looks to me like the error handling and related DO NOTHING support are the next pair of things that patch needs work on. I'd rather see that sorted out than to march onward to inheritance without the fundamentals even nailed down yet. Sorry for the mismatch problem of regress. In fact, I am still unable to make the regression test run on my machine. When I try the command pg_regreess in /src/test/regress/, it always gives a FATAL error: FATAL: parameter port cannot be changed without restarting the server psql: FATAL: parameter port cannot be changed without restarting the server command failed: C:/msys/1.0/local/pgsql/bin//psql -X -c DROP DATABASE IF EXISTS \regression\ postgres However, I can run this command directly in the MinGW command line interface. I guess this is because the psql_command() function has some check before accept commands. And the MinGW environment cannot pass these checks. All the SQL commands in the .sql file have been tested by hand. And they are all correct. However, the .out file is not automatic generated by pgsql. I may need to find a linux system to try to generate the correct .out file some other time. Or, would someone help me to generate an .out file through pg_regress? -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us http://www.2ndquadrant.us/ *** /home/postgres/pgwork/repo/git/postgresql/src/test/regress/expected/merge.out 2010-08-11 12:23:50.0 -0400 --- /home/postgres/pgwork/repo/git/postgresql/src/test/regress/results/merge.out 2010-08-11 12:33:27.0 -0400 *** *** 44,57 WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance ; ! SELECT * FROM target; ! id | balance ! +- ! 1 | 10 ! 2 | 25 ! 3 | 50 ! (3 rows) ! ROLLBACK; -- do a simple equivalent of an INSERT SELECT BEGIN; --- 44,50 WHEN MATCHED THEN UPDATE SET balance = t.balance + s.balance ; ! NOTICE: one tuple is ERROR ROLLBACK; -- do a simple equivalent of an INSERT SELECT BEGIN; *** *** 61,66 --- 54,61 WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; + NOTICE: one tuple is ERROR + NOTICE: one tuple is ERROR SELECT * FROM target; id | balance +- *** *** 102,107 --- 97,103 WHEN MATCHED THEN DELETE ; + NOTICE: one tuple is ERROR SELECT * FROM target; id | balance +- *** *** 165,176 ERROR: multiple actions on single target row ROLLBACK; ! -- This next SQL statement -- fails according to standard -- suceeds in PostgreSQL implementation by simply ignoring the second -- matching row since it activates no WHEN clause BEGIN; MERGE into target t USING (select * from source) AS s ON t.id = s.id --- 161,175 ERROR: multiple actions on single target row ROLLBACK; ! ERROR: syntax error at or near ERROR ! LINE 1: ERROR: multiple actions on single target row ! ^ -- This next SQL statement -- fails according to standard -- suceeds in PostgreSQL implementation by simply ignoring the second -- matching row since it activates no WHEN clause BEGIN; + ERROR: current transaction is aborted, commands ignored until end of transaction block MERGE into target t USING (select * from source) AS s ON t.id = s.id *** *** 179,184 --- 178,184 WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; + ERROR: current transaction is aborted, commands ignored until end of transaction block ROLLBACK; -- Now lets prepare the test data to generate 2 non-matching rows DELETE FROM source WHERE id = 3 AND balance = 5; *** *** 188,195 +- 2 | 5 3 | 20 - 4 | 5 4 | 40 (4 rows) -- This next SQL statement --- 188,195 +- 2 | 5 3 | 20 4 | 40 + 4 | 5 (4 rows) -- This next SQL statement *** *** 203,216 WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; SELECT * FROM target; id | balance +- 1 | 10 2 | 20 3 | 30 - 4 | 5 4 | 40 (5 rows) ROLLBACK; --- 203,218 WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.balance) ; + NOTICE: one tuple is ERROR + NOTICE:
Re: [HACKERS] MERGE Specification
On Wed, Aug 11, 2010 at 12:18 PM, Boxuan Zhai bxzhai2...@gmail.com wrote: On Wed, Aug 11, 2010 at 12:14 PM, Greg Smith g...@2ndquadrant.comwrote: Boxuan Zhai wrote: I just found that no Assert() works in my codes. I think it is because the assertion is no enabled. How to enable assertion. To define USE_ASSERT_CHECKING somewhere? When you run configure before make, use --enable-cassert. The normal trio for working on the PostgreSQL code is: ./configure --enable-depend --enable-cassert --enable-debug Generally the only reason to build as a developer without asserts on is to do performance testing. They will slow some portions of the code down significantly. Thanks. I will test MERGE under this new configuration. A new patch will be submitted once I fix all the asserting bugs. The new patch is done. I named it as merge_v102. (1 means it is the non-inheritance merge command, 02 means this is the second time of fixing reported bugs) -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us http://www.2ndquadrant.us/ merge_v102.tar.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] MERGE command for inheritance
On Wed, Aug 11, 2010 at 4:27 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 10/08/10 12:38, Boxuan Zhai wrote: These days I am considering what else can be done for MERGE, And, I find inheritance tables in postgres is not supported by our MERGE command yet. I played with your latest patch version a bit, and actually, it seems to me that inherited tables work just fine. I ran into the assertion failures earlier while trying that, but that has now been fixed. Can you give an example of the kind of query that's not working yet? Well, in the patch I submitted, the target relation is forced not to scan any inheritance tables. That is, the command always acts like MERGE into *ONLY* foo USING bar So, the inheritance in current MERGE should not work, I think. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] MERGE command for inheritance
On Wed, Aug 11, 2010 at 4:45 PM, Simon Riggs si...@2ndquadrant.com wrote: On Tue, 2010-08-10 at 17:15 +0300, Heikki Linnakangas wrote: On 10/08/10 12:38, Boxuan Zhai wrote: The difficult way is to generate the plans for children table in planner, as the other commands like UPDATE and DELETE. However, because the structure of MERGE plan is much more complex than the ordinary ModifyTable plans, this job may not as simple as we expected. We need to adjust both the main plan and the merge actions to fit the children tables, which is not straight forward. This the approach you'll have to take. But actually, I'm surprised it doesn't happen to just work already. It should be opaque to the merge facility that the reference to the parent target table has inherited child tables - expanding the inherited table to scans of all the children should already be handled by the planner. The support for UPDATE and SELECT of partitioned cases is very different in the planner and was handled as separate implementation projects. If we want a working MERGE in the next release, I suggest that we break down this project in the same way and look at partitioned target tables as a separate project. One reason for suggesting this is that all MERGE statements have a source table, whereas UPDATE and DELETEs did not always. The plan for a simple UPDATE and DELETE against a partitioned table is simple, but the plan (and performance) of a joined UPDATE or DELETE is not good: postgres=# explain update p set col2 = x.col2 from x where x.col1 = p.col1; QUERY PLAN --- Update (cost=299.56..1961.18 rows=68694 width=20) - Merge Join (cost=299.56..653.73 rows=22898 width=20) Merge Cond: (public.p.col1 = x.col1) - Sort (cost=149.78..155.13 rows=2140 width=10) Sort Key: public.p.col1 - Seq Scan on p (cost=0.00..31.40 rows=2140 width=10) - Sort (cost=149.78..155.13 rows=2140 width=14) Sort Key: x.col1 - Seq Scan on x (cost=0.00..31.40 rows=2140 width=14) - Merge Join (cost=299.56..653.73 rows=22898 width=20) Merge Cond: (public.p.col1 = x.col1) - Sort (cost=149.78..155.13 rows=2140 width=10) Sort Key: public.p.col1 - Seq Scan on p1 p (cost=0.00..31.40 rows=2140 width=10) - Sort (cost=149.78..155.13 rows=2140 width=14) Sort Key: x.col1 - Seq Scan on x (cost=0.00..31.40 rows=2140 width=14) - Merge Join (cost=299.56..653.73 rows=22898 width=20) Merge Cond: (public.p.col1 = x.col1) - Sort (cost=149.78..155.13 rows=2140 width=10) Sort Key: public.p.col1 - Seq Scan on p2 p (cost=0.00..31.40 rows=2140 width=10) - Sort (cost=149.78..155.13 rows=2140 width=14) Sort Key: x.col1 - Seq Scan on x (cost=0.00..31.40 rows=2140 width=14) Those plans could use some love and attention before forcing Boxuan to implement that. It seems that we have not decided whether to put the inheritance for MERGE off for a latter implementation. But, I think we can discuss how to do it now. First of all, the inheritance of MERGE should not be implemented in the rule-like way. I agree that the easy way I proposed is not consistent with the general inheritance process in postgres. The normal way of doing this is to handle it in planner, to be more specific, we need to extend the function inheritance_planner() for processing MERGE queries. For UPDATE and DELETE commands (INSERT is not an inheritable command), if inheritance_planner finds that the target table has children tables, it will generate a list of queries. These queries are almost the same as the original query input by user, except for the different target relations. Each child table has it corresponding query in this list. This list of queries will then be processed by grouping_planner() and transformed into a list of plans. One most important work finished in this function is to extend the target list of target relations to make sure that all attributes of a target relation appears in the final result tuple of its plan. As for MERGE command, we need to do the same thing. But, since the main query body is a LEFT JOIN query between source table and target table, the top-level target list is a combination of all the attributes from source table and target table. Thus, when we extend the target list, we should only extent the part of target relations, and keep the source table part untouched. Once a main query in this style has been transformed to plan, we need to prepare the merge actions for it too. That is, extend the target list of all UPDATE and INSERT actions for the corresponding target relation. In this way, each target relation will have its own
[HACKERS] MERGE command for inheritance
Hi, These days I am considering what else can be done for MERGE, And, I find inheritance tables in postgres is not supported by our MERGE command yet. Currently, MERGE command is only able to handle the target table itself, and its children tables are not involved in the process. I am not sure if inheritance of MERGE is needed by postgres. If we need, I may propose two methods for implementing it. An easy way to do it is use a rule-like strategy. We can generate new MERGE query statements with the children table as their target tables. Then the new query statement will be planned and executed in the normal way. This process can be put in the rewriter, before the queries are planned. This method is quite easy but seems not follow the tradition of inheritance in Postgres. The difficult way is to generate the plans for children table in planner, as the other commands like UPDATE and DELETE. However, because the structure of MERGE plan is much more complex than the ordinary ModifyTable plans, this job may not as simple as we expected. We need to adjust both the main plan and the merge actions to fit the children tables, which is not straight forward. I would like to know your opinions on this problem. PS: for my investigation on the inheritance actions, I find that although the children tables are modified by the UPDATE or DELETE commands on their ancestor tables, the rules defined on them are not activated during the query. Is this the case (I hope I am not asking a stupid question)? And, if so, I may ask why we want it to act like this. Boxuan
Re: [HACKERS] MERGE Specification
On Tue, Aug 10, 2010 at 10:29 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 10/08/10 12:08, Boxuan Zhai wrote: Thanks for your feedback. I fixed all the above waring bugs. Find the new patch in attachement. Thanks. I'm getting an assertion failure with this statement: CREATE TABLE foo (id int4); MERGE into foo t USING (select id FROM generate_series(1,5) id) AS s ON t.id = s.id WHEN NOT MATCHED THEN INSERT (id) VALUES (s.id); The query works on my machine. TRAP: FailedAssertion(!(ActiveSnapshotSet()), File: postgres.c, Line: 749) That's easily fixed - you need to add case T_MergeStmt to the list of optimizable command types in analyze_requires_snapshot() function. Unfortunately that doesn't get you far, the query then trips another assertion: TRAP: FailedAssertion(!(list_length(resultRelations) == list_length(subplans)), File: createplan.c, Line: 3929) I just found that no Assert() works in my codes. I think it is because the assertion is no enabled. How to enable assertion. To define USE_ASSERT_CHECKING somewhere? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] MERGE Specification
On Wed, Aug 11, 2010 at 12:14 PM, Greg Smith g...@2ndquadrant.com wrote: Boxuan Zhai wrote: I just found that no Assert() works in my codes. I think it is because the assertion is no enabled. How to enable assertion. To define USE_ASSERT_CHECKING somewhere? When you run configure before make, use --enable-cassert. The normal trio for working on the PostgreSQL code is: ./configure --enable-depend --enable-cassert --enable-debug Generally the only reason to build as a developer without asserts on is to do performance testing. They will slow some portions of the code down significantly. Thanks. I will test MERGE under this new configuration. A new patch will be submitted once I fix all the asserting bugs. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us http://www.2ndquadrant.us/
Re: [HACKERS] MERGE Specification
On Fri, Aug 6, 2010 at 3:41 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote: SQL:2011 makes no mention of how MERGE should react to statement level triggers. MERGE is not a trigger action even. Given considerable confusion in this area, IMHO we should just say the MERGE does not call statement triggers at all, of any kind. IMO the UPDATE/DELETE/INSERT actions should fire the respective statement level triggers, but the MERGE itself should not. When, and how? If an UPDATE is mentioned 5 times, do we call the trigger 5 times? My current process for BEFOR / AFTER STATEMENT trigger on MERGE is to fire the triggers for all action types that appears in the command, unless it is replaced by a INSTEAD rule. But the triggers for one action type will be fired only once. That means you will get both UPDATE and INSERT triggers be activated for only once if you are executing a MERGE command with 5 UPDATEs and 10 INSERTs. What happens if none of the UPDATEs are ever executed? The triggers (I mean triggers for statement) will be fired anyway even the UPDATE action matches no tuple. This is not for MERGE only. If you update a table with the command UPDATE foo SET ... WHERE false; It will also fire the STATEMENT triggers of UPDATE type on foo (I think so). And, even not been asked, I want to say that, in current implementation of MERGE, the row level triggers are fired by the actions that take the tuples. If one tuple is caught by an UPDATE action, then the UPDATE row trigger will be fired on this tuple. If it is handled by INSERT action, then the INSRET row triggers are on. Hope you agree with my designs. Best explain exactly what you mean. -- Simon Riggs www.2ndQuadrant.com http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] MERGE Specification
On Fri, Aug 6, 2010 at 3:53 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote: On 06/08/10 10:12, Simon Riggs wrote: So DO NOTHING is the default and implies silently ignoring rows. RAISE ERROR is the opposite. Coding for those seems very easy, its just a question of should we do it?. DB2 has it; SQL:2008 does not. But then SQL:2008 followed the DB2 introduction of AND clauses, and SQL:2011 has so far followed the DB2 introduction of DELETE action also. I see neither DO NOTHING or RAISE ERROR in the documentation of DB2, Oracle, or MSSQL server. Agreed, Oracle and MSSQL server does not have these. However, DB2 very clearly does have these features * SIGNAL which raises an error and can be used in place of any action, at any point in sequence of WHEN clauses. DB2 already supports SIGNAL as part of SQL/PSM, which we do not, so RAISE ERROR was the nearest equivalent command for PostgreSQL. * ELSE IGNORE which does same thing as DO NOTHING, except it must always be last statement in a sequence of WHEN clauses. DO NOTHING is already a phrase with exactly this meaning in PostgreSQL, so I suggest that. So, we need to add both DO NOTHING and RAISE ERROR actions in the MERGE command now !? What will RAISE ERROR do? To stop the whole MERGE command OR, just throw an error notice for the row and move on. -- Simon Riggs www.2ndQuadrant.com http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] MERGE Specification
On Thu, Aug 5, 2010 at 7:25 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-08-05 at 12:29 +0300, Heikki Linnakangas wrote: On 05/08/10 10:46, Simon Riggs wrote: On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote: The following two files specify the behaviour of the MERGE statement and how it will work in the world of PostgreSQL. The HTML file was generated from SGML source, though the latter is not included here for clarity. Enclose merge.sgml docs for forthcoming MERGE command, as originally written. Oh, cool, I wasn't aware you had written that already. Boxuan, please include this in your patch, after reviewing and removing/editing anything that doesn't apply to your patch. Thanks a lot for the instruction file of MERGE command. I have read through it carefully. It is really a great work. I have to admit that I am not familiar with the sgml language, and I cannot write the instruction by myself. All features of MERGE demonstrated in this file are consistent with my implementation, EXCEPT the DO NOTHING option. In current edition, we don't have the DO NOTHING action type. That is, during the execution of MERGE commands, if one tuple is not caught by any of the merge actions, it will be ignored. In another word, DO NOTING (although cannot be specified explicitly by user) is the DEFAULT action for tuples. In the contrary, Simon's instruction says that the DEFAULT action for the tuple caught by no actions is WHEN NOT MATCHED THEN INSERT DEFAULT VALUES From the user's point of view, these two kinds of MERGE command may have not much differences. But, as the coder, I prefer current setting, because we can save the implementation for a new type of MERGE actions (DO NOTHING is a special merge action type). And, thus, no checks and special process for it. (For example, we need to make sure that DO NOTHING is the last WHEN clause, and it has no additional qual. And we have to generate a INSERT DEFAULT VALUES action for the MERGE command if we don't find the DO NOTHING action) Well, if people want the DO NOTHING action, I will add it in the system. Now, I have changed the RULE strategy of MERGE to the better logic. And I am working on triggers for MERGE, which is also mentioned in the instruction file. I will build a new patch with no long comment and blank line around functions, and possibly contain the regress test file and this sgml instructions in it. I wish we can reach a agreement on the DO NOTHING thing before my next submission, so I can make necessary modification on my code for it. (the new patch may be finished in one or two days, I think) Thanks! PS: I have an embarrassing question: how to view the sgml instructions of postgres in web page form, rather than read the source code of them? Also had these fragments as well, if they're still useful. Probably just useful as pointers as to what else to change to include the docs. The tests and docs were written from SQL standard, so any deviations would need to be flagged. The idea of writing the tests first was that they provide an objective test of whether the implementation works according to spec. I'd quite like a commentary on anything that needs changing. Not saying I will necessarily object to differences, but knowing the differences sounds important for us. -- Simon Riggs www.2ndQuadrant.com http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] MERGE Specification
Dear All, I have seen a lively discussion about the DO NOTING action in MERGE command. And, I think most people want it. So it will be added to my next patch. Before the implementation, I still have some questions to confirm: 1. If we have a DO NOTHING action specified, it should be the last WHEN clause. It must be of the NOT MATCHED cases, and it CANNOT have any additional action qualifications. Am I correct? 2. If no DO NOTHING specified, we will imply a INSERT DEFAULT VALUES action as the end of MERGE. My question is, is this action taken only for the NOT MATCHED tuples? If this is the case, then what about the MATCHED tuples that match not previous actions? Ignore them? That means we are in fact going to add two implicit WHEN clause: a) WHEN NOT MATCHED INSERT default values; b) WHEN MATCHED THEN DO NOTHING. OR, is the INSERT DEFAULT VALUES applied to ALL tuples not matter they are MATCHED or not? Besides, (I mean no offense, but) can this method really avoid losing row? So far as I know, the DEFAULT values for table attributes are defined when the table is created and no variables are allowed in the default value expressions. That means, they are usually constants or simple serial numbers. Image that we have a MERGE command that has thousands of NOT MATCHED tuples going to the implicit action. Then, the target table will inserted with thousands of rows with DEAULT VALUES. These row will have similar (if not exactly the same) simple content, which contains NO information from the source table of MERGE. Is this really what we want? If it is not, then what is the use of the INSERT DEFAULT VALUES action? Regards
Re: [HACKERS] merge command - GSoC progress
Dear Robert, I am just considering that there may be some logical mistakes for my rule rewriting strategy of MERGE actions. In my current design, if we find that an action type, say UPDATE, is replaced by INSTEAD rules, we will remove all the actions of this type from the MERGE command, as if they are not be specified by user from the beginning. See the test example in my pages for this situation. https://wiki.postgresql.org/wiki/MergeTestExamples#With_INSTEAD_rules Now,I am thinking that maybe we should keep the replaced actions in action list, and just mark them to be invalid. If one join tuple from the main plan fits the condition of this action, we will do nothing on it. This strategy is a little bit different with the current one. If we delete an action, the tuples that meet it condition will be caught by other actions. If we keep it, the tuples that match it will be skipped. I think the new design is more logical, and I wonder your opinion on this problem. Yours Boxuan
Re: [HACKERS] merge command - GSoC progress
2010/8/4 Greg Smith g...@2ndquadrant.com Boxuan Zhai wrote: I think there are no redundant lines in this time's patch file. It is much better. There are still more blank likes around the new code you've added than are needed in many places, but that doesn't interfere with reading the patch. Sorry, it is my personal habit of leaving blanks around codes. I will chage this if it doesn't follow the pgsql coding style. The main code formatting issue left you'll need to address eventually are all the really long comments in there. I will correct the long comments in the next patch. And, I have tested the running of MERGE command with different situations. I am sorry that I didn't create regression test files, because I am not sure how to add new files in the git package. git add filename ? The tests you've put in there are the right general sort of things to try out. The one example you gave does show an UPSERT being emulated by MERGE, which is the #1 thing people are looking for initially. In fact, I have created a merge.sql with simple merge example. I put it in the folder of /src/test/regress/sql/ and modified the serial_schedule file to add a line of : test: merge Is this correct? But, I don't know how to run regress to test this sql file. My make check fails when install the db. I think this is because I do it under a MinGW environment and some parameters are not matched with the default setting of postgres. I can configure, make install , initdb and do sql query in psql successfully in my machine. So the source code itself should be correct. I put my merge.sql in attachment, in case anyone want to have a look. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us http://www.2ndquadrant.us/ merge.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] merge command - GSoC progress
2010/7/31 Greg Smith g...@2ndquadrant.com Boxuan Zhai wrote: I create a clean patch file (no debug clauses). See the attachment. Some general coding feedback for you on this. Thanks for your consideration! It's not obvious to people who might want to try this out what exactly they are supposed to do. Particularly for complicated patches like this, where only a subset of the final feature might actually be implemented, some sort of reviewer guide suggesting what should and shouldn't work would be extremely helpful. I recall there was some sort of patch design guide in an earlier version of this patch; it doesn't seem to be there anymore. Don't remember if that had any examples in it. I am now working on fixing a bug which makes the system unable to initdb. I will update my page in postgres Wiki with a detailed instruction of my implementation and testing examples soon, with my next patch file. Ultimately, the examples of working behavior for this patch will need to be put into code. The way that happens is by adding working examples into the regression tests for the database. If you had those done for this patch, I wouldn't have to ask for code examples; I could just look at the source to the regression output and see how to use it against the standard database the regression samples create, and then execute against. This at least lets you avoid having to generate some test data, because there will already be some in the regression database for you to use. There is an intro this topic at http://wiki.postgresql.org/wiki/Regression_test_authoring Another common way to generate test data is to run pgbench which creates 4 tables and populates them with data. I will try to add my testing examples to the gregression folder. As far as the patch itself goes, you have some work left on cleaning that up still you'll need to do eventually. What I would suggest is actually reading the patch itself; don't just generate it and send it, read through the whole thing like someone new to it would do. One way you can improve what you've done already is to find places where you have introduced changes to the code structure just through editing. Here's an example of what I'm talking about, from line 499 of your patch: -break; +break; This happened because you added two invisible tabs to the end of this line. This makes the patch larger for no good reason and tends to infuriate people who work on the code. There's quite a bit of extra lines added in here too that should go. You should consider reducing the ultimate size of the patch in terms of lines a worthwhile use of your time, even if it doesn't change how things work. There's lots of examples in this one where you put two or three lines between two statements when a single one would match the look of the code in that file. A round of reading the diff looking for that sort of problem would be useful. Another thing you should do is limit how long each line is when practical. You have lots of seriously wide comment lines here right now in particular. While there are some longer lines in the PostgreSQL code right now, that's code, not comments. And when you have a long line and a long comment, don't tack the comment onto the end. Put it on the line above instead. Also, don't use // in the middle of comments the way you've done in a few places here. Sorry for these mistakes, again. I promise that the same thing will not happen in my next patch. Getting some examples sorted out and starting on regression tests is more important than the coding style parts I think, just wanted to pass those along while I noticed them reading the patch, so you could start looking out for them more as you continue to work on it. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support g...@2ndquadrant.com www.2ndQuadrant.us http://www.2ndquadrant.us/
Re: [HACKERS] merge command - GSoC progress
2010/7/28 Robert Haas robertmh...@gmail.com On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai bxzhai2...@gmail.com wrote: I have get a edition that the merge command can run. It accept the standard merge command and can do UPDATE, INSERT and DELETE actions now. But we cannot put additional qualification for actions. There are some bugs when we try to evaluate the quals which make the system quit. I will fix it soon. This patch doesn't compile. You're using zbxprint() from a bunch of places where it's not defined. I get compile warnings for all of those files and then a link failure at the end. You might find it useful to create src/Makefile.custom in your local tree and put COPT=-Werror in there; it tends to prevent problems of this kind. Undefined symbols: _zbxprint, referenced from: _transformStmt in analyze.o _ExecInitMergeAction in nodeModifyTable.o _ExecModifyTable in nodeModifyTable.o _ExecInitModifyTable in nodeModifyTable.o _merge_action_planner in planner.o Sorry, this is a debug function defined by me. It may not be included in the patch. I add a line of #define zbxprint printf somewhere in the system. Not that it's as high-priority as getting this fully working, but you should revert the useless changes in this patch - e.g. the one-line change to heaptuple.c is obvious debugging leftovers, and all of the changes to execQual.c and execUtil.c are whitespace-only. You should also try to make your code and comments conform to project style guidelines. In general, you'll find it easier to keep track of your changes (and you'll have fewer spurious changes) if you use git diff master...yourbranch instead of marking comments, etc. with ZBX or similar. I will clean all these in my next patch. I am now very confused with the failure of action qualification. I look through the whole process of a query, from parser to executor. In my opinion, the qualification transformed in analyzer, will be processed by prepsocess_qual_condition() in planner, and then by the ExecInitExpr() function in excutor start phase (in InitPlan() function). Then the qual is ready to be used in ExecQual(). Am I correct? I have done these on the merge action qual, but when I pass them into ExecQual(), the server just closed abnormally. I don't know if I missed any steps on preparing the qual expressions. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Fwd: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission
-- Forwarded message -- From: Boxuan Zhai bxzhai2...@gmail.com Date: 2010/7/17 Subject: Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission To: Simon Riggs si...@2ndquadrant.com 2010/7/17 Simon Riggs si...@2ndquadrant.com On Fri, 2010-07-16 at 08:26 +0800, Boxuan Zhai wrote: The merge actions are transformed into lower level queries. I create a Query node for each of them and append them in a newly create List field mergeActQry. The action queries have different command type and specific target list and qual list, according to their declaration by user. But they all share the same range table. This is because we don't need the action queries to be planned latter. The joining strategy is decided by the top query. We are only interest in their specific action qualifications. In other words, these action queries are only containers for their target list and qualifications. 2. When the query is ready, it will be send to rewriter. In this part, we can call RewriteQuery() to handle the action queries. The UPDATE action will trigger rules on UPDATE, and so on. What need to be noticed are: 1. the actions of the same type should not be rewritten repeatedly. If there are two UPDATE actions in merge command, we should not trigger the ON UPDATE rules twice. 2. if an action type is fully replaced by rules, we should remove all actions of this type from the action list. Rewriter will also do some process on the target list of each action. IMHO it is a bad thing that we are attempting to execute each action statement as a query. That means we need to execute an inner SQL statement for each row returned by the top level query. That design makes MERGE similar in performance to an upsert PL/pgsql function, which will perform terribly on large numbers of rows. Dear Simmon, Thanks for your feedback. I may not present my idea clearly. In my design, the merge actions are not executed as separate queries. Only the top level query (that is a query like source table LEFT JOIN target_table ON matching_qual ) will be planned and executed. For each tuple return by this plan, we will choose a proper action for it and do the corresponding modification. The tables will only be scanned and joined once. One merge action will not do a full run of tables join and then modify table as a standard UPDATE/DELETE/INSERT query. (Is this what you are worried about?) In fact, for one action, we only need the information of: 1. the action type (UPDATE or DELTE or INSERT). 2 the target list. and 3. the additional qualifications. And a Query node is a perfect container for these infor. That's why I transform them in to Query nodes. But all through the analyzer, rewriter, planner and executor. I just call related functions to formalize the expressions in their target list and qual lists. The range table and join tree is only dermined by the top level query, they will not be effected by merge actions. This was exactly the point where I stopped implementation previously: attempting to make MERGE work with rules is enough to prevent a tighter in-executor implementation of the action list. I am sorry that I don't catch your meanning here clearly. As my understanding, if there is a rule on the target table, the rewriter will add a new query in the execution queue. (or replace the original query). I think the rule queries will not effect the process within the original query, because they are totally separate queries which will be run before or after the original query. Are you suggest that we should not allow rules on MERGE command? [To Boxuan, on a personal note, you seem to be coping quite well with the code and the process; congratulations and keep going.] Thank you. Your encouragement is very important to me. -- Simon Riggs www.2ndQuadrant.com http://www.2ndquadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission
Hi, For the EXPLAIN MERGE command, I expect it to return a result similar to that of a SELECT command. I think the EXPLAIN command is to show how the tables in a query is scaned and joined. In my design, the merge command will generate a top-level query (and plan) as the main query. It is in fact a left join select query over the source and target tables. This main query (plan) decides how the tables are scanned. The merge actions will not effect this process. So when we explain the merge command, a similar result will be returned. For example the command EXPLAIN MERGE INTO Stock USING Sale ON Stock.stock_id = Sale.sale_id WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol; WHEN . Will return a result just like that of the following command: EXPLAIN SELECT * FROM Sale LEFT JOIN Stock ON stock_id = sale_id; Yours Boxuan. 2010/7/16 Heikki Linnakangas heikki.linnakan...@enterprisedb.com On 16/07/10 03:26, Boxuan Zhai wrote: PS: Heikki asked me about what the EXPLAIN MERGE ... command will do. Well, I have not test it, but it may through an error or just explain the top plan, since I put the action plans in a new field, which cannot be recognized by old functions. I meant what EXPLAIN MERGE output will look like after the project is finished, not what it will do at this stage. I was trying to get a picture of how you're thinking to implement the executor, what nodes there is in a MERGE plan. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission
Dear Hackers I considered my situation. And I found that I didn't communicate well with you, as makes you have little confidence on my project. Most of the time I just work by myself and not report to you frequently. I always want to finish a solid stage progress before do a submission. This may be a bad habit in the remote project. In fact, I have a detailed design on how to implement the command and I am working hard these days to catch the schedule. In my design, 1. the merge command is firstly transformed to a MergeStmt node in parser. And analyzer will generate a left out join query as the top query (or main query). This query is similar to a SELECT command query, but I set target relation in it. The top query will drive the scanning and joining over target and source tables. The merge actions are transformed into lower level queries. I create a Query node for each of them and append them in a newly create List field mergeActQry. The action queries have different command type and specific target list and qual list, according to their declaration by user. But they all share the same range table. This is because we don't need the action queries to be planned latter. The joining strategy is decided by the top query. We are only interest in their specific action qualifications. In other words, these action queries are only containers for their target list and qualifications. 2. When the query is ready, it will be send to rewriter. In this part, we can call RewriteQuery() to handle the action queries. The UPDATE action will trigger rules on UPDATE, and so on. What need to be noticed are: 1. the actions of the same type should not be rewritten repeatedly. If there are two UPDATE actions in merge command, we should not trigger the ON UPDATE rules twice. 2. if an action type is fully replaced by rules, we should remove all actions of this type from the action list. Rewriter will also do some process on the target list of each action. The first submission has finished the above part. 3. In planner, the top level query is handled in a normal way. Since it has almost the same structure as a SELECT query, the planner() function can work on it straight forward. However, we need a small change here. The merge command has a target relation, which need a ctid junk attribute in the target list. The ctid is required by the UPDATE and DELETE actions. Besides, for each of the action queries, we also need to create a Plan node. We don't need to do a full plan on the action queries. The crucial point is to preprocess the target list and qualification of each action. (Explanation for this point. The execution of a merge action is composed by two parts. The top plan will be executed in the main loop, and return the joined tuples one by one. And a action will apply its qualification on the returned tuples. If succeed, it will take the action and do corresponding modification on the target table. Thus, even we have a Plan node created for each action, we don't want to throw it directly into Planner() function. That will generate a new plan over the tables in Range Table, which is very probably different with the top-level plan. If we run the action plans directly, they will be confilict with each other). I create a function merge_action_planner() to do this job. This part is added at the end of standard_planner(). After that, all the plans of merge actions are linked into a new List filed in PlannedStmt result of the top plan. 4. When planner is finished, the plan will be send to executor through PortalRun(). As a new command, merge will chose the PORTAL_MULTI_QUERY strategy, and be sent to ProcessQuery() function. 5. As in the ExecutorStart() part, we need to set junkfilter for merge command, since we have a ctid junk attr in target list. And, the merge action plans should also be initialized and transformed into PlanState nodes. However, the initialization over action plan is only focus on the target list and quals. We don't need other part of traditional plan initialization, since these action plans are not for scanning or joining (this is the job of top plan). We only want to transform the action information into standard format that can be used by qualification evaluator in executor. I HAVE DONE ALL THE ABOVE IN A SECOND SUBMISSION. 6. In ExecutorRun() part, the top plan will be passed into ExecutePlan(). The action planstates can be found in the estate-es_plannedstmt field. The top plan can return tuples of the left out join on source table and target table. (I can see the tuple be returned in my codes). Thus, the design is correct. At least the top plan can do its work well. In the junkfilter, if we can find a non-null ctid, it is a matched tuple, or else, it is a NOT MATCHED tuple. Then we need to evaluate the additional quals of the actions one by one. If the evaluations of one action succeed, we will take this action and skip the remaining ones. Since the target list and qual expressions are all
Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission
Hi, Thanks for all these feedback. I found that people have problems on running my codes, which probably comes from my nonstandard submission format. I can compile, install and initialize postgres in my own machine. The system accepts MERGE command and will throw an error when it runs into the executor, which cannot recognize the MERGE command type so far. I will make a standard patch as soon as possible. Sorry for the troubles. Yours Boxuan 2010/7/11 Tom Lane t...@sss.pgh.pa.us Peter Eisentraut pete...@gmx.net writes: On lör, 2010-07-10 at 12:45 -0400, Tom Lane wrote: I believe the project standard is to make things readable in an 80-column window --- anyone have an objection to stating that explicitly? Is that what pgindent reformats it to? pgindent tries to leave a character or two to spare, IIRC, so its target is probably 78 or thereabouts. regards, tom lane