[HACKERS] new patch of MERGE (merge_204) a question about duplicated ctid

2010-12-04 Thread Boxuan Zhai
Dear Greg,

I have updated the MERGE patch for two main problems.

1. Support of system attributes in MERGE action expressions.
In old version of MERGE, I use the top join as the only RTE in var name
space, during the transformation process in parser. It contains all the
common attributes of source and target table but not the system attributes,
such as oid.
Thus, if user specified system attributes in MERGE actions, error pops.
In current version, the var name space is forced to be the target table RTE
and source table RTE. So system attributes will be correctly transformed.

2. fix the problem of dropped column.
In planner, we need to map the Var nodes in MERGE actions to its
corresponding TE in the main plan's TL. I used to do this under the
assumption that all the TE are ordered by their attribute no (the value of
filed Var.varattno). However, this is not the case when the table contain
dropped attributes. The dropped attribute will take one attribute number but
not appear in TL.
 Now a new mapping algorithm is forged, which can avoid this bug.

Please help me to review this new patch. Thank you!

PS:

In the new pgsql 9.1, I find that the junk attribute ctid is added twice,
for UPDATE and DELETE commands.

In planner, the function preprocess_targetlist() will invoke a sub
function expand_targetlist() which will add missed TE for INSERT and UPDATE
commands. And for UPDATE and DELETE, it will also create a TE of ctid (a
junk attr) which is used in executor. This is the process in old pgsql
versions.

However, in pgsql 9.1, while the above is kept. A new function in rewriter,
that is rewriteTargetListUD() does the same thing. So for a plain
UPDATE/DELTE command, it will have two ctid junk attr in its final TL.

Is there any particular reason for this duplicated ctid??

This will not cause much problem, normally. However, in MERGE command, the
TL of MERGE actions should contain no junk attributes. So, I add blocking
codes in these two parts, to avoid the change of TL of MERGE actions. I
don't know whether this will cause any problem.

Regards,
Yours Boxan


merge204.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] a new problem in MERGE

2010-10-31 Thread Boxuan Zhai
Dear Greg,

How are you?

As we talked last week, there is a bug about the origslot field
assertion.

TRAP: FailedAssertion(!(epqstate-origslot != ((void *)0)), File:
execMain.c, Line: 1732)


I checked my codes and found the reason of this problem. In the original
process of ExecModifyTable() function, there is a line of code
EvalPlanQualSetSlot(node-mt_epqstate,
planSlot);

It is specially used to init the epqstate-origslot field with the tuple
slot returned by the underlying plan. However, in our implementation of
MERGE, this part is missed in ExecMerge() function.

At the beginning, I just tried to form a new slot in ExecMerge() and use
it as the epqstate-origslot for the merge actions' execution functions.
This part is simple.

But, during this process, I found a more serious problem: our MERGE
currently does not support the use of system attributes in the action
conditions and target lists.

For example, if we raise queries like the following, there will pops some
ERROR message.

(table target has oid attribute)
MERGE INTO target t USING source  s ON t.id = s.id
WHEN MATCHED AND* t.oid 10* THEN update SET val = t.val + s.add;

or

MERGE INTO target t USING source  s ON t.id = s.id
WHEN MATCHED AND* t.xmin = 1000* THEN update SET val = t.val + s.add;



The reason for the above problem is the missing of system attribute in the
result slot returned by the main plan of MERGE.

In our implementation,  the Main Pan of MERGE just generate a tuple slot
contain all the direct vars from source table and target table. The
expressions in MERGE action conditions and target lists are then calculated
based on the result slot of main plan.

Currently, all the system attributes (except the ctid of target table) are
not included in main plan. So, the evaluation of action condition and target
entries will fail if they involve some system attributes.

If we want to solve this problem, we need to extend the target list of main
plan before execution (in analyzer or planner), make sure it contains all
the attributes appeared in any of the MERGE actions. This may lead to
adjustments to many part of the process of MERGE. For example, within the
planner, I created a function specially for mapping the Var nodes in actions
to attributes in the result slot. This function should be rewritten for this
new case.

I have plan to fix the above two bugs together. (in fact, I have already
started coding in merge_v202 edition). My question is how should I make my
update be consistent with yours. Is it possible for you to give me an
edition that I can work on?

Thanks!

Yours Boxuan


Re: [HACKERS] ask for review of MERGE

2010-10-23 Thread Boxuan Zhai


 This did seem to find a bug in the implementation after running for a
 while:

 TRAP: FailedAssertion(!(epqstate-origslot != ((void *)0)), File:
 execMain.c, Line: 1732)

 Line number there is relative to what you can see at
 http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
 and that's a null pointer check at the beginning of
 EvalPlanQualFetchRowMarks.  Haven't explored why this happened or how
 repeatable this is, but since Boxuan was looking for some bugs to chase I
 wanted to deliver him one to chew on.


I am not sure how this bug comes. I will try to find out the reason for it.


Re: [HACKERS] ask for review of MERGE

2010-10-19 Thread Boxuan Zhai
Hi,

I considered the empty source table situation again. I think it is correct
to upsert nothing in this case.

Back to the original logic of MERGE command, it is main purpose is to add
the supplementary data from the source table into the target table. Thus, an
empty source table means no input data is available, so no upsert is needed
in target table.


Regards,
Boxuan


Re: [HACKERS] ask for review of MERGE

2010-10-18 Thread Boxuan Zhai
On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas robertmh...@gmail.com wrote:

 I think that MERGE is supposed to trigger one rule for each row in the
 source data.  So:

 On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith g...@2ndquadrant.com wrote:
  MERGE INTO Stock t
   USING (SELECT * FROM Stock WHERE item_id=10) AS s
   ON s.item_id=t.item_id
   WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
   WHEN NOT MATCHED THEN INSERT VALUES (10,1)
   ;
 
  This works fine, and updates the matching row:
 
  item_id | balance
  -+-
  20 |1900
  10 |2201

 Here you have one row of source data, and you got one action (the WHEN
 MATCHED case).

  But if I give it a key that doesn't exist instead:
 
  MERGE INTO Stock t
   USING (SELECT * FROM Stock WHERE item_id=30) AS s
   ON s.item_id=t.item_id
   WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
   WHEN NOT MATCHED THEN INSERT VALUES (30,1)
   ;
 
  This doesn't execute the NOT MATCHED case and INSERT the way I expected
 it
  to.  It just gives back MERGE 0.

 Here you have no rows of source data (the USING (SELECT ...) doesn't
 return anything, since no rows exist) so nothing happens.


Yes.
The MERGE process is based on a left join between the source table and
target table.
Since here the source table is empty, no join is carried, and thus no MERGE
action is taken.

But, is it correct logically? I mean, should we insert some rows in the
above example rather  than do nothing?


  Since I wasn't sure if the whole subquery in the USING clause case was
  really implemented fully, I then tried to do this with something more
 like
  the working regression test examples.  I expected this to do the same
 thing
  as the first example:
 
  MERGE INTO Stock t
   USING Stock s
   ON s.item_id=10 AND s.item_id=t.item_id
   WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
   WHEN NOT MATCHED THEN INSERT VALUES (10,1)
   ;
 
  But it gives back this:
 
  ERROR:  duplicate key value violates unique constraint
 stock_item_id_key
  DETAIL:  Key (item_id)=(10) already exists.

 Here you have two rows of source data.  The ON clause represents the
 join condition.  The item_id=10 row matches - so you get an update,
 presumably, though we can't see that as things turn out - and the
 item_id=20 row doesn't match - so you try to insert (10, 1), which is
 a duplicate key, thus the error.

  Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side
 of
  things to generate that.  But it doesn't work any better if you give it
 an
  example that doesn't exist:
 
  MERGE INTO Stock t
   USING Stock s
   ON s.item_id=30 AND s.item_id=t.item_id
   WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
   WHEN NOT MATCHED THEN INSERT VALUES (30,1)
   ;
 
  ERROR:  duplicate key value violates unique constraint
 stock_item_id_key
  DETAIL:  Key (item_id)=(30) already exists.

 In this case neither row of the source data matches the join condition
 (s.item_id=30 might as well be constant FALSE as far as the test data
 is concerned) so you attempt to execute the NOT MATCHED side twice.
 So this one also looks correct to me.


Yes, that is what happened in the above two examples.

 The other thing I noticed that may take some work to sort out is that I
 haven't had any luck getting MERGE to execute from within a plpgsql
 function.  I was hoping I could use this to update the pgbench tables:

Good catch.  Considering the size of this patch, I have no problem
leaving this to the eventual committer to fix, or to a subsequent
commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] ask for review of MERGE

2010-09-29 Thread Boxuan Zhai
On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith g...@2ndquadrant.com wrote:
  One compiler warning I noticed that needs to get resolved:
 
  src/backend/commands/explain.c:
 
  explain.c: In function ‘ExplainMergeActions’:
  explain.c:1528: warning: comparison of distinct pointer types lacks a
 cast
 
  That is complaining about this section:
 
  if (mt_planstate-operation != CMD_MERGE ||
  mt_planstate-mt_mergeActPstates == NIL)
  return;
 
  So presumably that comparison with NIL needs a cast. Once I get more
  familiar with the code I'll fix that myself if Boxuan doesn't offer a
  suggestion first.

 Possibly NULL was meant instead of NIL.  NIL is specifically for a List.


Yes, it should be NULL instead of NIL.

At first, I designed this filed as a List. But I changed it into an array in
the last editions. This is why I have an unmatched assignment here. Sorry
for that.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Re: [HACKERS] gSoC add MERGE command new patch -- merge_v104

2010-08-24 Thread Boxuan Zhai
On Wed, Aug 25, 2010 at 4:56 AM, Andres Freund and...@anarazel.de wrote:

 On Tue, Aug 24, 2010 at 11:02:41PM +0300, Heikki Linnakangas wrote:
  On 24/08/10 16:35, Boxuan Zhai wrote:
  Hi,
  
  I finished the MERGE on inheritance tables. Now comes the merge_v201
 
  Oh, great! That means that all the known issues are fixed now, and
  all that's left is fixing any issues raised in review.
 
  I've added this to the September commitfest, but I hope I'll find
  some time to look at this before that. I welcome anyone else to
  review this too!
 I have to ask one question: On a short review of the discussion and
 the patch I didn't find anything about the concurrency issues
 involved (at least nodeModifyTable.c didnt show any).
 Whats the plan to go forward at that subject? I think the patch needs
 to lock tables exclusively (the pg level, not access exclusive) as
 long as there is no additional handling...

 Thanks for the work Boxuan!



The concurrency issues are not involved. I don't know much about this part.
I think we need more discussion on it.



 Andres

 PS: The patch reintroduces some whitespace damage...



Re: [HACKERS] MERGE Specification

2010-08-15 Thread Boxuan Zhai
On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 11/08/10 11:11, Boxuan Zhai wrote:

 The new patch is done. I named it as merge_v102. (1 means it is the
 non-inheritance merge command, 02 means this is the second time of fixing
 reported bugs)


 Thanks. I went through this, fixing all the spurious end-of-line whitespace
 first with git apply --whitespace=fix, and then manually fixing comment
 and whitespace style, typos, and doing some other small comment editing.
 Resulting patch attached. It is also available at my git repository at
 git://git.postgresql.org/git/users/heikki/postgres.git, branch
 'mergestmt'. Please base any further patch versions on this patch, so that
 we don't need to redo this cleanup.


Thanks for the cleanup. The codes looks better now. My problem is that I
have done some more modifications after merge_v102. Is there any way to
apply the cleanup patch without erasing my new codes?


 I'll continue reviewing this sometime next week, but here's few
 miscellaneous issues for starters;

 * Explain output of actions needs some work:

   Merge  (cost=246.37..272.96 rows=1770 width=38)
 ACTION: UPDATE WHEN NOT MATCHED
 ACTION: INSERT WHEN NOT MATCHED
 -  Merge Left Join  (cost=246.37..272.96 rows=1770 width=38)

 Should print out more details of the action, like for normal
 updates/inserts/deletes.


Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more
details than what we have here except the costs, rows and width, which is
print out at the MERGE command line.

For example:

Explain
update buy set volume = volume + 1;
  QUERY PLAN
--
 Update  (cost=0.00..36.75 rows=2140 width=14)
   -  Seq Scan on buy  (cost=0.00..36.75 rows=2140 width=14)
(2 rows)

For the explanation of an UPDATE command, we only have a Update title
followed by the costs, then, after the arrow - there comes the plan tree.
In a MERGE command, no action has its real private plan. They all share the
main plan.  Thus, the costs for a merge command is that of the main plan.
What is useful for a merge action is only the target list and quals. So my
design is to show the merge command costs in first line. Then print out the
actions and their qualifications in a list, followed by the main plan tree.


Is there any other thing you suggest to print out for each action?


 And all uppercase doesn't fit the surrounding style.


This  will be fixed.


 * Is the behavior now SQL-compliant? We had long discussions about the
 default action being insert or do nothing or raise error, but I never got a
 clear picture of what the SQL spec says and whether we're compliant.

 * What does the one tuple is error notice mean? We'll have to do
 something about that.. I don't think we've properly thought through the
 meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
 only a few dozen lines to put back when we know what to do about it.

I find that we have not reached an agreement on MERGE's syntax yet.
Personally, I support Simon's idea, that is the default action should be
RAISE ERROR. However, I am no sure what RAISE ERROR should do when we
get a missing tuple. Here I just use a simple elog(NOTICE, a tuple is
error); for this situation.

I leave this for further extension when a more specific design for RAISE
ERROR is available.

Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do
you want me to chage the default action back to DO NOTHING? Or any other
suggetions? (In fact, my personal thinking is to add a non-omissible ELSE
clause as the end of the action list which forces the user to specify the
default action for merge).


 * Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it
 be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?



I need one flag in these statement to differentiate them from normal
InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the
process of these two kinds of statements in the transfromStmt() function. I
define a set of alias nodes which have different nodetags. This can make
the code simpler.


 * Do you need the out-function for DeleteStmt? Why not for UpdateStmt and
 InsertStmt?


Ah, I add this function because I want to have a look of the content of
DeleteStmt. I did this long ago, just after I started the project. If you
don't want this function, I will remove it.


 * I wonder if it would be simpler to check the fact that you can only have
 UPDATE/DELETE as a WHEN MATCHED action and only INSERT as a WHEN NOT MATCHED
 action directly in the grammar?


We can do this, but, I think it is better to keep it as it is now.



 * Regarding this speculation in the MergeStmt grammar rule:
/*although we have only one USING table,
we still make it a list, maybe in future
we will allow multiple USING tables.*/

 I wonder what

Re: [HACKERS] MERGE command for inheritance

2010-08-13 Thread Boxuan Zhai
On Fri, Aug 13, 2010 at 2:33 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 13/08/10 09:27, Boxuan Zhai wrote:

 I have renewed the merge.sql and merge.out in regress. Please have a look.


 Thanks.

 Did you change the way DO INSTEAD rules are handled already?
 http://archives.postgresql.org/pgsql-hackers/2010-08/msg00151.php


Yes. This mistake has been corrected a few editions ago. Now, the actions
replaced by INSTEAD rules will still catch tuples but do nothing for them.

 --
  Heikki Linnakangas

  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] MERGE command for inheritance

2010-08-12 Thread Boxuan Zhai
On Thu, Aug 12, 2010 at 12:49 AM, Greg Smith g...@2ndquadrant.com wrote:

 Tom Lane wrote:

 Do we really think this is anywhere near committable now?



 There's a relatively objective standard for the first thing needed for
 commit--does it work?--in the form of the regression tests Simon put
 together before development.  I just tried the latest merge_v102.patch
 (regression diff attached) to see how that's going.  There are still a
 couple of errors in there.  It looks to me like the error handling and
 related DO NOTHING support are the next pair of things that patch needs work
 on.  I'd rather see that sorted out than to march onward to inheritance
 without the fundamentals even nailed down yet.



Sorry for the mismatch problem of regress. In fact, I am still unable to
make the regression test run on my machine. When I try the command
pg_regreess in /src/test/regress/, it always gives a FATAL error:

FATAL:  parameter port cannot be changed without restarting the server
psql: FATAL:  parameter port cannot be changed without restarting the
server
command failed: C:/msys/1.0/local/pgsql/bin//psql -X -c DROP DATABASE IF
EXISTS \regression\ postgres

However, I can run this command directly in the MinGW command line
interface. I guess this is because the psql_command() function has some
check before accept commands. And the MinGW environment  cannot pass these
checks.

All the SQL commands in the .sql file have been tested by hand. And they are
all correct. However, the .out file is not automatic generated by pgsql.

I may need to find a linux system to try to generate the correct .out file
some other time. Or, would someone help me to generate an .out file through
pg_regress?


 --
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us http://www.2ndquadrant.us/


 ***
 /home/postgres/pgwork/repo/git/postgresql/src/test/regress/expected/merge.out
   2010-08-11 12:23:50.0 -0400
 ---
 /home/postgres/pgwork/repo/git/postgresql/src/test/regress/results/merge.out
2010-08-11 12:33:27.0 -0400
 ***
 *** 44,57 
  WHEN MATCHED THEN
UPDATE SET balance = t.balance + s.balance
  ;
 ! SELECT * FROM target;
 !  id | balance
 ! +-
 !   1 |  10
 !   2 |  25
 !   3 |  50
 ! (3 rows)
 !
  ROLLBACK;
  -- do a simple equivalent of an INSERT SELECT
  BEGIN;
 --- 44,50 
  WHEN MATCHED THEN
UPDATE SET balance = t.balance + s.balance
  ;
 ! NOTICE:  one tuple is ERROR
  ROLLBACK;
  -- do a simple equivalent of an INSERT SELECT
  BEGIN;
 ***
 *** 61,66 
 --- 54,61 
  WHEN NOT MATCHED THEN
INSERT VALUES (s.id, s.balance)
  ;
 + NOTICE:  one tuple is ERROR
 + NOTICE:  one tuple is ERROR
  SELECT * FROM target;
   id | balance
  +-
 ***
 *** 102,107 
 --- 97,103 
  WHEN MATCHED THEN
DELETE
  ;
 + NOTICE:  one tuple is ERROR
  SELECT * FROM target;
   id | balance
  +-
 ***
 *** 165,176 
  ERROR:  multiple actions on single target row

  ROLLBACK;
 !
  -- This next SQL statement
  --  fails according to standard
  --  suceeds in PostgreSQL implementation by simply ignoring the second
  --  matching row since it activates no WHEN clause
  BEGIN;
  MERGE into target t
  USING (select * from source) AS s
  ON t.id = s.id
 --- 161,175 
  ERROR:  multiple actions on single target row

  ROLLBACK;
 ! ERROR:  syntax error at or near ERROR
 ! LINE 1: ERROR:  multiple actions on single target row
 ! ^
  -- This next SQL statement
  --  fails according to standard
  --  suceeds in PostgreSQL implementation by simply ignoring the second
  --  matching row since it activates no WHEN clause
  BEGIN;
 + ERROR:  current transaction is aborted, commands ignored until end of
 transaction block
  MERGE into target t
  USING (select * from source) AS s
  ON t.id = s.id
 ***
 *** 179,184 
 --- 178,184 
  WHEN NOT MATCHED THEN
INSERT VALUES (s.id, s.balance)
  ;
 + ERROR:  current transaction is aborted, commands ignored until end of
 transaction block
  ROLLBACK;
  -- Now lets prepare the test data to generate 2 non-matching rows
  DELETE FROM source WHERE id = 3 AND balance = 5;
 ***
 *** 188,195 
  +-
2 |   5
3 |  20
 -   4 |   5
4 |  40
  (4 rows)

  -- This next SQL statement
 --- 188,195 
  +-
2 |   5
3 |  20
4 |  40
 +   4 |   5
  (4 rows)

  -- This next SQL statement
 ***
 *** 203,216 
  WHEN NOT MATCHED THEN
INSERT VALUES (s.id, s.balance)
  ;
  SELECT * FROM target;
   id | balance
  +-
1 |  10
2 |  20
3 |  30
 -   4 |   5
4 |  40
  (5 rows)

  ROLLBACK;
 --- 203,218 
  WHEN NOT MATCHED THEN
INSERT VALUES (s.id, s.balance)
  ;
 + NOTICE:  one tuple is ERROR
 + NOTICE: 

Re: [HACKERS] MERGE Specification

2010-08-11 Thread Boxuan Zhai
On Wed, Aug 11, 2010 at 12:18 PM, Boxuan Zhai bxzhai2...@gmail.com wrote:



  On Wed, Aug 11, 2010 at 12:14 PM, Greg Smith g...@2ndquadrant.comwrote:

 Boxuan Zhai wrote:

 I just found that no Assert() works in my codes. I think it is because
 the assertion is no enabled. How to enable assertion. To define
 USE_ASSERT_CHECKING somewhere?


 When you run configure before make, use --enable-cassert.  The
 normal trio for working on the PostgreSQL code is:

 ./configure --enable-depend --enable-cassert --enable-debug

 Generally the only reason to build as a developer without asserts on is to
 do performance testing.  They will slow some portions of the code down
 significantly.


 Thanks. I will test MERGE under this new configuration. A new patch will be
 submitted once I fix all the asserting bugs.



The new patch is done. I named it as merge_v102. (1 means it is the
non-inheritance merge command, 02 means this is the second time of fixing
reported bugs)

   --
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us http://www.2ndquadrant.us/





merge_v102.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MERGE command for inheritance

2010-08-11 Thread Boxuan Zhai
On Wed, Aug 11, 2010 at 4:27 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 10/08/10 12:38, Boxuan Zhai wrote:

 These days I am considering what else can be done for MERGE, And, I
 find inheritance tables in postgres is not supported by our MERGE command
 yet.


 I played with your latest patch version a bit, and actually, it seems to me
 that inherited tables work just fine. I ran into the assertion failures
 earlier while trying that, but that has now been fixed. Can you give an
 example of the kind of query that's not working yet?


Well, in the patch I submitted, the target relation is forced not to scan
any inheritance tables. That is, the command always acts like
MERGE into *ONLY* foo USING bar 

So, the inheritance in current MERGE should not work, I think.


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] MERGE command for inheritance

2010-08-11 Thread Boxuan Zhai
On Wed, Aug 11, 2010 at 4:45 PM, Simon Riggs si...@2ndquadrant.com wrote:

  On Tue, 2010-08-10 at 17:15 +0300, Heikki Linnakangas wrote:
  On 10/08/10 12:38, Boxuan Zhai wrote:
   The difficult way is to generate the plans for children table in
 planner, as
   the other commands like UPDATE and DELETE. However, because the
 structure of
   MERGE plan is much more complex than the ordinary ModifyTable plans,
 this
   job may not as simple as we expected. We need to adjust both the main
 plan
   and the
   merge actions to fit the children tables, which is not straight
 forward.
 
  This the approach you'll have to take. But actually, I'm surprised it
  doesn't happen to just work already. It should be opaque to the merge
  facility that the reference to the parent target table has inherited
  child tables - expanding the inherited table to scans of all the
  children should already be handled by the planner.

 The support for UPDATE and SELECT of partitioned cases is very different
 in the planner and was handled as separate implementation projects.

 If we want a working MERGE in the next release, I suggest that we break
 down this project in the same way and look at partitioned target tables
 as a separate project.

 One reason for suggesting this is that all MERGE statements have a
 source table, whereas UPDATE and DELETEs did not always. The plan for a
 simple UPDATE and DELETE against a partitioned table is simple, but the
 plan (and performance) of a joined UPDATE or DELETE is not good:

 postgres=# explain update p set col2 = x.col2 from x where x.col1 =
 p.col1;
QUERY
 PLAN
 ---
  Update  (cost=299.56..1961.18 rows=68694 width=20)
   -  Merge Join  (cost=299.56..653.73 rows=22898 width=20)
 Merge Cond: (public.p.col1 = x.col1)
 -  Sort  (cost=149.78..155.13 rows=2140 width=10)
   Sort Key: public.p.col1
   -  Seq Scan on p  (cost=0.00..31.40 rows=2140 width=10)
 -  Sort  (cost=149.78..155.13 rows=2140 width=14)
   Sort Key: x.col1
   -  Seq Scan on x  (cost=0.00..31.40 rows=2140 width=14)
   -  Merge Join  (cost=299.56..653.73 rows=22898 width=20)
 Merge Cond: (public.p.col1 = x.col1)
 -  Sort  (cost=149.78..155.13 rows=2140 width=10)
   Sort Key: public.p.col1
   -  Seq Scan on p1 p  (cost=0.00..31.40 rows=2140
 width=10)
 -  Sort  (cost=149.78..155.13 rows=2140 width=14)
   Sort Key: x.col1
   -  Seq Scan on x  (cost=0.00..31.40 rows=2140 width=14)
   -  Merge Join  (cost=299.56..653.73 rows=22898 width=20)
 Merge Cond: (public.p.col1 = x.col1)
 -  Sort  (cost=149.78..155.13 rows=2140 width=10)
   Sort Key: public.p.col1
   -  Seq Scan on p2 p  (cost=0.00..31.40 rows=2140
 width=10)
 -  Sort  (cost=149.78..155.13 rows=2140 width=14)
   Sort Key: x.col1
   -  Seq Scan on x  (cost=0.00..31.40 rows=2140 width=14)

 Those plans could use some love and attention before forcing Boxuan to
 implement that.



It seems that we have not decided whether to put the inheritance for MERGE
off for a latter implementation. But, I think we can discuss how to do it
now.

 First of all, the inheritance of MERGE should not be implemented in the
rule-like way. I agree that the easy way I proposed is not consistent with
the general inheritance process in postgres.

The normal way of doing this is to handle it in planner, to be more
specific, we need to extend the function inheritance_planner() for
processing MERGE queries.

For UPDATE and DELETE commands (INSERT is not an inheritable command), if
inheritance_planner finds that the target table has children tables, it
will generate a list of queries. These queries are almost the same as the
original query input by user, except for the different target
relations. Each child table has it corresponding query in this list.

This list of queries will then be processed by grouping_planner() and
transformed into a list of plans. One most important work finished in
this function is to extend the target list of target relations to make sure
that all attributes of a target relation appears in the final result tuple
of its plan.

As for MERGE command, we need to do the same thing. But, since the main
query body is a LEFT JOIN query between source table and target table, the
top-level target list is a combination of all the attributes from source
table and target table. Thus, when we extend the target list, we should only
extent the part of target relations, and keep the source table part
untouched.

Once a main query in this style has been transformed to plan, we need to
prepare the merge actions for it too. That is, extend the target list of all
UPDATE and INSERT actions for the corresponding target relation. In this
way, each target relation will have its own

[HACKERS] MERGE command for inheritance

2010-08-10 Thread Boxuan Zhai
Hi,

These days I am considering what else can be done for MERGE, And, I
find inheritance tables in postgres is not supported by our MERGE command
yet.

Currently, MERGE command is only able to handle the target table itself, and
its children tables are not involved in the process.
I am not sure if inheritance of MERGE is needed by postgres. If we need, I
may propose two methods for implementing it.

An easy way to do it is use a rule-like strategy. We can generate new MERGE
query statements with the children table as their target tables. Then the
new
query statement will be planned and executed in the normal way. This process
can be put in the rewriter, before the queries are planned.
This method is quite easy but seems not follow the tradition of inheritance
in Postgres.


The difficult way is to generate the plans for children table in planner, as
the other commands like UPDATE and DELETE. However, because the structure of
MERGE plan is much more complex than the ordinary ModifyTable plans, this
job may not as simple as we expected. We need to adjust both the main plan
and the
merge actions to fit the children tables, which is not straight forward.

I would like to know your opinions on this problem.

PS: for my investigation on the inheritance actions, I find that although
the children tables are modified by the UPDATE or DELETE commands on their
ancestor tables, the rules defined on them are not activated during the
query. Is this the case (I hope I am not asking a stupid question)? And, if
so, I may ask why we want it to act like this.

Boxuan


Re: [HACKERS] MERGE Specification

2010-08-10 Thread Boxuan Zhai
On Tue, Aug 10, 2010 at 10:29 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 10/08/10 12:08, Boxuan Zhai wrote:

 Thanks for your feedback. I fixed all the above waring bugs. Find the new
 patch in attachement.


 Thanks.

 I'm getting an assertion failure with this statement:

 CREATE TABLE foo (id int4);

 MERGE into foo t
 USING (select id FROM generate_series(1,5) id) AS s
 ON t.id = s.id
 WHEN NOT MATCHED THEN INSERT (id) VALUES (s.id);

 The query works on my machine.


 TRAP: FailedAssertion(!(ActiveSnapshotSet()), File: postgres.c, Line:
 749)

 That's easily fixed - you need to add case T_MergeStmt to the list of
 optimizable command types in analyze_requires_snapshot() function.

 Unfortunately that doesn't get you far, the query then trips another
 assertion:

 TRAP: FailedAssertion(!(list_length(resultRelations) ==
 list_length(subplans)), File: createplan.c, Line: 3929)



I just found that no Assert() works in my codes. I think it is because the
assertion is no enabled. How to enable assertion. To define
USE_ASSERT_CHECKING somewhere?



  --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] MERGE Specification

2010-08-10 Thread Boxuan Zhai
On Wed, Aug 11, 2010 at 12:14 PM, Greg Smith g...@2ndquadrant.com wrote:

 Boxuan Zhai wrote:

 I just found that no Assert() works in my codes. I think it is because the
 assertion is no enabled. How to enable assertion. To define
 USE_ASSERT_CHECKING somewhere?


 When you run configure before make, use --enable-cassert.  The normal
 trio for working on the PostgreSQL code is:

 ./configure --enable-depend --enable-cassert --enable-debug

 Generally the only reason to build as a developer without asserts on is to
 do performance testing.  They will slow some portions of the code down
 significantly.


Thanks. I will test MERGE under this new configuration. A new patch will be
submitted once I fix all the asserting bugs.


 --
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us http://www.2ndquadrant.us/




Re: [HACKERS] MERGE Specification

2010-08-06 Thread Boxuan Zhai
On Fri, Aug 6, 2010 at 3:41 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:

   SQL:2011 makes no mention of how MERGE should react to statement level
   triggers. MERGE is not a trigger action even. Given considerable
   confusion in this area, IMHO we should just say the MERGE does not call
   statement triggers at all, of any kind.
 
  IMO the UPDATE/DELETE/INSERT actions should fire the respective
  statement level triggers, but the MERGE itself should not.

 When, and how?

 If an UPDATE is mentioned 5 times, do we call the trigger 5 times?


My current process for BEFOR / AFTER STATEMENT trigger on MERGE is to fire
the triggers for all action types that appears in the command, unless it is
replaced by a INSTEAD rule. But the triggers for one action type will be
fired only once. That means you will get both UPDATE and INSERT triggers be
activated for only once if you are executing a MERGE command with 5 UPDATEs
and 10 INSERTs.


 What happens if none of the UPDATEs are ever executed?

 The triggers (I mean triggers for statement) will be fired anyway even the
UPDATE action matches no tuple. This is not for MERGE only. If you update a
table with the command
UPDATE foo SET ... WHERE false;
It will also fire the STATEMENT triggers of UPDATE type on foo (I think so).


And, even not been asked, I want to say that, in current implementation of
MERGE,  the row level triggers are fired by the actions that take the
tuples. If one tuple is caught by an UPDATE action, then the UPDATE row
trigger will be fired on this tuple. If it is handled by INSERT action, then
the INSRET row triggers are on.

Hope you agree with my designs.


 Best explain exactly what you mean.

 --
  Simon Riggs   www.2ndQuadrant.com http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] MERGE Specification

2010-08-06 Thread Boxuan Zhai
On Fri, Aug 6, 2010 at 3:53 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Fri, 2010-08-06 at 10:28 +0300, Heikki Linnakangas wrote:
  On 06/08/10 10:12, Simon Riggs wrote:
   So DO NOTHING is the default and implies silently ignoring rows. RAISE
   ERROR is the opposite.
  
   Coding for those seems very easy, its just a question of should we do
   it?. DB2 has it; SQL:2008 does not. But then SQL:2008 followed the DB2
   introduction of AND clauses, and SQL:2011 has so far followed the DB2
   introduction of DELETE action also.
 
  I see neither DO NOTHING or RAISE ERROR in the documentation of DB2,
  Oracle, or MSSQL server.

 Agreed, Oracle and MSSQL server does not have these.

 However, DB2 very clearly does have these features

 * SIGNAL which raises an error and can be used in place of any action,
 at any point in sequence of WHEN clauses. DB2 already supports SIGNAL as
 part of SQL/PSM, which we do not, so RAISE ERROR was the nearest
 equivalent command for PostgreSQL.

 * ELSE IGNORE which does same thing as DO NOTHING, except it must always
 be last statement in a sequence of WHEN clauses. DO NOTHING is already a
 phrase with exactly this meaning in PostgreSQL, so I suggest that.


So, we need to add both DO NOTHING and RAISE ERROR actions in the MERGE
command now !? What will RAISE ERROR do? To stop the whole MERGE command OR,
just throw an error notice for the row and move on.



  --
  Simon Riggs   www.2ndQuadrant.com http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] MERGE Specification

2010-08-05 Thread Boxuan Zhai
On Thu, Aug 5, 2010 at 7:25 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, 2010-08-05 at 12:29 +0300, Heikki Linnakangas wrote:
  On 05/08/10 10:46, Simon Riggs wrote:
   On Mon, 2008-04-21 at 21:08 +0100, Simon Riggs wrote:
   The following two files specify the behaviour of the MERGE statement
 and
   how it will work in the world of PostgreSQL.
  
   The HTML file was generated from SGML source, though the latter is not
   included here for clarity.
  
   Enclose merge.sgml docs for forthcoming MERGE command, as originally
   written.
 
  Oh, cool, I wasn't aware you had written that already. Boxuan, please
  include this in your patch, after reviewing and removing/editing
  anything that doesn't apply to your patch.


Thanks a lot for the instruction file of MERGE command. I have read through
it carefully. It is really a great work. I have to admit that I am not
familiar with the sgml language, and I cannot write the instruction by
myself.

All features of MERGE demonstrated in this file are consistent with my
implementation, EXCEPT the DO NOTHING option. In current edition, we don't
have the DO NOTHING action type. That is, during the execution of MERGE
commands, if one tuple is not caught by any of the merge actions, it will be
ignored. In another word, DO NOTING (although cannot be specified explicitly
by user) is the DEFAULT action for tuples.

In the contrary, Simon's instruction says that the DEFAULT action for the
tuple caught by no actions is
WHEN NOT MATCHED THEN INSERT DEFAULT VALUES

From the user's point of view, these two kinds of MERGE command may have
not much differences. But, as the coder, I prefer current setting, because
we can save the implementation for a new type of MERGE actions (DO
NOTHING is a special merge action type). And, thus, no checks and special
process for it. (For example, we need to make sure that DO NOTHING is the
last WHEN clause, and it has no additional qual. And we have to generate a
INSERT DEFAULT VALUES action for the MERGE command if we don't find the DO
NOTHING action)

Well, if people want the DO NOTHING action, I will add it in the system.


 Now, I have changed the RULE strategy of MERGE to the better logic. And I
am working on triggers for MERGE, which is also mentioned in the instruction
file. I will build a new patch with no long comment and blank line around
functions, and possibly contain the regress test file and this sgml
instructions in it.

I wish we can reach a agreement on the DO NOTHING thing before my next
submission, so I can make necessary modification on my code for it. (the new
patch may be finished in one or two days, I think)

Thanks!

PS: I have an embarrassing question: how to view the sgml instructions of
postgres in web page form, rather than read the source code of them?


 Also had these fragments as well, if they're still useful. Probably just
 useful as pointers as to what else to change to include the docs.


 The tests and docs were written from SQL standard, so any deviations
 would need to be flagged. The idea of writing the tests first was that
 they provide an objective test of whether the implementation works
 according to spec.

 I'd quite like a commentary on anything that needs changing. Not saying
 I will necessarily object to differences, but knowing the differences
 sounds important for us.

 --
  Simon Riggs   www.2ndQuadrant.com http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] MERGE Specification

2010-08-05 Thread Boxuan Zhai
Dear All,

I have seen a lively discussion about the DO NOTING action in MERGE command.
And, I think most people want it. So it will be added to my next patch.

Before the implementation, I still have some questions to confirm:

1. If we have a DO NOTHING action specified, it should be the last WHEN
clause. It must be of the NOT MATCHED cases, and it CANNOT have any
additional action qualifications. Am I correct?

2. If no DO NOTHING specified, we will imply a INSERT DEFAULT VALUES action
as the end of MERGE.
My question is, is this action taken only for the NOT MATCHED tuples?  If
this is the case, then what about the MATCHED tuples that match not previous
actions? Ignore them?
That means we are in fact going to add two implicit WHEN clause:
   a) WHEN NOT MATCHED INSERT default values;
   b) WHEN MATCHED THEN DO NOTHING.
OR, is the INSERT DEFAULT VALUES applied to ALL tuples not matter they are
MATCHED or not?


Besides, (I mean no offense, but) can this method really avoid losing row?

So far as I know, the DEFAULT values for table attributes are defined when
the table is created and no variables are allowed in the default value
expressions. That means, they are usually constants or simple serial
numbers.

Image that we have a MERGE command that has thousands of NOT MATCHED tuples
going to the implicit action. Then, the target table will inserted with
thousands of rows with DEAULT VALUES. These row will have similar (if not
exactly the same) simple content, which contains NO information from the
source table of MERGE. Is this really what we want? If it is not, then what
is the use of the INSERT DEFAULT VALUES action?

Regards


Re: [HACKERS] merge command - GSoC progress

2010-08-04 Thread Boxuan Zhai
 Dear Robert,

I am just considering that there may be some logical mistakes for my rule
rewriting strategy of MERGE actions.

In my current design, if we find that an action type, say UPDATE, is
replaced by INSTEAD rules, we will remove all the actions of this type from
the MERGE command, as if they are not be specified by user from the
beginning. See the test example in my pages for this situation.
https://wiki.postgresql.org/wiki/MergeTestExamples#With_INSTEAD_rules

Now,I am thinking that maybe we should keep the replaced actions in action
list, and just mark them to be invalid. If one join tuple from the main
plan fits the condition of this action, we will do nothing on it.

This strategy is a little bit different with the current one. If we delete
an action, the tuples that meet it condition will be caught by other
actions. If we keep it, the tuples that match it will be skipped.

I think the new design is more logical, and I wonder your opinion on this
problem.

Yours Boxuan


Re: [HACKERS] merge command - GSoC progress

2010-08-03 Thread Boxuan Zhai
2010/8/4 Greg Smith g...@2ndquadrant.com

 Boxuan Zhai wrote:

 I think there are no redundant lines in this time's patch file.


 It is much better.  There are still more blank likes around the new code
 you've added than are needed in many places, but that doesn't interfere with
 reading the patch.

 Sorry, it is my personal habit of leaving blanks around codes. I will chage
this if it doesn't follow the pgsql coding style.


 The main code formatting issue left you'll need to address eventually are
 all the really long comments in there.

 I will correct the long comments in the next patch.



 And, I have tested the running of MERGE command with different situations.
 I am sorry that I didn't create regression test files, because I am not sure
 how to add new files in the git package.


 git add filename ?

 The tests you've put in there are the right general sort of things to try
 out.  The one example you gave does show an UPSERT being emulated by MERGE,
 which is the #1 thing people are looking for initially.

 In fact, I have created a merge.sql with simple merge example. I put it in
the folder of /src/test/regress/sql/ and modified the serial_schedule file
to add a line of : test: merge
Is this correct?

But, I don't know how to run regress to test this sql file. My make check
fails when install the db. I think this is because I do it under a MinGW
environment and some parameters are not matched with the default setting of
postgres.
I can configure, make install , initdb and do sql query in psql successfully
in my machine. So the source code itself should be correct.

I put my merge.sql in attachment, in case anyone want to have a look.





 --
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us http://www.2ndquadrant.us/




merge.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] merge command - GSoC progress

2010-07-31 Thread Boxuan Zhai
2010/7/31 Greg Smith g...@2ndquadrant.com

 Boxuan Zhai wrote:

 I create a clean patch file (no debug clauses). See the attachment.


 Some general coding feedback for you on this.

 Thanks for your consideration!


 It's not obvious to people who might want to try this out what exactly they
 are supposed to do.  Particularly for complicated patches like this, where
 only a subset of the final feature might actually be implemented, some sort
 of reviewer guide suggesting what should and shouldn't work would be
 extremely helpful.  I recall there was some sort of patch design guide in an
 earlier version of this patch; it doesn't seem to be there anymore.  Don't
 remember if that had any examples in it.

 I am now working on fixing a bug which makes the system unable to initdb. I
will update my page in postgres Wiki with a detailed instruction of my
implementation and testing examples soon, with my next patch file.


 Ultimately, the examples of working behavior for this patch will need to be
 put into code.  The way that happens is by adding working examples into the
 regression tests for the database.  If you had those done for this patch, I
 wouldn't have to ask for code examples; I could just look at the source to
 the regression output and see how to use it against the standard database
 the regression samples create, and then execute against.  This at least lets
 you avoid having to generate some test data, because there will already be
 some in the regression database for you to use.  There is an intro this
 topic at http://wiki.postgresql.org/wiki/Regression_test_authoring Another 
 common way to generate test data is to run pgbench which creates 4
 tables and populates them with data.

 I will try to add my testing examples to the gregression folder.


 As far as the patch itself goes, you have some work left on cleaning that
 up still you'll need to do eventually.  What I would suggest is actually
 reading the patch itself; don't just generate it and send it, read through
 the whole thing like someone new to it would do.  One way you can improve
 what you've done already is to find places where you have introduced changes
 to the code structure just through editing.  Here's an example of what I'm
 talking about, from line 499 of your patch:

 -break;
 +break;
 This happened because you added two invisible tabs to the end of this line.
  This makes the patch larger for no good reason and tends to infuriate
 people who work on the code.  There's quite a bit of extra lines added in
 here too that should go.  You should consider reducing the ultimate size of
 the patch in terms of lines a worthwhile use of your time, even if it
 doesn't change how things work.  There's lots of examples in this one where
 you put two or three lines between two statements when a single one would
 match the look of the code in that file.  A round of reading the diff
 looking for that sort of problem would be useful.

 Another thing you should do is limit how long each line is when practical.
  You have lots of seriously wide comment lines here right now in particular.
  While there are some longer lines in the PostgreSQL code right now, that's
 code, not comments.  And when you have a long line and a long comment, don't
 tack the comment onto the end.  Put it on the line above instead.  Also,
 don't use // in the middle of comments the way you've done in a few places
 here.

 Sorry for these mistakes, again. I promise that the same thing will not
happen in my next patch.



 Getting some examples sorted out and starting on regression tests is more
 important than the coding style parts I think, just wanted to pass those
 along while I noticed them reading the patch, so you could start looking out
 for them more as you continue to work on it.

 --
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us http://www.2ndquadrant.us/




Re: [HACKERS] merge command - GSoC progress

2010-07-28 Thread Boxuan Zhai
2010/7/28 Robert Haas robertmh...@gmail.com

 On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai bxzhai2...@gmail.com wrote:
  I have get a edition that the merge command can run. It accept the
 standard
  merge command and can do UPDATE, INSERT and DELETE actions now. But we
  cannot put additional qualification for actions. There are some bugs when
 we
  try to evaluate the quals which make the system quit. I will fix it soon.

 This patch doesn't compile.  You're using zbxprint() from a bunch of
 places where it's not defined.  I get compile warnings for all of
 those files and then a link failure at the end.  You might find it
 useful to create src/Makefile.custom in your local tree and put
 COPT=-Werror in there; it tends to prevent problems of this kind.

 Undefined symbols:
  _zbxprint, referenced from:
  _transformStmt in analyze.o
  _ExecInitMergeAction in nodeModifyTable.o
  _ExecModifyTable in nodeModifyTable.o
  _ExecInitModifyTable in nodeModifyTable.o
  _merge_action_planner in planner.o

 Sorry, this is a debug function defined by me. It may not be included in
the patch. I add a line of #define zbxprint printf somewhere in the
system.


 Not that it's as high-priority as getting this fully working, but you
 should revert the useless changes in this patch - e.g. the one-line
 change to heaptuple.c is obvious debugging leftovers, and all of the
 changes to execQual.c and execUtil.c are whitespace-only.  You should
 also try to make your code and comments conform to project style
 guidelines.  In general, you'll find it easier to keep track of your
 changes (and you'll have fewer spurious changes) if you use git diff
 master...yourbranch instead of marking comments, etc. with ZBX or
 similar.



I will clean all these in my next patch.

I am now very confused with the failure of action qualification. I look
through the whole process of a query, from parser to executor. In my
opinion, the qualification transformed in analyzer, will be processed by
prepsocess_qual_condition() in planner, and then by the ExecInitExpr()
function in excutor start phase (in InitPlan() function). Then the qual is
ready to be used in ExecQual(). Am I correct?

I have done these on the merge action qual, but when I pass them into
ExecQual(), the server just closed abnormally. I don't know if I missed any
steps on preparing  the qual expressions.

  --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company



Fwd: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission

2010-07-17 Thread Boxuan Zhai
-- Forwarded message --
From: Boxuan Zhai bxzhai2...@gmail.com
Date: 2010/7/17
Subject: Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission
To: Simon Riggs si...@2ndquadrant.com




2010/7/17 Simon Riggs si...@2ndquadrant.com

 On Fri, 2010-07-16 at 08:26 +0800, Boxuan Zhai wrote:
  The merge actions are transformed into lower level queries. I create a
  Query node  for each of them and append them in a newly create List
  field mergeActQry. The action queries have different command type and
  specific target list and qual list, according to their declaration by
  user. But they all share the same range table. This is because we
  don't need the action queries to be planned latter. The joining
  strategy is decided by the top query. We are only interest in their
  specific action qualifications. In other words, these action queries
  are only containers for their target list and qualifications.
 
  2. When the query is ready, it will be send to rewriter. In this part,
  we can call RewriteQuery() to handle the action queries. The UPDATE
  action will trigger rules on UPDATE, and so on. What need to be
  noticed are: 1. the actions of the same type should not be rewritten
  repeatedly. If there are two UPDATE actions in merge command, we
  should not trigger the ON UPDATE rules twice. 2. if an action type is
  fully replaced by rules, we should remove all actions of this type
  from the action list.
  Rewriter will also do some process on the target list of each action.

 IMHO it is a bad thing that we are attempting to execute each action
 statement as a query. That means we need to execute an inner SQL
 statement for each row returned by the top level query.

 That design makes MERGE similar in performance to an upsert PL/pgsql
 function, which will perform terribly on large numbers of rows.

 Dear Simmon,

Thanks for your feedback. I may not present my idea clearly.
In my design, the merge actions are not executed as separate queries. Only
the top level query (that is a query like source table LEFT JOIN
target_table ON matching_qual ) will be planned and executed. For each
tuple return by this plan, we will choose a proper action for it and do the
corresponding modification. The tables will only be scanned and joined
once. One merge action will not do a full run of tables join and then modify
table as a standard UPDATE/DELETE/INSERT query.  (Is this what you are
worried about?)

In fact, for one action, we only need the information of: 1. the action type
(UPDATE or DELTE or INSERT). 2 the target list. and 3. the additional
qualifications. And a Query node is a perfect container for these infor.
That's why I transform them in to Query nodes. But all through the analyzer,
rewriter, planner and executor. I just call related functions to formalize
the expressions in their target list and qual lists. The range table and
join tree is only dermined by the top level query, they will not be effected
by merge actions.




 This was exactly the point where I stopped implementation previously:
 attempting to make MERGE work with rules is enough to prevent a tighter
 in-executor implementation of the action list.

I am sorry that I don't catch your meanning here clearly.
As my understanding, if there is a rule on the target table, the rewriter
will add a new query in the execution queue. (or replace the original
query).  I think the rule queries will not effect the process within the
original query, because they are totally separate queries which will be run
before or after the original query. Are you suggest that we should not allow
rules on MERGE command?



 [To Boxuan, on a personal note, you seem to be coping quite well with
 the code and the process; congratulations and keep going.]


Thank you. Your encouragement is very important to me.


  --

  Simon Riggs   www.2ndQuadrant.com http://www.2ndquadrant.com/
  PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission

2010-07-16 Thread Boxuan Zhai
Hi,

For the EXPLAIN MERGE command, I expect it to return a result similar to
that of a SELECT command.

I think the EXPLAIN command is to show how the tables in a query is scaned
and joined. In my design, the merge command will generate a top-level query
(and plan) as the main query. It is in fact a left join select query over
the source and target tables.  This main query (plan) decides how the tables
are scanned. The merge actions will not effect this process. So when we
explain the merge command, a similar result will be returned.

For example the command
EXPLAIN
MERGE INTO Stock USING Sale ON Stock.stock_id = Sale.sale_id
WHEN MATCHED THEN UPDATE SET balance = balance + sale.vol;
WHEN 
.

Will return a result just like that of the following command:

EXPLAIN
SELECT * FROM Sale LEFT JOIN Stock ON stock_id = sale_id;

Yours Boxuan.

2010/7/16 Heikki Linnakangas heikki.linnakan...@enterprisedb.com

 On 16/07/10 03:26, Boxuan Zhai wrote:

 PS: Heikki asked me about what the EXPLAIN MERGE ... command will do.
 Well, I have not test it, but it may through an error or just explain the
 top plan, since I put the action plans in a new field, which cannot be
 recognized by old functions.


 I meant what EXPLAIN MERGE output will look like after the project is
 finished, not what it will do at this stage. I was trying to get a picture
 of how you're thinking to implement the executor, what nodes there is in a
 MERGE plan.

 --
  Heikki Linnakangas

  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission

2010-07-15 Thread Boxuan Zhai
Dear Hackers

I considered my situation. And I found that I didn't communicate well with
you, as makes you have little confidence on my project. Most of the time I
just work by myself and not report to you frequently. I always want to
finish a solid stage progress before do a submission. This may be a bad
habit in the remote project.

In fact, I have a detailed design on how to implement the command and I am
working hard these days to catch the schedule.

In my design,
1.  the merge command is firstly transformed to a MergeStmt node in
parser. And analyzer will generate a left out join query as the top query
(or main query). This query is similar to a SELECT command query, but I set
target relation in it. The top query will drive the scanning and joining
over target and source tables.

The merge actions are transformed into lower level queries. I create a Query
node  for each of them and append them in a newly create List field
mergeActQry. The action queries have different command type and specific
target list and qual list, according to their declaration by user. But they
all share the same range table. This is because we don't need the action
queries to be planned latter. The joining strategy is decided by the top
query. We are only interest in their specific action qualifications. In
other words, these action queries are only containers for their target list
and qualifications.

2. When the query is ready, it will be send to rewriter. In this part, we
can call RewriteQuery() to handle the action queries. The UPDATE action will
trigger rules on UPDATE, and so on. What need to be noticed are: 1. the
actions of the same type should not be rewritten repeatedly. If there are
two UPDATE actions in merge command, we should not trigger the ON UPDATE
rules twice. 2. if an action type is fully replaced by rules, we should
remove all actions of this type from the action list.
Rewriter will also do some process on the target list of each action.

The first submission has finished the above part.

3. In planner, the top level query is handled in a normal way. Since it has
almost the same structure as a SELECT query, the planner() function can work
on it straight forward. However, we need a small change here. The merge
command has a target relation, which need a ctid junk attribute in the
target list. The ctid is required by the UPDATE and DELETE actions.

Besides, for each of the action queries, we also need to create a Plan node.
We don't need to do a full plan on the action queries. The crucial point is
to preprocess the target list and qualification of each action. (Explanation
for this point. The execution of a merge action is composed by two parts.
The top plan will be executed in the main loop, and return the joined tuples
one by one. And a action will apply its qualification on the returned
tuples. If succeed, it will take the action and do corresponding
modification on the target table. Thus, even we have a Plan node created for
each action, we don't want to throw it directly into Planner() function.
That will generate a new plan over the tables in Range Table, which is very
probably different with the top-level plan. If we run the action plans
directly, they will be confilict with each other).

I create a function  merge_action_planner() to do this job. This part is
added at the end of standard_planner(). After that, all the plans of merge
actions are linked into a new List filed in PlannedStmt result of the top
plan.

4. When planner is finished, the plan will be send to executor through
PortalRun(). As a new command, merge will chose the PORTAL_MULTI_QUERY
strategy, and be sent to ProcessQuery() function.

5. As in the ExecutorStart() part, we need to set junkfilter for merge
command, since we have a ctid junk attr in target list. And, the merge
action plans should also be initialized  and transformed into PlanState
nodes. However, the initialization over action plan is only focus on the
target list and quals. We don't need other part of traditional plan
initialization, since these action plans are not for scanning or joining
(this is the job of top plan). We only want to transform the action
information into standard format that can be used by qualification evaluator
in executor.
I HAVE DONE ALL THE ABOVE IN A SECOND SUBMISSION.

6. In ExecutorRun() part, the top plan will be passed into ExecutePlan().
The action planstates can be found in the
estate-es_plannedstmt field.
The top plan can return tuples of the left out join on source table and
target table. (I can see the tuple be returned in my codes). Thus, the
design is correct. At least the top plan can do its work well. In the
junkfilter, if we can find a non-null ctid, it is a matched tuple, or else,
it is a NOT MATCHED tuple. Then we need to evaluate the additional quals of
the actions one by one. If the evaluations of one action succeed, we will
take this action and skip the remaining ones.

Since the target list and qual expressions are all 

Re: [HACKERS] gSoC - ADD MERGE COMMAND - code patch submission

2010-07-11 Thread Boxuan Zhai
Hi,

Thanks for all these feedback.

I found that people have problems on running my codes, which probably comes
from my nonstandard submission format. I can compile, install and initialize
postgres in my own machine. The system accepts MERGE command and will throw
an error when it runs into the executor, which cannot recognize the MERGE
command type so far.

I will make a standard patch as soon as possible. Sorry for the troubles.

Yours Boxuan



2010/7/11 Tom Lane t...@sss.pgh.pa.us

 Peter Eisentraut pete...@gmx.net writes:
  On lör, 2010-07-10 at 12:45 -0400, Tom Lane wrote:
  I believe the project standard is to make things readable
  in an 80-column window --- anyone have an objection to stating that
  explicitly?

  Is that what pgindent reformats it to?

 pgindent tries to leave a character or two to spare, IIRC, so its target
 is probably 78 or thereabouts.

regards, tom lane