Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 12:59 PM, Peter Geoghegan  wrote:
> Maybe we should change the ordering of those IndexInfo structs to
> something more suitable, but it must be immutable (it cannot hinge
> upon the details of one particular DML statement).

I meant that it must be stable (not immutable), in the specific sense
that it cannot change without an ALTER TABLE or similar. The order of
insertion should be consistent among all backends that might be
performing DML against the table at any given time, since, in general,
to do otherwise risks deadlocking (perhaps only with hash indexes, or
a much earlier version of the nbtree am that still used heavyweight
locks).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 12:16 PM, Tom Lane  wrote:
> I'm not following.  The only insertions happening in this test case
> are coming from various clients doing the same INSERT ON CONFLICT UPDATE.
> If one of them has successfully inserted "42" into the arbiter index,
> how is it that other ones are not seeing that?

It's complicated. One backend may completely overtake another in this
race. We don't give up early within ExecInsertIndexTuples(), because
there isn't any useful distinction made between the speculative
insertion case, and the deferred constraint case. The ordering, as
such, is therefore irrelevant.

Fortunately, I posted a fix, of sorts, more than a year ago:

https://commitfest.postgresql.org/10/403/

It never occurred to me to make this argument for it.

There is a separate question of how to make the ordering avoid
problems if it did matter (if that patch ever got committed). I think
it would fix this exact test case, but only accidentally, because the
executor gets a list of IndexInfo structs from the relcache in a
consistent though fairly insignificant order (ordered by OID). I don't
think you can change that property within the relcache, because the
ordering must be consistent (to avoid deadlocks, perhaps elsewhere).

Maybe we should change the ordering of those IndexInfo structs to
something more suitable, but it must be immutable (it cannot hinge
upon the details of one particular DML statement). I actually also
wrote a patch to prefer insertion into the primary key first, which
also went nowhere (I gave up on that one, to be fair).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Konstantin Knizhnik

On 08/25/2016 10:08 PM, Peter Geoghegan wrote:

On Thu, Aug 25, 2016 at 11:49 AM, Tom Lane  wrote:

I think the point is that given the way he's set up the test case,
there should be no duplicate violation in the plain unique index
unless there is one in the arbiter index.  So assuming that INSERT
tests the arbiter indexes first, there shouldn't be an error.
Maybe it doesn't do that, but it seems like it would be a good idea
if it did.

Oh, yeah. This is arguably an example of inference failing to infer
multiple unique indexes as arbiters. Inference could, in principle,
recognize that the second unique index is equivalent to the first, but
doesn't. (I don't think that it matters which order anything is tested
in, though, because not finding a dup value in the arbiter index does
not guarantee that there won't be one in the other index. There is no
locking when no conflict is initially found, and so no guarantees
here.)

Anyway, I don't have a lot of sympathy for this point of view, because
the scenario is completely contrived. You have to draw the line
somewhere.


I do not think that this scenario is completely contrived: the cases when a 
table has more than one primary key are quite common.
For example, "user" may have unique e-mail address, phone number and login.
Also, as far as I know, this is not an artificial example, but real case taken 
from customers application...

I am not sure weather it's really bug or feature, but the user's intention was 
obvious: locate record by one of the unique keys and if such record already 
exists,
then increment counter (do update instead of insert).  But there are also good 
arguments why upsert  may report conflict in this case...

If such UPSERT behavior is assumed to be correct, what is the best workaround 
for the problem if we really need to have to separate indexes and want to 
enforce unique constraint for both keys?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Tom Lane
Peter Geoghegan  writes:
> (I don't think that it matters which order anything is tested
> in, though, because not finding a dup value in the arbiter index does
> not guarantee that there won't be one in the other index. There is no
> locking when no conflict is initially found, and so no guarantees
> here.)

I'm not following.  The only insertions happening in this test case
are coming from various clients doing the same INSERT ON CONFLICT UPDATE.
If one of them has successfully inserted "42" into the arbiter index,
how is it that other ones are not seeing that?

> Anyway, I don't have a lot of sympathy for this point of view, because
> the scenario is completely contrived.

Well, I agree this particular test case looks contrived, but it might be
a simplification of a more convincing real-world case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 11:49 AM, Tom Lane  wrote:
> I think the point is that given the way he's set up the test case,
> there should be no duplicate violation in the plain unique index
> unless there is one in the arbiter index.  So assuming that INSERT
> tests the arbiter indexes first, there shouldn't be an error.
> Maybe it doesn't do that, but it seems like it would be a good idea
> if it did.

Oh, yeah. This is arguably an example of inference failing to infer
multiple unique indexes as arbiters. Inference could, in principle,
recognize that the second unique index is equivalent to the first, but
doesn't. (I don't think that it matters which order anything is tested
in, though, because not finding a dup value in the arbiter index does
not guarantee that there won't be one in the other index. There is no
locking when no conflict is initially found, and so no guarantees
here.)

Anyway, I don't have a lot of sympathy for this point of view, because
the scenario is completely contrived. You have to draw the line
somewhere.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Aug 25, 2016 at 7:12 AM, Ivan Frolkov  wrote:
>> So, if we have primary key and unique constraint on a table then upsert will
>> not work as would expected.

> Why is this unexpected?

> You only take the alternative path (UPDATE) in the event of a would-be
> duplicate violation. You can't upsert while using more than one index
> as an arbiter index. This is true unless they're more or less
> equivalent, in which case multiple arbiter indexes can be inferred,
> but that clearly doesn't apply here.

I think the point is that given the way he's set up the test case,
there should be no duplicate violation in the plain unique index
unless there is one in the arbiter index.  So assuming that INSERT
tests the arbiter indexes first, there shouldn't be an error.
Maybe it doesn't do that, but it seems like it would be a good idea
if it did.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT strange behavior

2016-08-25 Thread Peter Geoghegan
On Thu, Aug 25, 2016 at 7:12 AM, Ivan Frolkov  wrote:
> So, if we have primary key and unique constraint on a table then upsert will
> not work as would expected.

Why is this unexpected?

You only take the alternative path (UPDATE) in the event of a would-be
duplicate violation. You can't upsert while using more than one index
as an arbiter index. This is true unless they're more or less
equivalent, in which case multiple arbiter indexes can be inferred,
but that clearly doesn't apply here.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPSERT strange behavior

2016-08-25 Thread Ivan Frolkov

Suppose we have some table

create table cnt( 
 usr_id int primary key, 
 usr_doc_ref text not null, 
 cnt int, 
 sum int 
);

And going to run some insert on conflict update on it (pgbench script):

\setrandom id 1 50 
insert into cnt as c(usr_id,usr_doc_ref, cnt) values(:id, '#'||:id, 1) on 
conflict(usr_id) do update set cnt=c.cnt+1; 

Run it:

 pgbench -c 16 -j 2 -t 5 -n -h localhost -p 5432 -U postgres -f 
upsert2-ok.pgb  work 
transaction type: Custom query 
scaling factor: 1 
query mode: simple 
number of clients: 16 
number of threads: 2 
number of transactions per client: 5 
number of transactions actually processed: 80/80 
latency average: 0.000 ms 
tps = 36475.782816 (including connections establishing) 
tps = 36483.759765 (excluding connections establishing) 

All ok.
Then add a unique constraint to the table.

alter table cnt add constraint usr_doc_ref_uq unique(usr_doc_ref) 

Run pgbench again:

pgbench -c 16 -j 2 -t 5 -n -h localhost -p 5432 -U postgres -f 
upsert2-ok.pgb work
client 2 aborted in state 2: ERROR: duplicate key value violates unique 
constraint "usr_doc_ref_uq"
DETAIL: Key (usr_doc_ref)=(#39) already exists.
client 6 aborted in state 2: ERROR: duplicate key value violates unique 
constraint "usr_doc_ref_uq"
DETAIL: Key (usr_doc_ref)=(#16) already exists.
client 9 aborted in state 2: ERROR: duplicate key value violates unique 
constraint "usr_doc_ref_uq"
DETAIL: Key (usr_doc_ref)=(#28) already exists.

So, if we have primary key and unique constraint on a table then upsert will 
not work as would expected.




Re: [HACKERS] UPSERT/RETURNING -> ON CONFLICT SELECT?

2016-07-13 Thread Tom Lane
Peter Geoghegan  writes:
> On Wed, Jul 13, 2016 at 2:49 AM, Bjørnar Ness  wrote:
>> But with upsert/do nothing, this will not work as "needed".
>> 
>> Would it be possible to introduce a "ON CONFLICT SELECT" argument:
>> 
>> with _foo as (
>> insert into foo(i) values(1)
>> on conflict select returning id
>> ) insert into bar(foo_id,i)
>> select id,2 from _foo;

> I gather that the point of this pseudo SQL is to show how you might be
> able to project and select the values not successfully inserted. Can't
> you just pipeline together some CTEs instead?

What's "needed" seems a little ill-defined here, anyway.  Would the SELECT
be expected to return values from the failed-to-be-inserted row, or from
the existing conflicting row?  (Is there certain to be only one
conflicting row?  With exclusion constraints I'd think not.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT/RETURNING -> ON CONFLICT SELECT?

2016-07-13 Thread Peter Geoghegan
On Wed, Jul 13, 2016 at 2:49 AM, Bjørnar Ness  wrote:
> But with upsert/do nothing, this will not work as "needed".
>
> Would it be possible to introduce a "ON CONFLICT SELECT" argument:
>
> with _foo as (
>   insert into foo(i) values(1)
>   on conflict select returning id
> ) insert into bar(foo_id,i)
>   select id,2 from _foo;

I gather that the point of this pseudo SQL is to show how you might be
able to project and select the values not successfully inserted. Can't
you just pipeline together some CTEs instead?


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPSERT/RETURNING -> ON CONFLICT SELECT?

2016-07-13 Thread Bjørnar Ness
The new upsert feature is a great addition, but in some cases is not
as usable as
I and seems lots of others would like it to be, take an example with
circular references:

create table foo (
  id serial references bar(foo_id) on delete cascade,
  i int
);

create table bar (
  foo_id integer references foo(id) on delete cascade,
  i int
);

A insert here would be:

with _foo as (
  insert into foo(i) values(1) returning id
) insert into bar(foo_id,i)
  select id,2 from _foo;

But with upsert/do nothing, this will not work as "needed".

Would it be possible to introduce a "ON CONFLICT SELECT" argument:

with _foo as (
  insert into foo(i) values(1)
  on conflict select returning id
) insert into bar(foo_id,i)
  select id,2 from _foo;

-- 
Bj(/)rnar


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-29 Thread Amit Langote
On 2015-06-25 AM 09:51, Amit Langote wrote:
 
 Peter,
 
 On 2015-06-25 AM 02:35, Peter Geoghegan wrote:

 Inheritance with triggers is a leaky abstraction, so this kind of
 thing is always awkward. Still, UPSERT has full support for
 *inheritance* -- that just doesn't help in this case.

 
 Could you clarify as to what UPSERT's support for inheritance entails?
 

Oh, I see that this stuff has been discussed (-hackers) and written about
(wiki). I'll go read about it.

I agree with Fujii-san's concern that any UPSERT on partition limitations
given those of partitioning approach should be documented likewise, if not
under INSERT/UPSERT, then under partitioning; not really sure which.

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-26 Thread Jim Nasby

On 6/24/15 1:03 PM, Peter Geoghegan wrote:

On Wed, Jun 24, 2015 at 11:02 AM, Peter Geoghegan p...@heroku.com wrote:

I think that the real way to fix this is, as you say, to make it so
that it isn't necessary in general to write trigger functions like
this to make inheritance work.


Excuse me -- I mean to make *partitioning* work.


It occurs to me that we could run the BEFORE triggers and then pass the 
new tuple to a user-supplied function that returns a regclass. That 
would allow us to know exactly which child/partition the UPSERT should 
be attempted in.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-26 Thread Simon Riggs
On 24 June 2015 at 15:05, Fujii Masao masao.fu...@gmail.com wrote:


 How should we treat this problem for 9.5? If we want to fix this problem
 completely, probably we would need to make constraint_exclusion work with
 even UPSERT. Which sounds difficult to do at least for 9.5. Any other idea?
 Or we should just treat it as a limitation of UPSERT and add that document?


+1

There are many problems that cannot be resolved for 9.5.

UPSERT works fine with tables with BRIN indexes.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
http://www.2ndquadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Robert Haas
On Wed, Jun 24, 2015 at 10:29 AM, Andres Freund and...@anarazel.de wrote:
 On 2015-06-24 23:05:45 +0900, Fujii Masao wrote:
 INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current partitioning
 mechanism. For example, in the following SQL commands, the last UPSERT 
 command
 would fail with an error. The error message is

 I think that's pretty much inevitable without baking in touple routing
 into the core system and supporting unique-constraints that span
 partitions. In other words, I don't think this is upsert's fault.

Is the root of the problem that the trigger is called for an INSERT ..
ON CONFLICT statement but it turns that into a plain INSERT?

Is there any way of writing a partitioning trigger that doesn't have
that defect?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Andres Freund
Hi,

On 2015-06-24 23:05:45 +0900, Fujii Masao wrote:
 INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current partitioning
 mechanism. For example, in the following SQL commands, the last UPSERT command
 would fail with an error. The error message is

I think that's pretty much inevitable without baking in touple routing
into the core system and supporting unique-constraints that span
partitions. In other words, I don't think this is upsert's fault.

 Or we should just treat it as a limitation of UPSERT and add that document?

I think we should (IIRC there's actually already something) rather
document it as a limitation of the partitioning approach.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPSERT on partition

2015-06-24 Thread Fujii Masao
Hi,

INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current partitioning
mechanism. For example, in the following SQL commands, the last UPSERT command
would fail with an error. The error message is

ERROR:  duplicate key value violates unique constraint hoge_20150601_pkey
DETAIL:  Key (col1)=(2015-06-01 10:30:00) already exists.
CONTEXT:  SQL statement INSERT INTO hoge_20150601 VALUES (($1).*)
PL/pgSQL function hoge_insert_trigger() line 6 at EXECUTE statement


CREATE TABLE hoge (col1 TIMESTAMP PRIMARY KEY, col2 INT);

CREATE OR REPLACE FUNCTION hoge_insert_trigger () RETURNS trigger AS
$$
DECLARE
part TEXT;
BEGIN
part := 'hoge_' || to_char(new.col1,
'MMDD');
EXECUTE 'INSERT INTO ' || part || '
VALUES (($1).*)' USING new;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_hoge_trigger
 BEFORE INSERT ON hoge
 FOR EACH ROW EXECUTE PROCEDURE hoge_insert_trigger();

CREATE TABLE hoge_20150601 (
LIKE hoge INCLUDING INDEXES
INCLUDING DEFAULTS
INCLUDING CONSTRAINTS,
CHECK ('2015-06-01 00:00:00' = col1 AND col1  '2015-06-02 00:00:00')
)
INHERITS (hoge);

CREATE TABLE hoge_20150602 (
LIKE hoge INCLUDING INDEXES
INCLUDING DEFAULTS
INCLUDING CONSTRAINTS,
CHECK ('2015-06-02 00:00:00' = col1 AND col1  '2015-06-03 00:00:00')
)
INHERITS (hoge);

INSERT INTO hoge VALUES ('2015-06-01 10:30:00', 1234)
ON CONFLICT (col1) DO UPDATE SET col2 = EXCLUDED.col2;

INSERT INTO hoge VALUES ('2015-06-01 10:30:00', 1234)
ON CONFLICT (col1) DO UPDATE SET col2 = EXCLUDED.col2;


How should we treat this problem for 9.5? If we want to fix this problem
completely, probably we would need to make constraint_exclusion work with
even UPSERT. Which sounds difficult to do at least for 9.5. Any other idea?
Or we should just treat it as a limitation of UPSERT and add that document?

Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Andres Freund
On 2015-06-24 10:38:38 -0400, Robert Haas wrote:
 On Wed, Jun 24, 2015 at 10:29 AM, Andres Freund and...@anarazel.de wrote:
  On 2015-06-24 23:05:45 +0900, Fujii Masao wrote:
  INSERT ON CONFLICT DO UPDATE doesn't seem to work on the current 
  partitioning
  mechanism. For example, in the following SQL commands, the last UPSERT 
  command
  would fail with an error. The error message is
 
  I think that's pretty much inevitable without baking in touple routing
  into the core system and supporting unique-constraints that span
  partitions. In other words, I don't think this is upsert's fault.
 
 Is the root of the problem that the trigger is called for an INSERT ..
 ON CONFLICT statement but it turns that into a plain INSERT?

If you'd not do that, you'd avoid errors when violating a unique key
inside a partition, so it'd help a bit.

But it'd not do anything sane when the conflict is actually *not*
aligned with the partitioning schema, because we'll obviously only
search for conflicts within the one table.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Amit Langote

Peter,

On 2015-06-25 AM 02:35, Peter Geoghegan wrote:
 
 Inheritance with triggers is a leaky abstraction, so this kind of
 thing is always awkward. Still, UPSERT has full support for
 *inheritance* -- that just doesn't help in this case.
 

Could you clarify as to what UPSERT's support for inheritance entails?

Thanks,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 7:05 AM, Fujii Masao masao.fu...@gmail.com wrote:
 How should we treat this problem for 9.5? If we want to fix this problem
 completely, probably we would need to make constraint_exclusion work with
 even UPSERT. Which sounds difficult to do at least for 9.5. Any other idea?
 Or we should just treat it as a limitation of UPSERT and add that document?

I think that the real way to fix this is, as you say, to make it so
that it isn't necessary in general to write trigger functions like
this to make inheritance work. I can imagine the system rewriting an
INSERT ... ON CONFLICT DO UPDATE query to target the correct child
partition based on the values proposed for insertion, much like
constraint exclusion. There are some problems with that idea, like
what to do about BEFORE INSERT triggers changing those values (the
values that we use to determine the appropriate child partition).
These seem like problems that can be fixed.

What I think cannot work is making the trigger inheritance pattern
really robust with UPSERT. Sure, we can do a little bit, like expose
basic information about whether the INSERT is ON CONFLICT DO NOTHING
or ON CONFLICT DO UPDATE. But as you imply, it seems unwise to expect
the user to do the rewriting themselves, within their trigger
function. Maybe that's an argument against ever exposing even this
basic information to the user.

Thanks for testing.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 11:02 AM, Peter Geoghegan p...@heroku.com wrote:
 I think that the real way to fix this is, as you say, to make it so
 that it isn't necessary in general to write trigger functions like
 this to make inheritance work.

Excuse me -- I mean to make *partitioning* work.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT on partition

2015-06-24 Thread Peter Geoghegan
On Wed, Jun 24, 2015 at 7:38 AM, Robert Haas robertmh...@gmail.com wrote:
 Is the root of the problem that the trigger is called for an INSERT ..
 ON CONFLICT statement but it turns that into a plain INSERT?

 Is there any way of writing a partitioning trigger that doesn't have
 that defect?

We did discuss whether or not it was important to expose information
to triggers sufficient to route + UPSERT tuples into inheritance
children in an equivalent way (equivalent to an UPSERT into the
inheritance parent). At the time, you seemed to think that it could
wait [1].

Even if we added what was discussed as a minimal and practical way of
exposing this [2], it would still be awkward for an INSERT trigger to
examine the structure of the UPDATE part of the UPSERT -- there will
be no *tuple* for that case (yet).

Inheritance with triggers is a leaky abstraction, so this kind of
thing is always awkward. Still, UPSERT has full support for
*inheritance* -- that just doesn't help in this case.

[1] 
http://www.postgresql.org/message-id/ca+tgmozavxmwoebpk4aszonuwur3qpswwk6xetxmca+8h7c...@mail.gmail.com
[2] 
http://www.postgresql.org/message-id/cam3swzrvjyu8nnvw_jhexx4ymq9xaa7u0getlbcgym0oean...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-19 Thread Greg Stark
On Thu, Oct 16, 2014 at 8:00 PM, Peter Geoghegan p...@heroku.com wrote:
 Basically, it's difficult to make this work for technical reasons
 precisely because what I have here isn't join-like. Can I easily
 disallow OLD.* in a RETURNING clause (recall that we only project
 inserted tuples, as always)? Even if you think that's okay, I'd rather
 give an error message indicating that that makes no sense, which is
 what happens right now.

Well OLD and NEW are also not joins yet we expose them this way. It
always seemed like a hack to me but better one hack than two different
inconsistent hacks, no?

Independent of all that though. CONFLICTING() isn't clear to me --
conflicting is a relative property of two or more objects which are
both (or all) conflicting with each other. Which one does
CONFLICTING refer to here?

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-19 Thread Peter Geoghegan
On Sun, Oct 19, 2014 at 2:52 PM, Greg Stark st...@mit.edu wrote:
 Well OLD and NEW are also not joins yet we expose them this way. It
 always seemed like a hack to me but better one hack than two different
 inconsistent hacks, no?

In my opinion, no. Those hacks do not appear in the parse analysis of
ordinary optimizable statements, only utility statements concerning
rules. ChangeVarNodes() does actual rewriting of magic Vars for stored
rules during rewriting, much later. Within the parser code's
directory, only transformRuleStmt() knows about the magic RTE entries
appearing in rules (PRS2_OLD_VARNO and PRS2_NEW_VARNO), and that's in
the file parse_utilcmd.c. OLD.* and NEW.* are dynamically expanded
placeholders that only exist for the purposes of stored rules. You can
see why at the very least it's a PITA to teach more or less ordinary
parse analysis - my new invocation of transformUpdateStmt() - to care
about that. Note that there are hardly any changes to
transformUpdateStmt() in the patch, and as I've said, I like that a
lot.

Even with stored rules, we only see ChangeVarNodes() changing those
OLD.*/NEW.* placeholders (that have dummy relation RTEs) to refer to
actual, existing RTEs (I think that these are typically added by rule
invocation, but I'm not an expert on the rewriter). We're not talking
about creating a new RTE artificially during ordinary parse analysis
of conventional/optimizable statements. In the patch, parse analysis
thinks that CONFLICTING(my_var) is actually an ordinary reference to
target.my_var that happens to involve ConflictingExpr, which I think
is appropriate. Making available the post-before-insert-row-triggers
tuple is driven by new code called by ExecInsert(). The executor
drives all this. All I require from the planner is a dead simple
sequential scan based update plan, that I can use the EvalPlanQual()
infrastructure with (no more, no less), that doesn't attempt to apply
any optimization (or refer to something external) that isn't
compatible with how the plan needs to be used here.

I want to ensure that the optimizer gives me only that, and magic RTEs
that need to last until execution are bad news, in my (admittedly
non-expert) opinion.

I also happen to think that the two situations (stored rules versus
CONFLICTING()) are fundamentally dissimilar from a user's perspective,
as I've said.

 Independent of all that though. CONFLICTING() isn't clear to me --
 conflicting is a relative property of two or more objects which are
 both (or all) conflicting with each other. Which one does
 CONFLICTING refer to here?

I think you're right about that. I thought about changing the name to
EXCLUDED(), which reflects that the tuple is the actual tuple
excluded from insertion. In particular, a version that carries forward
all of the effects of before row insert triggers. What do you think of
that?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-16 Thread Robert Haas
On Mon, Oct 13, 2014 at 2:02 PM, Peter Geoghegan p...@heroku.com wrote:
 If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
 before-insert triggers and then inspect the tuple to be inserted.  If
 b is neither 1 nor 2, then we'll fail with an error saying that we
 can't support ON DUPLICATE without a relevant index to enforce
 uniqueness.  (This can presumably re-use the same error message that
 would have popped out if the user done ON DUPLICATE (b), which is
 altogether un-indexed.)  But if b is 1 or 2, then we'll search the
 matching index for a conflicting tuple; finding none, we'll insert;
 finding one, we'll do the update as per the user's instructions.

 Before row insert triggers might invalidate that conclusion at the
 last possible moment. So you'd have to recheck, which is just messy.

I can't imagine that you'd decide which index to use and then change
your mind when you turn out to be wrong.  I think rather you'd compute
a list of possibly-applicable indexes based on the ON DUPLICATE column
list, and then, after firing before-insert triggers, check whether
there's one that will definitely work.  If there's a non-partial
unique index on the relevant columns, then you can put any single such
index into the list of possibly-usable indexes and leave the rest out;
otherwise, you include all the candidates and pick between them at
runtime.

If that seems too complicated, leave it out for v1: just insist that
there must be at least one unique non-partial index on the relevant
set of columns.

 I'm considering your points in this area as well as I can, but as far
 as I can tell you haven't actually set what's bad about it, just that
 it might be hazardous in some way that you don't appear to have
 specified, and that MySQL doesn't allow it.  I am untroubled by the
 latter point; it is not our goal to confine our implementation to a
 subset of MySQL.

 I did - several times. I linked to the discussion [1]. I said There
 is a trade-off here. I am willing to go another way in that trade-off,
 but let's have a realistic conversation about it. And Kevin
 eventually said of not supporting partial unique indexes: That seems
 like the only sensible course, to me. At which point I agreed to do
 it that way [2]. So you've already won this argument. All it took was
 looking at my reasons for doing things that way from my perspective.
 If there has been digging of heals, you should consider that it might
 be for a reason, even if on balance you still disagree with my
 conclusion (which was clearly not really a firm conclusion in this
 instance anyway). That's all I mean here.

There seems to be some confusion here.  This part was about this syntax:

 INSERT INTO overwrite_with_abandon (key, value)
 VALUES (42, 'meaning of life')
 ON DUPLICATE (key) UPDATE;

That's a different issue from naming indexes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-16 Thread Peter Geoghegan
On Thu, Oct 16, 2014 at 6:48 AM, Robert Haas robertmh...@gmail.com wrote:
 If that seems too complicated, leave it out for v1: just insist that
 there must be at least one unique non-partial index on the relevant
 set of columns.

That's what I'll do.

 There seems to be some confusion here.  This part was about this syntax:

 INSERT INTO overwrite_with_abandon (key, value)
 VALUES (42, 'meaning of life')
 ON DUPLICATE (key) UPDATE;

 That's a different issue from naming indexes.

It is? In any case, I'm working on a revision with this syntax:

postgres=# insert into upsert values (1, 'Foo') on conflict (key)
update set val = conflicting(val);
INSERT 0 1
postgres=# insert into upsert values (1, 'Foo') on conflict (val)
update set val = conflicting(val);
ERROR:  42P10: could not infer which unique index to use from
expressions/columns provided for ON CONFLICT
LINE 1: insert into upsert values (1, 'Foo') on conflict (val) updat...
 ^
HINT:  Partial unique indexes are not supported
LOCATION:  transformConflictClause, parse_clause.c:2365

Expression indexes work fine with this syntax.

I want to retain CONFLICTING(), although I'm thinking about changing
the spelling to EXCLUDED(). While CONFLICTING() is more or less a new
and unprecedented style of expression, and in general that's something
to be skeptical of, I think it's appropriate because what we want here
isn't quite like any existing expression. Using an alias-like syntax
is misleading, since it implies that are no effects carried from the
firing of before row insert triggers. It's also trickier to implement
alias-like referencing.

This is not a join, and I think suggesting that it is by using an
alias-like syntax to refer to excluded rows proposed for insertion
from the UPDATE is a mistake.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-16 Thread Peter Geoghegan
On Thu, Oct 16, 2014 at 11:01 AM, Peter Geoghegan p...@heroku.com wrote:
 It is? In any case, I'm working on a revision with this syntax:

By the way, in my next revision, barring any objections, the ON
CONFLICT (column/expression) syntax is mandatory in the case of ON
CONFLICT UPDATE, and optional in the case of ON CONFLICT IGNORE. If we
end up supporting exclusion constraints, it's pretty clear that
they're only useful with ON CONFLICT IGNORE anyway, and so I guess we
don't have to worry about naming those. I guess there would be some
advantage to naming an exclusion constraint directly even for the
IGNORE case (which is another reason I considered naming an index
directly), but it doesn't seem that important.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-16 Thread Robert Haas
On Thu, Oct 16, 2014 at 2:01 PM, Peter Geoghegan p...@heroku.com wrote:
 I want to retain CONFLICTING(), although I'm thinking about changing
 the spelling to EXCLUDED(). While CONFLICTING() is more or less a new
 and unprecedented style of expression, and in general that's something
 to be skeptical of, I think it's appropriate because what we want here
 isn't quite like any existing expression. Using an alias-like syntax
 is misleading, since it implies that are no effects carried from the
 firing of before row insert triggers. It's also trickier to implement
 alias-like referencing.

I think that the general consensus was against that style.  I don't
like it, and IIRC a few other people commented as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-16 Thread Peter Geoghegan
On Thu, Oct 16, 2014 at 11:39 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 16, 2014 at 2:01 PM, Peter Geoghegan p...@heroku.com wrote:
 I want to retain CONFLICTING(), although I'm thinking about changing
 the spelling to EXCLUDED(). While CONFLICTING() is more or less a new
 and unprecedented style of expression, and in general that's something
 to be skeptical of, I think it's appropriate because what we want here
 isn't quite like any existing expression. Using an alias-like syntax
 is misleading, since it implies that are no effects carried from the
 firing of before row insert triggers. It's also trickier to implement
 alias-like referencing.

 I think that the general consensus was against that style.  I don't
 like it, and IIRC a few other people commented as well.

I think that's accurate, but I'd ask those that didn't like the style
to reconsider. Also, I am willing to use any type of special
expression, and any keyword or keywords. It doesn't have to be
function-like at all. But I believe that an alias-like syntax presents
certain difficulties.

There is the fact that this isn't a join, and so misleads users, which
I've explained. But if I have to add a range table entry for the alias
to make parse analysis of the UPDATE work in the more or less
conventional way (which is something I like a lot about the current
design), then that creates considerable headaches. I have to
discriminate against those in the optimizer, when time comes to
disallow params in the auxiliary plan. I have to think about each case
then, and I think that could add a lot more complexity than you'd
think. I'm not really sure.

Basically, it's difficult to make this work for technical reasons
precisely because what I have here isn't join-like. Can I easily
disallow OLD.* in a RETURNING clause (recall that we only project
inserted tuples, as always)? Even if you think that's okay, I'd rather
give an error message indicating that that makes no sense, which is
what happens right now.

Besides all that, there may also be an advantage to having something
similar to MySQL.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-13 Thread Robert Haas
On Fri, Oct 10, 2014 at 4:41 PM, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote:
 I think what's realistic here is that the patch isn't going to get
 committed the way you have it today, because nobody else likes that
 design.  That may be harsh, but I think it's accurate.

 I do think that's harsh, because it's unnecessary: I am looking for
 the best design. If you want to propose alternatives, great, but there
 is a reason why I've done things that way, and that should be
 acknowledged. I too think naming the unique index is ugly as sin, and
 have said as much multiple times. We're almost on the same page here.

Come on.  You've had the syntax that way for a while, multiple people
have objected to it, and it didn't get changed.  When people renewed
their objections, you said that they were not having a realistic
conversation.  I'm tired of getting critiqued because there's some
fine point of one of the scores of emails you've sent on this topic
that you feel I haven't adequately appreciated.  I would like to avoid
getting into a flame-war here where we spend a lot of time arguing
about who should have said what in exactly what way, but I think it is
fair to say that your emails have come across to me, and to a number
of other people here, as digging in your heels and refusing to budge.
I would go so far as to say that said perception is the primary reason
why this patch is making so little progress, far outstripping whatever
the actual technical issues are.

 Whatever the actual behavior
 of your patch is today, it seems to me that the conflict is,
 fundamentally, over a set of column values, NOT over a particular
 index.  The index is simply a mechanism for noticing that conflict.

 I think that this is the kernel of our disagreement. The index is not
 simply a mechanism for noticing that conflict. The user had better
 have one unique index in mind to provoke taking the alternative path
 in the event of a would-be unique violation. Clearly it doesn't matter
 much in this particular example. But what if there were two partial
 unique indexes, that were actually distinct, but only in terms of the
 constant appearing in their predicate? And what about having that
 changed by a before insert row-level trigger? Are we to defer deciding
 which unique index to use at the last possible minute?

I don't immediately see why this would cause a problem.  IIUC, the
scenario we're talking about is:

create table foo (a int, b int);
create unique index foo1 on foo (a) where b = 1;
create unique index foo2 on foo (a) where b = 2;

If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
before-insert triggers and then inspect the tuple to be inserted.  If
b is neither 1 nor 2, then we'll fail with an error saying that we
can't support ON DUPLICATE without a relevant index to enforce
uniqueness.  (This can presumably re-use the same error message that
would have popped out if the user done ON DUPLICATE (b), which is
altogether un-indexed.)  But if b is 1 or 2, then we'll search the
matching index for a conflicting tuple; finding none, we'll insert;
finding one, we'll do the update as per the user's instructions.

 As always with this stuff, the devil is in the details. If we work out
 the details, then I can come up with something that's acceptable to
 everyone. Would you be okay with this never working with partial
 unique indexes? That gives me something to work with.

If it can't be made to work with them in a reasonable fashion, I would
probably be OK with that, but so far I see no reason for such
pessimism.

 Also, how about making the SET clause optional, with the semantics
 that we just update all of the fields for which a value is explicitly
 specified:

 INSERT INTO overwrite_with_abandon (key, value)
 VALUES (42, 'meaning of life')
 ON DUPLICATE (key) UPDATE;

 Your syntax allows the exact same thing; it simply require the user to
 be more verbose in order to get that behavior.  If we think that
 people wanting that behavior will be rare, then it's fine to require
 them to spell it out when they want it.  If we think it will be the
 overwhelming common application of this feature, and I do, then making
 people spell it out when we could just as well infer it is pointless.

 Did you consider my example? I think that people will like this idea,
 too - that clearly isn't the only consideration, though. As you say,
 it would be very easy to implement this. However, IMV, we shouldn't,
 because it is hazardous. MySQL doesn't allow this, and they tend to
 find expedients like this useful.

I'm considering your points in this area as well as I can, but as far
as I can tell you haven't actually set what's bad about it, just that
it might be hazardous in some way that you don't appear to have
specified, and that MySQL doesn't allow it.  I am untroubled by the
latter point; it is not our goal to confine our implementation to a
subset of 

Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-13 Thread Peter Geoghegan
On Mon, Oct 13, 2014 at 7:46 AM, Robert Haas robertmh...@gmail.com wrote:
 Come on.  You've had the syntax that way for a while, multiple people
 have objected to it, and it didn't get changed.  When people renewed
 their objections, you said that they were not having a realistic
 conversation.  I'm tired of getting critiqued because there's some
 fine point of one of the scores of emails you've sent on this topic
 that you feel I haven't adequately appreciated.

We're in the fine point business. Obviously the issues with partial
unique indexes *are* pretty esoteric. But worrying about these edge
cases are the kind of thing that will get us an implementation of high
enough quality to commit. They're esoteric until they affect you.

 I would like to avoid
 getting into a flame-war here where we spend a lot of time arguing
 about who should have said what in exactly what way, but I think it is
 fair to say that your emails have come across to me, and to a number
 of other people here, as digging in your heels and refusing to budge.

I am not refusing to budge (in point of fact, on this question I have
already budged; see my remarks below). I am saying: Hey, there is a
reason why you're getting push back here. The reason isn't that I like
WITHIN unique_index_name so much - obviously I don't. Who could?

 I don't immediately see why this would cause a problem.  IIUC, the
 scenario we're talking about is:

 create table foo (a int, b int);
 create unique index foo1 on foo (a) where b = 1;
 create unique index foo2 on foo (a) where b = 2;

 If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
 before-insert triggers and then inspect the tuple to be inserted.  If
 b is neither 1 nor 2, then we'll fail with an error saying that we
 can't support ON DUPLICATE without a relevant index to enforce
 uniqueness.  (This can presumably re-use the same error message that
 would have popped out if the user done ON DUPLICATE (b), which is
 altogether un-indexed.)  But if b is 1 or 2, then we'll search the
 matching index for a conflicting tuple; finding none, we'll insert;
 finding one, we'll do the update as per the user's instructions.

Before row insert triggers might invalidate that conclusion at the
last possible moment. So you'd have to recheck, which is just messy.

 As always with this stuff, the devil is in the details. If we work out
 the details, then I can come up with something that's acceptable to
 everyone. Would you be okay with this never working with partial
 unique indexes? That gives me something to work with.

 If it can't be made to work with them in a reasonable fashion, I would
 probably be OK with that, but so far I see no reason for such
 pessimism.

At the very least I think that it would be a bad use of our resources
to bend over backwards to make this work for partial unique indexes,
which is what it would take. So I will proceed with the basic idea of
naming columns, while not having that ever resolve partial unique
indexes.

 I'm considering your points in this area as well as I can, but as far
 as I can tell you haven't actually set what's bad about it, just that
 it might be hazardous in some way that you don't appear to have
 specified, and that MySQL doesn't allow it.  I am untroubled by the
 latter point; it is not our goal to confine our implementation to a
 subset of MySQL.

I did - several times. I linked to the discussion [1]. I said There
is a trade-off here. I am willing to go another way in that trade-off,
but let's have a realistic conversation about it. And Kevin
eventually said of not supporting partial unique indexes: That seems
like the only sensible course, to me. At which point I agreed to do
it that way [2]. So you've already won this argument. All it took was
looking at my reasons for doing things that way from my perspective.
If there has been digging of heals, you should consider that it might
be for a reason, even if on balance you still disagree with my
conclusion (which was clearly not really a firm conclusion in this
instance anyway). That's all I mean here.

I have been successful at talking you out of various approaches to
this problem multiple times. This is not simple intransigence.

[1] 
http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93kxeh...@mail.gmail.com

[2] 
http://www.postgresql.org/message-id/cam3swzstdchn6-iejbc20ogd8twmz6-um6o8gz2bofzxn9y...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-12 Thread Matthew Woodcraft
On 2014-10-10 19:44, Kevin Grittner wrote:
 Peter Geoghegan p...@heroku.com wrote:
 People keep remarking that they don't like that you can (optionally)
 name a unique index explicitly, 

[...]

 To restate: to do so is conflating the logical definition of the 
 database with a particular implementation detail.  As just one 
 reason that is a bad idea: we can look up unique indexes on the 
 specified columns, but if we implement a other storage techniques 
 where there is no such thing as a unique index on the columns, yet 
 manage to duplicate the semantics (yes, stranger things have 
 happened), people can't migrate to the new structure without 
 rewriting their queries

Wouldn't it be good enough to define the 'WITHIN' as expecting a
unique-constraint name rather than an index name (even though those
happen to be the same strings)?

I think constraints are part of the logical definition of the database,
and a new storage technique which doesn't use indexes should still have
names for its unique constraints.

-M-




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-12 Thread Marko Tiikkaja

On 10/12/14, 2:36 PM, Matthew Woodcraft wrote:

On 2014-10-10 19:44, Kevin Grittner wrote:

To restate: to do so is conflating the logical definition of the
database with a particular implementation detail.  As just one
reason that is a bad idea: we can look up unique indexes on the
specified columns, but if we implement a other storage techniques
where there is no such thing as a unique index on the columns, yet
manage to duplicate the semantics (yes, stranger things have
happened), people can't migrate to the new structure without
rewriting their queries


Wouldn't it be good enough to define the 'WITHIN' as expecting a
unique-constraint name rather than an index name (even though those
happen to be the same strings)?

I think constraints are part of the logical definition of the database,
and a new storage technique which doesn't use indexes should still have
names for its unique constraints.


What about partial indexes?  Indexes on expressions or functions calls?


.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Robert Haas
On Wed, Oct 8, 2014 at 5:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Peter Geoghegan p...@heroku.com wrote:
 On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:
 I think the problem is that it's not possible to respect the usual
 guarantees even at READ COMMITTED level when performing an INSERT OR
 UPDATE operation (however spelled).  You may find that there's a tuple
 with the same PK which is committed but not visible to the snapshot
 you took at the beginning of the statement.

 Can you please comment on this, Kevin? It would be nice to converge on
 an agreement on syntax here

 Robert said however spelled -- which I take to mean that he at
 least sees that the MERGE-like UPSERT syntax can be turned into the
 desired semantics.  I have no idea why anyone would think otherwise.

 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.

OK, I have a comment on this.

Anything we do about triggers will by definition be novel.  Right now,
we have INSERT, UPDATE, and DELETE.  If we add a new operation,
whether it's called UPSERT or MERGE or FROB, or if we add a flag to
insert that makes it possibly do something other than inserting (e.g.
INSERT OR UPDATE), or if we add a flag to update that makes it do
something other than updating (e.g. UPDATE or INSERT), then some
people's triggers are going to get broken by that change no matter how
we do it.  When the new operation is invoked, we can fire the insert
triggers, the update triggers, some new kind of trigger, or no trigger
at all - and any decision we make there will not in all cases be
backward-compatible.  We can try to minimize the damage (and we
probably should) but we can't make it go to zero.

 INSERT INTO targettable(key, quantity, inserted_at)
   VALUES(123, quantity, now())
   ON CONFLICT WITHIN targettable_pkey
 UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = 
 now();

I actually like this syntax reasonably well in some ways, but I don't
like that we're mentioning the index name, and the CONFLICTING()
notation is decidedly odd.  Maybe something like this:

INSERT INTO targettable(key, quantity, inserted_at)
   VALUES(123, quantity, now())
   ON DUPLICATE (key)
 UPDATE SET quantity = quantity + OLD.quantity, updated_at = now();

(Perhaps OLD should reference the tuple already in the table, and NEW
the value from the VALUES clause.  That would be snazzy indeed.)

Also, how about making the SET clause optional, with the semantics
that we just update all of the fields for which a value is explicitly
specified:

INSERT INTO overwrite_with_abandon (key, value)
VALUES (42, 'meaning of life')
ON DUPLICATE (key) UPDATE;

While the ability to specify a SET clause there explicitly is useful,
I bet a lot of key-value store users would love the heck out of a
syntax that let them omit it when they want to overwrite.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote:
 Anything we do about triggers will by definition be novel.  Right now,
 we have INSERT, UPDATE, and DELETE.  If we add a new operation,
 whether it's called UPSERT or MERGE or FROB, or if we add a flag to
 insert that makes it possibly do something other than inserting (e.g.
 INSERT OR UPDATE), or if we add a flag to update that makes it do
 something other than updating (e.g. UPDATE or INSERT), then some
 people's triggers are going to get broken by that change no matter how
 we do it.  When the new operation is invoked, we can fire the insert
 triggers, the update triggers, some new kind of trigger, or no trigger
 at all - and any decision we make there will not in all cases be
 backward-compatible.  We can try to minimize the damage (and we
 probably should) but we can't make it go to zero.

+1. It's inevitable that someone isn't going to be entirely happy with
whatever we do. Let's be realistic about that, while minimizing the
risk.

 I actually like this syntax reasonably well in some ways, but I don't
 like that we're mentioning the index name, and the CONFLICTING()
 notation is decidedly odd.  Maybe something like this:

People keep remarking that they don't like that you can (optionally)
name a unique index explicitly, and I keep telling them why I've done
it that way [1]. There is a trade-off here. I am willing to go another
way in that trade-off, but let's have a realistic conversation about
it.

 Also, how about making the SET clause optional, with the semantics
 that we just update all of the fields for which a value is explicitly
 specified:

 INSERT INTO overwrite_with_abandon (key, value)
 VALUES (42, 'meaning of life')
 ON DUPLICATE (key) UPDATE;

 While the ability to specify a SET clause there explicitly is useful,
 I bet a lot of key-value store users would love the heck out of a
 syntax that let them omit it when they want to overwrite.

While I initially like that idea, now I'm not so sure about it [2].
MySQL don't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE
command allows it, since it is defined as a DELETE followed by an
INSERT (or something like that), but I think that in general that
feature has been a disaster for them.

[1] 
http://www.postgresql.org/message-id/CAM3SWZQpGf6_ME6D1vWqdCFadD7Nprdx2JrE=Hcf=93kxeh...@mail.gmail.com

[2] 
http://www.postgresql.org/message-id/CAM3SWZT=vxbj7qkaidamybu40ap10udsqooqhvix3ykj7wb...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:

 Anything we do about triggers will by definition be novel.  Right
 now, we have INSERT, UPDATE, and DELETE.  If we add a new
 operation, whether it's called UPSERT or MERGE or FROB, [ ... ]

If we call it MERGE, then we had better follow the rules the
standard lays out for triggers on a MERGE statement -- which is
that UPDATE triggers fire for rows updated and that INSERT triggers
fire for rows inserted.  That kinda makes sense, since the MERGE
command is almost like a row-by-row CASE statement allowing one or
the other.

If we make up our own command verb, we are free to do as we like.

 Maybe something like this:

 INSERT INTO targettable(key, quantity, inserted_at)
   VALUES(123, quantity, now())
   ON DUPLICATE (key)
 UPDATE SET quantity = quantity + OLD.quantity, updated_at = now();

That seems a lot cleaner than the proposal on the Wiki page.  If we
go that route, it makes sense to fire the BEFORE INSERT triggers
before attempting the insert and then fire BEFORE UPDATE triggers
before attempting the UPDATE.  If either succeeds, I think we
should fire the corresponding AFTER triggers.  We already allow a
BEFORE triggers to run and then omit the triggering operation
without an error, so I don't see that as a problem.  This makes a
lot more sense to me than attempting to add a new UPSERT trigger
type.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner kgri...@ymail.com wrote:
 That seems a lot cleaner than the proposal on the Wiki page.  If we
 go that route, it makes sense to fire the BEFORE INSERT triggers
 before attempting the insert and then fire BEFORE UPDATE triggers
 before attempting the UPDATE.  If either succeeds, I think we
 should fire the corresponding AFTER triggers.  We already allow a
 BEFORE triggers to run and then omit the triggering operation
 without an error, so I don't see that as a problem.  This makes a
 lot more sense to me than attempting to add a new UPSERT trigger
 type.

You realize that that's exactly what my patch does, right?

AFAICT the only confusion that my patch has is with statement-level
triggers, as discussed on the Wiki page. I think that the row-level
trigger behavior is fine; a lot more thought has gone into it.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 10, 2014 at 11:05 AM, Robert Haas robertmh...@gmail.com wrote:

 I actually like this syntax reasonably well in some ways, but I don't
 like that we're mentioning the index name, and the CONFLICTING()
 notation is decidedly odd.

 People keep remarking that they don't like that you can (optionally)
 name a unique index explicitly, and I keep telling them why I've done
 it that way [1]. There is a trade-off here. I am willing to go another
 way in that trade-off, but let's have a realistic conversation about
 it.

We've all read that, and your repeated arguments for that point of
view.  We disagree and have said why.  What in that is not a
realistic conversation?

To restate: to do so is conflating the logical definition of the 
database with a particular implementation detail.  As just one 
reason that is a bad idea: we can look up unique indexes on the 
specified columns, but if we implement a other storage techniques 
where there is no such thing as a unique index on the columns, yet 
manage to duplicate the semantics (yes, stranger things have 
happened), people can't migrate to the new structure without 
rewriting their queries.  If the syntax references logical details 
(like column names) there is no need to rewrite.  We don't want to 
be painted into a corner.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:38 AM, Peter Geoghegan p...@heroku.com wrote:
 That seems a lot cleaner than the proposal on the Wiki page.  If we
 go that route, it makes sense to fire the BEFORE INSERT triggers
 before attempting the insert and then fire BEFORE UPDATE triggers
 before attempting the UPDATE.

By the way, there is no problem with failing to UPDATE, because we
lock rows ahead of the UPDATE. Once a row is locked, the UPDATE cannot
conflict. There is no danger of UPDATE before row-level triggers
firing without then updating (unless the xact aborts, but you know
what I mean). In general, there is no danger of triggers firing more
often than you might consider that they should, with the sole
exception of the fact that we always fire before insert row-level
triggers, even when an UPDATE is the ultimate outcome.

Restarting for conflicts (e.g. handling concurrent insertions with
promise tuples) necessitates a restart, but to the point after before
row-level insert triggers fire, and from a point before any other
triggers have the opportunity to fire.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner kgri...@ymail.com wrote:
 We've all read that, and your repeated arguments for that point of
 view.  We disagree and have said why.  What in that is not a
 realistic conversation?

Because, as I've said many times, there are problems with naming
columns directly that need to be addressed. There are corner-cases.
Are you going to complain about those once I go implement something?
Let's talk about that.

I think we can just throw an error in these edge-cases, but let's
decide if we're comfortable with that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 10, 2014 at 11:30 AM, Kevin Grittner kgri...@ymail.com wrote:

 That seems a lot cleaner than the proposal on the Wiki page.  If we
 go that route, it makes sense to fire the BEFORE INSERT triggers
 before attempting the insert and then fire BEFORE UPDATE triggers
 before attempting the UPDATE.  If either succeeds, I think we
 should fire the corresponding AFTER triggers.  We already allow a
 BEFORE triggers to run and then omit the triggering operation
 without an error, so I don't see that as a problem.  This makes a
 lot more sense to me than attempting to add a new UPSERT trigger
 type.

 You realize that that's exactly what my patch does, right?

I haven't read the patch, but the discussion has made that clear,
yes.  Try to take agreement on a point gracefully, will ya?  ;-)

 AFAICT the only confusion that my patch has is with statement-level
 triggers, as discussed on the Wiki page. I think that the row-level
 trigger behavior is fine; a lot more thought has gone into it.

IMV it is clear that since the standard says that update or insert
operations which affect zero rows fire the corresponding trigger,
the statement level triggers for both should fire for an UPSERT,
even if the set of affected rows for one or the other (or both!) is
zero.  If we ever get transition relations, it will be easy to
check the count of affected rows or otherwise behave appropriately
based on what was done.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:

 There is no danger of UPDATE before row-level triggers firing
 without then updating (unless the xact aborts, but you know what
 I mean).

Well, let's make sure I do know what you mean.  If a BEFORE UPDATE
... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped
without error, right?  The BEFORE UPDATE triggers will run before
the UPDATE if a duplicate is found, and the return value will be
treated in the usual manner, replacing values and potentially
skipping the update?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Robert Haas
On Fri, Oct 10, 2014 at 2:29 PM, Peter Geoghegan p...@heroku.com wrote:
 People keep remarking that they don't like that you can (optionally)
 name a unique index explicitly, and I keep telling them why I've done
 it that way [1]. There is a trade-off here. I am willing to go another
 way in that trade-off, but let's have a realistic conversation about
 it.

I think what's realistic here is that the patch isn't going to get
committed the way you have it today, because nobody else likes that
design.  That may be harsh, but I think it's accurate.

But on the substance of the issue, I'm totally unconvinced by your
argument in the linked email.  Let's suppose you have this:

create table foo (a int, b int);
create unique index foo1 on foo (a);
create unique index foo2 on foo (a);
create unique index foo3 on foo (a) where b = 1;

Anything that conflicts in foo1 or foo2 is going to conflict in the
other one, and with foo3.  Anything that conflicts in foo3 is perhaps
also going to conflict in foo1 and foo2, or perhaps not.  Yet, if the
statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
what to do: when we hit a conflict in any of those three indexes,
update the row.  No matter which index has the conflict, updating is
the right answer, and solves the conflict in all three indexes.  I
dunno what you'd expect someone to write in your syntax to solve this
problem - presumably either (1) they can list any of the indexes that
might conflict, (2) they must list all of the indexes that might
conflict, or (3) it just doesn't work.  Whatever the actual behavior
of your patch is today, it seems to me that the conflict is,
fundamentally, over a set of column values, NOT over a particular
index.  The index is simply a mechanism for noticing that conflict.

If you want to allow this to work with expression indexes, that's not
really a problem; you can let the user write INSERT .. ON CONFLICT
(upper(foo)) UPDATE ... and match that to the index.  It wouldn't
bother me if the first version of this feature couldn't target
expression indexes anyway, but if you want to include that, having the
user mention the actual expression rather than the index name
shouldn't make much difference.

 Also, how about making the SET clause optional, with the semantics
 that we just update all of the fields for which a value is explicitly
 specified:

 INSERT INTO overwrite_with_abandon (key, value)
 VALUES (42, 'meaning of life')
 ON DUPLICATE (key) UPDATE;

 While the ability to specify a SET clause there explicitly is useful,
 I bet a lot of key-value store users would love the heck out of a
 syntax that let them omit it when they want to overwrite.

 While I initially like that idea, now I'm not so sure about it [2].
 MySQL don't allow this (with ON DUPLICATE KEY UPDATE). Their REPLACE
 command allows it, since it is defined as a DELETE followed by an
 INSERT (or something like that), but I think that in general that
 feature has been a disaster for them.

Your syntax allows the exact same thing; it simply require the user to
be more verbose in order to get that behavior.  If we think that
people wanting that behavior will be rare, then it's fine to require
them to spell it out when they want it.  If we think it will be the
overwhelming common application of this feature, and I do, then making
people spell it out when we could just as well infer it is pointless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 10, 2014 at 11:44 AM, Kevin Grittner kgri...@ymail.com wrote:

 [discussion on committers rejecting the notion of a syntax 
  involving specification of an index name]

 as I've said many times, there are problems with naming
 columns directly that need to be addressed. There are corner-cases.
 Are you going to complain about those once I go implement something?

If I don't like what I see, yes.  I think it will be entirely
possible for you to write something along those lines that I won't
complain about.

 I think we can just throw an error in these edge-cases, but let's
 decide if we're comfortable with that.

If the list of columns does not allow a choice of a suitable
full-table unique index on only non-null columns, an error seems
appropriate.  What other problem cases do you see?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 11:55 AM, Kevin Grittner kgri...@ymail.com wrote:
 I haven't read the patch, but the discussion has made that clear,
 yes.  Try to take agreement on a point gracefully, will ya?  ;-)

Heh, sorry. I did literally mean what I said - it wasn't 100% clear to
me that you got that. It's safest to not make any assumptions like
that.

 AFAICT the only confusion that my patch has is with statement-level
 triggers, as discussed on the Wiki page. I think that the row-level
 trigger behavior is fine; a lot more thought has gone into it.

 IMV it is clear that since the standard says that update or insert
 operations which affect zero rows fire the corresponding trigger,
 the statement level triggers for both should fire for an UPSERT,
 even if the set of affected rows for one or the other (or both!) is
 zero.  If we ever get transition relations, it will be easy to
 check the count of affected rows or otherwise behave appropriately
 based on what was done.

I think that you're probably right about that. Note that UPDATE rules
will not be applied to the UPDATE portion of the statement. Not sure
what to do about that, but I'm pretty sure that it's inevitable,
because the UPDATE is effectively the same top-level statement for the
purposes of rules.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 12:04 PM, Kevin Grittner kgri...@ymail.com wrote:
 Peter Geoghegan p...@heroku.com wrote:

 There is no danger of UPDATE before row-level triggers firing
 without then updating (unless the xact aborts, but you know what
 I mean).

 Well, let's make sure I do know what you mean.  If a BEFORE UPDATE
 ... FOR EACH ROW trigger returns NULL, the UPDATE will be skipped
 without error, right?  The BEFORE UPDATE triggers will run before
 the UPDATE if a duplicate is found, and the return value will be
 treated in the usual manner, replacing values and potentially
 skipping the update?

That's exactly right, yes (in particular, you can return NULL from
before row update triggers and have that cancel the update in the
usual manner). And potentially, the final value could be affected by
both the before row insert and before row update triggers (by way of
CONFLICTING()). That's the most surprising aspect of what I've done
(although I would argue it's the least surprising behavior possible
given my constraints).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote:
 I think what's realistic here is that the patch isn't going to get
 committed the way you have it today, because nobody else likes that
 design.  That may be harsh, but I think it's accurate.

I do think that's harsh, because it's unnecessary: I am looking for
the best design. If you want to propose alternatives, great, but there
is a reason why I've done things that way, and that should be
acknowledged. I too think naming the unique index is ugly as sin, and
have said as much multiple times. We're almost on the same page here.

 But on the substance of the issue, I'm totally unconvinced by your
 argument in the linked email.  Let's suppose you have this:

 create table foo (a int, b int);
 create unique index foo1 on foo (a);
 create unique index foo2 on foo (a);
 create unique index foo3 on foo (a) where b = 1;

 Anything that conflicts in foo1 or foo2 is going to conflict in the
 other one, and with foo3.  Anything that conflicts in foo3 is perhaps
 also going to conflict in foo1 and foo2, or perhaps not.  Yet, if the
 statement simply says ON DUPLICATE (a) UPDATE, it's entirely clear
 what to do: when we hit a conflict in any of those three indexes,
 update the row.  No matter which index has the conflict, updating is
 the right answer, and solves the conflict in all three indexes.  I
 dunno what you'd expect someone to write in your syntax to solve this
 problem - presumably either (1) they can list any of the indexes that
 might conflict, (2) they must list all of the indexes that might
 conflict, or (3) it just doesn't work.

I expect them to name exactly one unique index. They should either do
that explicitly, or have one in mind and make the behavior implicit at
the risk of miscalculating (and having a surprising outcome). It
doesn't matter if it's foo1 or foo2 in this example (but foo3 is
different, obviously).

 Whatever the actual behavior
 of your patch is today, it seems to me that the conflict is,
 fundamentally, over a set of column values, NOT over a particular
 index.  The index is simply a mechanism for noticing that conflict.

I think that this is the kernel of our disagreement. The index is not
simply a mechanism for noticing that conflict. The user had better
have one unique index in mind to provoke taking the alternative path
in the event of a would-be unique violation. Clearly it doesn't matter
much in this particular example. But what if there were two partial
unique indexes, that were actually distinct, but only in terms of the
constant appearing in their predicate? And what about having that
changed by a before insert row-level trigger? Are we to defer deciding
which unique index to use at the last possible minute?

As always with this stuff, the devil is in the details. If we work out
the details, then I can come up with something that's acceptable to
everyone. Would you be okay with this never working with partial
unique indexes? That gives me something to work with.

 If you want to allow this to work with expression indexes, that's not
 really a problem; you can let the user write INSERT .. ON CONFLICT
 (upper(foo)) UPDATE ... and match that to the index.  It wouldn't
 bother me if the first version of this feature couldn't target
 expression indexes anyway, but if you want to include that, having the
 user mention the actual expression rather than the index name
 shouldn't make much difference.

I'm not that worried about expression indexes, actually. I'm mostly
worried about partial unique indexes, particularly when before insert
row-level triggers are in play (making *that* play nice with
expression indexes is harder still, but expression indexes on their
own are probably not that much of a problem).

 Also, how about making the SET clause optional, with the semantics
 that we just update all of the fields for which a value is explicitly
 specified:

 INSERT INTO overwrite_with_abandon (key, value)
 VALUES (42, 'meaning of life')
 ON DUPLICATE (key) UPDATE;

 Your syntax allows the exact same thing; it simply require the user to
 be more verbose in order to get that behavior.  If we think that
 people wanting that behavior will be rare, then it's fine to require
 them to spell it out when they want it.  If we think it will be the
 overwhelming common application of this feature, and I do, then making
 people spell it out when we could just as well infer it is pointless.

Did you consider my example? I think that people will like this idea,
too - that clearly isn't the only consideration, though. As you say,
it would be very easy to implement this. However, IMV, we shouldn't,
because it is hazardous. MySQL doesn't allow this, and they tend to
find expedients like this useful.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Jim Nasby

On 10/9/14, 6:59 PM, Gavin Flower wrote:

On 10/10/14 12:38, Jim Nasby wrote:

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittnerkgri...@ymail.com wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.


FWIW, if each row was handled in a subtransaction then an insert that turned 
out to be an update could be rolled back... but the performance impact of going 
that route might be pretty horrid. :( There's also the potential to get stuck 
in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a 
BEFORE UPDATE trigger turns it into an INSERT.

Perhaps you need an UPSERT trigger?


I would think that a BEFORE UPSERT trigger would very likely want to know 
whether we were inserting or updating, which basically puts us back where we 
started.

That said, since the use case for UPSERT is different than both INSERT and 
UPDATE maybe it would be a good idea to have a separate trigger for them anyway.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas robertmh...@gmail.com wrote:
 I think what's realistic here is that the patch isn't going to get
 committed the way you have it today, because nobody else likes that
 design.  That may be harsh, but I think it's accurate.

 I do think that's harsh, because it's unnecessary: I am looking for
 the best design. If you want to propose alternatives, great, but there
 is a reason why I've done things that way, and that should be
 acknowledged. I too think naming the unique index is ugly as sin, and
 have said as much multiple times. We're almost on the same page here.

I hope you can adjust to the feedback, because it would be
disappointing to have this feature slip from the release, which is
what will happen if the index name remains part of the syntax.

 Would you be okay with this never working with partial unique
 indexes? That gives me something to work with.

That seems like the only sensible course, to me.

 If you want to allow this to work with expression indexes, that's not
 really a problem; you can let the user write INSERT .. ON CONFLICT
 (upper(foo)) UPDATE ... and match that to the index.  It wouldn't
 bother me if the first version of this feature couldn't target
 expression indexes anyway, but if you want to include that, having the
 user mention the actual expression rather than the index name
 shouldn't make much difference.

 I'm not that worried about expression indexes, actually. I'm mostly
 worried about partial unique indexes, particularly when before insert
 row-level triggers are in play (making *that* play nice with
 expression indexes is harder still, but expression indexes on their
 own are probably not that much of a problem).

There is a problem if any column of the index allows nulls, since
that would allow multiple rows in the target table to match an
argument. Proving that an expression does not could be tricky.  I
suggest that, at least for a first cut, we restrict this to
matching an index on NOT NULL columns.

 Also, how about making the SET clause optional, with the semantics
 that we just update all of the fields for which a value is explicitly
 specified:

 INSERT INTO overwrite_with_abandon (key, value)
VALUES (42, 'meaning of life')
ON DUPLICATE (key) UPDATE;

 Your syntax allows the exact same thing; it simply require the user to
 be more verbose in order to get that behavior.  If we think that
 people wanting that behavior will be rare, then it's fine to require
 them to spell it out when they want it.  If we think it will be the
 overwhelming common application of this feature, and I do, then making
 people spell it out when we could just as well infer it is pointless.

+1

 Did you consider my example? I think that people will like this idea,
 too - that clearly isn't the only consideration, though. As you say,
 it would be very easy to implement this. However, IMV, we shouldn't,
 because it is hazardous.

To quote the cited case:

 1. Developer writes the query, and it works fine.

 2. Some time later, the DBA adds an inserted_at column (those are
 common). The DBA is not aware of the existence of this particular
 query. The new column has a default value of now(), say.

 Should we UPDATE the inserted_at column to be NULL? Or (more
 plausibly) the default value filled in by the INSERT? Or leave it be?
 I think there is a case to be made for all of these behaviors, and
 that tension makes me prefer to not do this at all. It's like
 encouraging SELECT * queries in production, only worse.

That does point out the problem, in general, with carrying values
from a BEFORE INSERT trigger into an UPDATE.  Perhaps if the INSERT
fails the UPDATE phase should start with the values specified in
the first place, and not try to use anything returned by the BEFORE
INSERT triggers?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 2:17 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 That said, since the use case for UPSERT is different than both INSERT and
 UPDATE maybe it would be a good idea to have a separate trigger for them
 anyway.

-1. I think that having a separate UPSERT trigger is a bad idea. I'll
need to change the statement-level behavior to match what Kevin
described (fires twice, always, when INSERT ON CONFLICT UPDATE is
used).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-10 Thread Peter Geoghegan
On Fri, Oct 10, 2014 at 2:16 PM, Kevin Grittner kgri...@ymail.com wrote:
 Would you be okay with this never working with partial unique
 indexes? That gives me something to work with.

 That seems like the only sensible course, to me.

Okay, great. If that's the consensus, then that's all I need to know.

 I'm not that worried about expression indexes, actually. I'm mostly
 worried about partial unique indexes, particularly when before insert
 row-level triggers are in play

 There is a problem if any column of the index allows nulls, since
 that would allow multiple rows in the target table to match an
 argument. Proving that an expression does not could be tricky.  I
 suggest that, at least for a first cut, we restrict this to
 matching an index on NOT NULL columns.

Hmm. That could definitely be a problem. But the problem is clearly
the user's fault. It could allow multiple rows to match, as you say
- but you're talking about a different concept of matching. Unlike
with certain other systems, that isn't a would-be unique violation,
and so it isn't a match in the sense I intend, and so you insert.
You don't match more than one row, because (in the sense peculiar to
this feature) there are no matches, and insertion really is the
correct outcome.

As a user, you have no more right to be surprised by this then by the
fact that you wouldn't see a dup violation with a similar vanilla
INSERT - some people are surprised by that, with regularity.

Maybe I need to emphasize the distinction in the documentation between
the two slightly different ideas of matching that are in tension
here. I think it would be a mistake to forcibly normalize things to
remove that distinction (by adding an artificial restriction), at any
rate.

 Did you consider my example? I think that people will like this idea,
 too - that clearly isn't the only consideration, though. As you say,
 it would be very easy to implement this. However, IMV, we shouldn't,
 because it is hazardous.

 To quote the cited case:

 1. Developer writes the query, and it works fine.

 2. Some time later, the DBA adds an inserted_at column (those are
 common). The DBA is not aware of the existence of this particular
 query. The new column has a default value of now(), say.

 Should we UPDATE the inserted_at column to be NULL? Or (more
 plausibly) the default value filled in by the INSERT? Or leave it be?
 I think there is a case to be made for all of these behaviors, and
 that tension makes me prefer to not do this at all. It's like
 encouraging SELECT * queries in production, only worse.

 That does point out the problem, in general, with carrying values
 from a BEFORE INSERT trigger into an UPDATE.  Perhaps if the INSERT
 fails the UPDATE phase should start with the values specified in
 the first place, and not try to use anything returned by the BEFORE
 INSERT triggers?

I think that that behavior is far more problematic for numerous other
reasons, though. Potentially, you're not seeing what *really* caused
the conflict at all. I just don't think it's worth it to save a bit of
typing.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-09 Thread Jim Nasby

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:

On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittnerkgri...@ymail.com  wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.


FWIW, if each row was handled in a subtransaction then an insert that turned 
out to be an update could be rolled back... but the performance impact of going 
that route might be pretty horrid. :( There's also the potential to get stuck 
in a loop where a BEFORE INSERT trigger turns the tuple into an UPDATE and a 
BEFORE UPDATE trigger turns it into an INSERT.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-09 Thread Gavin Flower

On 10/10/14 12:38, Jim Nasby wrote:

On 10/8/14, 5:51 PM, Peter Geoghegan wrote:
On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittnerkgri...@ymail.com  
wrote:

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.


FWIW, if each row was handled in a subtransaction then an insert that 
turned out to be an update could be rolled back... but the performance 
impact of going that route might be pretty horrid. :( There's also the 
potential to get stuck in a loop where a BEFORE INSERT trigger turns 
the tuple into an UPDATE and a BEFORE UPDATE trigger turns it into an 
INSERT.

Perhaps you need an UPSERT trigger?



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:
 I think the problem is that it's not possible to respect the usual
 guarantees even at READ COMMITTED level when performing an INSERT OR
 UPDATE operation (however spelled).  You may find that there's a tuple
 with the same PK which is committed but not visible to the snapshot
 you took at the beginning of the statement.

Can you please comment on this, Kevin? It would be nice to converge on
an agreement on syntax here (without necessarily working out the exact
details right away, including for example the exact spelling of what
I'm calling WITHIN). Since Simon has changed his mind on MERGE (while
still wanting to retain a superficial flavor of MERGE,  a flavor that
does not include the MERGE keyword), I think that leaves you as the
only advocate for the MERGE syntax.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:

 I think the problem is that it's not possible to respect the usual
 guarantees even at READ COMMITTED level when performing an INSERT OR
 UPDATE operation (however spelled).  You may find that there's a tuple
 with the same PK which is committed but not visible to the snapshot
 you took at the beginning of the statement.

 Can you please comment on this, Kevin? It would be nice to converge on
 an agreement on syntax here

Robert said however spelled -- which I take to mean that he at
least sees that the MERGE-like UPSERT syntax can be turned into the
desired semantics.  I have no idea why anyone would think otherwise.

Although the last go-around does suggest that there is at least one
point of difference on the semantics.  You seem to want to fire the
BEFORE INSERT triggers before determining whether this will be an
INSERT or an UPDATE.  That seems like a bad idea to me, but if the
consensus is that we want to do that, it does argue for your plan
of making UPSERT a variant of the INSERT command.  That doesn't
seem as clean to me as saying (for example):

UPSERT targettable t
  USING (VALUES (123, 100)) x(id, quantity)
  ON t.id = x.id
  WHEN NOT MATCHED
INSERT (id, quantity, inserted_at) VALUES (x.id, x.quantity, now())
  WHEN MATCHED
UPDATE SET quantity = t.quantity + x.quantity, updated_at = now();

As I understand it you are proposing that would be:

INSERT INTO targettable(key, quantity, inserted_at)
  VALUES(123, quantity, now())
  ON CONFLICT WITHIN targettable_pkey
UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = now();

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.  That seems like a bad idea to me, but if the
 consensus is that we want to do that, it does argue for your plan
 of making UPSERT a variant of the INSERT command.

Well, it isn't that I'm doing it because I think that it is a great
idea, with everything to recommend it. It's more like I don't see any
practical alternative. We need the before row insert triggers to fire
to figure out what to insert (or if we should update instead). No way
around that. At the same time, those triggers are at liberty to do
almost anything, and so in general we have no way of totally
nullifying their effects (or side effects). Surely you see the
dilemma.

 As I understand it you are proposing that would be:

 INSERT INTO targettable(key, quantity, inserted_at)
   VALUES(123, quantity, now())
   ON CONFLICT WITHIN targettable_pkey
 UPDATE SET quantity = quantity + CONFLICTING(quantity), updated_at = 
 now();

That's right, yes.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.  That seems like a bad idea to me

Indeed, the current behavior breaks even the canonical keep track of
how many posts are in a thread trigger example because INSERT
triggers are fired for an effective row UPDATE. I can't see how that's
acceptable.

 Well, it isn't that I'm doing it because I think that it is a great
 idea, with everything to recommend it. It's more like I don't see any
 practical alternative.

I proposed an alternative that avoids this surprise and might allow
some other benefits. Can you please look into that? Even a clarify
this, this couldn't possibly work or you're not a major
contributor so your arguments are invalid response. ;)

I admit I initially misunderstood some details about the current
behavior. I can dig into the code and write a clearer and/or more
detailed spec if I had even *some* feedback.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 4:30 PM, Marti Raudsepp ma...@juffo.org wrote:
 On Thu, Oct 9, 2014 at 1:51 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 8, 2014 at 2:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 Although the last go-around does suggest that there is at least one
 point of difference on the semantics.  You seem to want to fire the
 BEFORE INSERT triggers before determining whether this will be an
 INSERT or an UPDATE.  That seems like a bad idea to me

 Indeed, the current behavior breaks even the canonical keep track of
 how many posts are in a thread trigger example because INSERT
 triggers are fired for an effective row UPDATE. I can't see how that's
 acceptable.

DB2 had the foresight to add the following restrictions to BEFORE ROW
triggers - they cannot:


Contain any INSERT, DELETE, or UPDATE operations, nor invoke any
routine defined with MODIFIES SQL DATA, if it is not a compound SQL.
Contain any DELETE or UPDATE operations on the trigger subject table,
nor invoke any routine containing such operations, if it is a compound
SQL.
Reference a materialized query table defined with REFRESH IMMEDIATE
Reference a generated column other than the identity column in the NEW
transition variable.


To get a sense of how doing fancy things in before triggers leads to
trouble, considering the hardening Kevin added in commit 6868ed74. In
short, use an AFTER trigger for this kind of thing, and all of these
issues go away.

 Well, it isn't that I'm doing it because I think that it is a great
 idea, with everything to recommend it. It's more like I don't see any
 practical alternative.

 I proposed an alternative that avoids this surprise and might allow
 some other benefits. Can you please look into that?

I looked at that. You're talking about making the case where before
row triggers clearly are appropriate (when you just want to change the
tuple being inserted) fail, in order to make an arguably inappropriate
case for before row triggers work (when you don't change the tuple to
be inserted, but you do want to do some other thing). That seems
backwards. My proposed behavior seems far less surprising.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 2:46 AM, Peter Geoghegan p...@heroku.com wrote:
 Indeed, the current behavior breaks even the canonical keep track of
 how many posts are in a thread trigger example
 use an AFTER trigger for this kind of thing, and all of these
 issues go away.

In the latest patches from CommitFest, the UPDATE case invokes
BEFOREAFTER INSERT triggers, not AFTER UPDATE. Is that going to be
changed? If so, I concede that.

 I proposed an alternative that avoids this surprise and might allow
 some other benefits. Can you please look into that?

 I looked at that.

Thanks!

 You're talking about making the case where before
 row triggers clearly are appropriate (when you just want to change the
 tuple being inserted) fail

Only in case the trigger changes *key* columns necessary for atomicity
(i.e. from the WITHIN index). Other columns are fair game. The
restriction seems justifiable to me: it's unreasonable to be atomic
with respect to values that change mid-way.

* It can distinguish between BEFORE INSERT and BEFORE UPDATE triggers,
which your approach fundamentally cannot.
* It can probably avoid evaluating column defaults in the UPDATE case,
like nextval() for unrelated serial columns = reduced overhead
* The semantics match better with MERGE-like syntax that seems to come
up again and again.
* And it appears to solve (part?) of the problem you had with naming
key *colums* in the WITHIN clause, rather than using index name.

If you don't see any reasons why it can't be done, these benefits seem
clear to me. I think the tradeoffs at least warrant wider discussion.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 6:12 PM, Marti Raudsepp ma...@juffo.org wrote:
 Oh, one more consideration: I believe you will run into the same issue
 if you want to implement BEFORE UPDATE triggers in any form. Skipping
 BEFORE UPDATE entirely seems to violate POLA.

Good thing that the patch doesn't do that, then. I clearly documented
this in a few places, including:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html

 It's common for applications to e.g. use triggers to keep track of
 latest modified time for a row. With your proposal, every query needs
 to include logic for that to work.

Wrong.

 If you don't see any reasons why it can't be done, these benefits seem
 clear to me. I think the tradeoffs at least warrant wider discussion.

 I don't.  That's very surprising. One day, it will fail unexpectedly.
 As proposed, the way BEFORE INSERT triggers fire almost forces users
 to consider the issues up-front.

 Not necessarily up-front, as proposed it causes existing triggers to
 change behavior when users adopt the upsert feature. And that adoption
 may even be transparent to the user due to ORM magic.

 There are potential surprises with both approaches.

When you make the slightest effort to understand what my approach is,
I might take your remarks seriously.

 Note that the CONFLICTING() behavior with respect to BEFORE INSERT
 triggers work's the same as MySQL's INSERT ON DUPLICATE KEY UPDATE
 foo = VALUES(foo) thing. There was agreement that that was the right
 behavior, it seemed.

 MySQL gets away with lots of things, they have several other caveats
 with triggers. I don't think it's a good example to follow wrt trigger
 behavior.

No true Scotsman.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Marti Raudsepp
On Thu, Oct 9, 2014 at 4:56 AM, Marti Raudsepp ma...@juffo.org wrote:
 create trigger ev1 before insert on evt_type execute procedure ins();
 create trigger ev2 before update on evt_type execute procedure upd();
 create trigger ev3 after insert on evt_type execute procedure ins();
 create trigger ev4 after update on evt_type execute procedure upd();

Oops, I missed for each row here.
Sorry for wasting your time.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-08 Thread Peter Geoghegan
On Wed, Oct 8, 2014 at 7:04 PM, Marti Raudsepp ma...@juffo.org wrote:
 Oops, I missed for each row here.

Note that I have an open item for what to do about statement level
triggers here: https://wiki.postgresql.org/wiki/UPSERT

I'm not satisfied with the current behavior. It needs more thought.
But I think row level triggers are fine.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-06 Thread Robert Haas
On Fri, Oct 3, 2014 at 4:16 PM, Kevin Grittner kgri...@ymail.com wrote:
 ... the SQL standard does not require that MERGE be atomic in the
 sense of atomically providing either an INSERT or UPDATE, ...

 My understanding is that the standard logically requires (without
 concern for implementation details) that the second specified table
 (via table name or subquery -- which could be a VALUES statement)
 is scanned, and for each row it attempts to match a row in the
 target table.  That will either match or not, according to the
 boolean expression in the ON clause.  You can have one WHEN MATCHED
 THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
 If both clauses are present, I believe that it is guaranteed that
 one or the other (but not both) will fire for each row in the
 second table.  The usual guarantees for each isolation level must
 not be violated (although an implementation is not required to
 generate anomalies which could not happen with serial execution).
 So as usual for a transaction involving multiple tables,
 serialization anomalies are possible if the transaction isolation
 level is REPEATABLE READ or less.  Is that what you're getting at,
 or something else?

I think the problem is that it's not possible to respect the usual
guarantees even at READ COMMITTED level when performing an INSERT OR
UPDATE operation (however spelled).  You may find that there's a tuple
with the same PK which is committed but not visible to the snapshot
you took at the beginning of the statement.  An INSERT would fail; an
UPDATE would see no rows and do nothing.  IOW, *neither* operation
succeeds according to its classic semantics; to succeed, the INSERT OR
UPDATE has to do something altogether novel.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-06 Thread Peter Geoghegan
On Mon, Oct 6, 2014 at 8:35 AM, Robert Haas robertmh...@gmail.com wrote:
 I think the problem is that it's not possible to respect the usual
 guarantees even at READ COMMITTED level when performing an INSERT OR
 UPDATE operation (however spelled).

That's definitely the main problem. Also, there could be garden
variety race conditions.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner kgri...@ymail.com wrote:
 I have two questions I hope you can clarify.  I'm having trouble
 parsing what this statement means:

 ... the SQL standard does not require that MERGE be atomic in the
 sense of atomically providing either an INSERT or UPDATE, ...

 My understanding is that the standard logically requires (without
 concern for implementation details) that the second specified table
 (via table name or subquery -- which could be a VALUES statement)
 is scanned, and for each row it attempts to match a row in the
 target table.  That will either match or not, according to the
 boolean expression in the ON clause.  You can have one WHEN MATCHED
 THEN UPDATE clause and/or one WHEN NOT MATCHED THEN INSERT clause.
 If both clauses are present, I believe that it is guaranteed that
 one or the other (but not both) will fire for each row in the
 second table.  The usual guarantees for each isolation level must
 not be violated (although an implementation is not required to
 generate anomalies which could not happen with serial execution).

That seems like a statement that isn't getting to the heart of the
matter. Which is: simple UPSERT cases will get things like duplicate
key violations under concurrency with MERGE, in SQL Server, Oracle,
and DB2. I think serialization failures are inevitable and appropriate
when not using READ COMMITTED mode with MVCC, but these duplicate
violations are sort of like READ COMMITTED serialization failures in
disguise. Or they would be, if MERGE (as described by the standard, or
in practice) promised to care about atomicity in that sense. It
doesn't, so they're off the hook. I think we should care about
atomicity (in the sense of always getting an insert or update, never
an error at RC) for the simple reason that that's what people want.

It seems like your remarks here don't acknowledge the reality: That
there are slightly different ideas of each row that are in play
here.

 So as usual for a transaction involving multiple tables,
 serialization anomalies are possible if the transaction isolation
 level is REPEATABLE READ or less.  Is that what you're getting at,
 or something else?

My complaint is quite straightforward, really. I don't want to have to
tell users to do this: http://stackoverflow.com/a/2249

At the same time, I also don't want to have to live with the
consequences of implementing a MERGE that does not exhibit that
behavior. Which is to say, the consequences of doing something like
selectively using different types of snapshots (MVCC or dirty - the
two different ideas of each row that are in tension) based on the
exact clauses used. That seems like asking for trouble, TBH.

 -- Predicate within UPDATE auxiliary statement
 -- (row is still locked when the UPDATE predicate
 -- isn't satisfied):
 INSERT INTO upsert(key, val) VALUES(1, 'insert')
   -- CONFLICTING() carries forward effects of both INSERT and UPDATE BEFORE 
 row-level triggers
   ON CONFLICT UPDATE SET val = CONFLICTING(val)
   -- Predicate has interesting semantics visibility-wise:
   WHERE val != 'delete';

 Can you explain what the WHERE clause there does?

Sure. Having already locked the tuple on the basis on finding a
conflict TID, but before an UPDATE, the qual is evaluated using
EvalPlanQual rescanning. The predicate must be satisfied in order for
the UPDATE to actually proceed. If, in this instance, the locked tuple
happened to have a val of 'delete', then we would not UPDATE at all.
It would still be locked, though (possibly without being visible to
our MVCC snapshot, just like when we might do that when we do UPDATE).

There are some interesting implications visibility-wise here that must
be considered. These are discussed on the Wiki page:
https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29

Obviously higher isolation levels just throw an error here instead.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Kevin Grittner
Peter Geoghegan p...@heroku.com wrote:
 On Fri, Oct 3, 2014 at 1:16 PM, Kevin Grittner kgri...@ymail.com wrote:
 I'm having trouble parsing what this statement means:

 ... the SQL standard does not require that MERGE be atomic in
 the sense of atomically providing either an INSERT or UPDATE, ...

 ... always getting an insert or update, never an error ...

I've never seen atomic used to mean you never get an error
before.  Perhaps it would be clearer to replace atomic with
error-free or some such.  It certainly would be less confusing
for me.  Atomic in most cases is a property that can be enforced
by generating an error where it would otherwise be violated.

For example: http://en.wikipedia.org/wiki/Atomicity_%28database_systems%29

  Although implementations vary depending on factors such as 
  concurrency issues, the principle of atomicity — i.e. complete 
  success or complete failure — remain.

When you are saying atomic you mean something quite different.

 My complaint is quite straightforward, really. I don't want to
 have to tell users to do this: http://stackoverflow.com/a/2249

I think you are confusing syntax with semantics.  I grant that a
reasonable person could have concerns about using the MERGE syntax
to implement the semantics you want in the special case that an
appropriate unique index exists, but pretending that the semantics
can't be achieved with that syntax is just silly.

 At the same time, I also don't want to have to live with the
 consequences of implementing a MERGE that does not exhibit that
 behavior. Which is to say, the consequences of doing something
 like selectively using different types of snapshots (MVCC or
 dirty - the two different ideas of each row that are in
 tension) based on the exact clauses used. That seems like asking
 for trouble, TBH.

Now *that* is getting more to a real issue.  We routinely pick very
different plans based on the presence or absence of an index, and
we use special snapshots in the course of executing many DML
statements (if FK triggers are fired), but this would be the first
place I can think of that the primary DML statement behaves that
way.  You and a couple others have expressed concern about that,
but it seems pretty vague and hand-wavey.  If a different execution
node is used for the special behavior, and that node is generated
based on the unique index being used, I'm really having problems
seeing where the problem lies.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 2:44 PM, Kevin Grittner kgri...@ymail.com wrote:
 I've never seen atomic used to mean you never get an error
 before.

 When you are saying atomic you mean something quite different.

Perhaps I should have been more careful on that point. Just to be
crystal clear: I don't intend that you'll never get an error (some
cases clearly *should* error). But I want confidence that reasonable,
simple UPSERTs will never error due to a duplicate violation that they
thought the statement was supposed to avoid. That's the point of
UPSERT. Guarantees matter. I would like to make guarantees that are at
least as good as the upsert looping subxact pattern.

Teradata refers to an atomic UPSERT. It also has a MERGE command.

 My complaint is quite straightforward, really. I don't want to
 have to tell users to do this: http://stackoverflow.com/a/2249

 I think you are confusing syntax with semantics.  I grant that a
 reasonable person could have concerns about using the MERGE syntax
 to implement the semantics you want in the special case that an
 appropriate unique index exists, but pretending that the semantics
 can't be achieved with that syntax is just silly.

I am not pretending that it can't be done - just, as I said, that to
do so would have bad consequences when time came to generalize the
syntax to support more advanced cases.

 At the same time, I also don't want to have to live with the
 consequences of implementing a MERGE that does not exhibit that
 behavior. Which is to say, the consequences of doing something
 like selectively using different types of snapshots (MVCC or
 dirty - the two different ideas of each row that are in
 tension) based on the exact clauses used. That seems like asking
 for trouble, TBH.

 Now *that* is getting more to a real issue.

Yes, it is. I am opposed to using the MERGE syntax for this *because*
MERGE is useful (independently useful, when done properly), not
because it is not useful.

 We routinely pick very
 different plans based on the presence or absence of an index, and
 we use special snapshots in the course of executing many DML
 statements (if FK triggers are fired), but this would be the first
 place I can think of that the primary DML statement behaves that
 way.  You and a couple others have expressed concern about that,
 but it seems pretty vague and hand-wavey.  If a different execution
 node is used for the special behavior, and that node is generated
 based on the unique index being used, I'm really having problems
 seeing where the problem lies.

We need to avoid duplicate violations. The authoritative source of
truth here is a unique index, because the presence or absence of a
conclusively-visible conflict tuple controls whether our insert fails
or succeeds respectively. We need a dirty snapshot to see tuples not
in our MVCC snapshot, since they need to be handled too. Otherwise,
when we fail to handle them, the MERGE's INSERT has a dup violation
(which is not what I want, for practical reasons).

If we're seq scanning a table, even with a dirty snapshot, there is no
authoritative source of truth. Heap tuples will be inserted
concurrently, and we'll have no idea what to do about it without
reference to a unique index as a choke point/single source of final
truth about what is and isn't going to be duplicated.

As implemented, my UPSERT patch will have UPSERTs not really use their
MVCC snapshot for simple cases, just like INSERTs in general. UPDATEs
always use their MVCC snapshot, as the ModifyTable node pulls up
tuples from an ordinary relation scan. UPSERT needs a DirtySnapshot
scan of the unique index to find duplicates, because that's always how
we find duplicates. At the same time, MERGE is not at liberty to not
work just because of the absence of a unique index (plus even row
locking must be prepared for conflicts). And since the MERGE is driven
by an outer join - *not* an INSERT-like search for duplicates - we'll
find ourselves providing one behavior when detecting one case, and
another when detecting the other, with subtle and confusing
differences between each case. If we fake an outer join by doing an
INSERT-like search for duplicates in a unique index (to make sure we
see duplicates), then we get into trouble elsewhere. Like, we cannot
let users not ask for an INSERT in their MERGE, because we'd be
conclusively deciding to *not* (say) UPDATE on the basis of the
absence of a value in a unique index, as opposed to the presence.

Suddenly, we're in the business of proving a negativeor are we?.
My head hurts.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan p...@heroku.com wrote:
 Yes, it is. I am opposed to using the MERGE syntax for this *because*
 MERGE is useful (independently useful, when done properly), not
 because it is not useful.

As I've mentioned, there is also the practical argument:  No one else
does this. That suggests to me that it's hard.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-03 Thread Peter Geoghegan
On Fri, Oct 3, 2014 at 3:42 PM, Peter Geoghegan p...@heroku.com wrote:
 We routinely pick very
 different plans based on the presence or absence of an index, and
 we use special snapshots in the course of executing many DML
 statements (if FK triggers are fired)

Apart from FK snapshots, we also use dirty snapshots for unique index
enforcement, obviously. That doesn't mean that it's the command/xact
snapshot, though. It's just a special constraint enforcement
mechanism.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-02 Thread Heikki Linnakangas

On 10/02/2014 02:52 AM, Peter Geoghegan wrote:

Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to value locking, I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT


Thanks!

Could you write down all of the discussed syntaxes, using a similar 
notation we use in the manual, with examples on how to use them? And 
some examples on what is possible with some syntaxes, and not with 
others? That would make it a lot easier to compare them.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 12:07 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Could you write down all of the discussed syntaxes, using a similar notation
 we use in the manual, with examples on how to use them? And some examples on
 what is possible with some syntaxes, and not with others? That would make it
 a lot easier to compare them.

I've started off by adding varied examples of the use of the existing
proposed syntax. I'll expand on this soon.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-02 Thread Peter Geoghegan
On Thu, Oct 2, 2014 at 12:54 AM, Peter Geoghegan p...@heroku.com wrote:
 I've started off by adding varied examples of the use of the existing
 proposed syntax. I'll expand on this soon.

I spent some time today expanding on the details, and commenting on
the issues around the custom syntax (exactly what it does, issues
around ambiguous conflict-on unique indexes, etc).

Also, Challenges, issues has been updated - I've added some
secondary concerns. These are non-contentious items that I think are
of concern, and would like to highlight now.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-01 Thread Peter Geoghegan
Having been surprisingly successful at advancing our understanding of
arguments for and against various approaches to value locking, I
decided to try the same thing out elsewhere. I have created a
general-purpose UPSERT wiki page.

The page is: https://wiki.postgresql.org/wiki/UPSERT

Right now, I would like to address the less contentious but still
unresolved question of whether or not we should use the SQL-standard
MERGE syntax instead of the non-standard INSERT ... ON CONFLICT UPDATE
syntax that my patch proposes.

Note that I'm trying to direct the conversation along certain lines:
Is MERGE the right syntax for this particular effort (UPSERT)? And,
in particular, not: Is SQL MERGE useful? I think that it is useful,
but is a largely unrelated feature.

Please correct the Wiki page if I have failed to credit SQL MERGE with
some advantage that someone stated at some point. I don't recall any
arguments for it, other than that it is in the SQL standard, but maybe
I missed one.

In general, add to this page, and edit it as you see fit. It'll be
useful to centralize the references, discussion and state of the patch
in one agreed upon place, as the patch continues to evolve.
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT wiki page, and SQL MERGE syntax

2014-10-01 Thread Craig Ringer
On 10/02/2014 07:52 AM, Peter Geoghegan wrote:
 Having been surprisingly successful at advancing our understanding of
 arguments for and against various approaches to value locking, I
 decided to try the same thing out elsewhere. I have created a
 general-purpose UPSERT wiki page.
 
 The page is: https://wiki.postgresql.org/wiki/UPSERT

Thanks. That'll help keep things moving forward rather than around in
circles.

 In general, add to this page, and edit it as you see fit. It'll be
 useful to centralize the references, discussion and state of the patch
 in one agreed upon place, as the patch continues to evolve.

I added a summary of the status quo of upsert in Pg as it stands, and a
brief discussion of the state in other RDBMSes.

I'd love it if someone who knows MySQL better could add info on MySQL's
ON DUPLICATE KEY feature - advantages/problems, etc.


I've added a few points to the goals section:

- Any new upsert approach must be a usability improvement on the status
quo; we don't want to introduce subtle behaviour or unnecessary foot-guns.

- if possible, upsert of multiple values is desirable. We currently have
to loop on a per-value basis.


... and some miscellaneous edits/formatting changes.

I've also added sections for the other options:

* A custom syntax
https://wiki.postgresql.org/wiki/UPSERT#Custom_syntax

and

* Adopting an existing non-standard syntax
https://wiki.postgresql.org/wiki/UPSERT#Adopting_an_existing_non-standard_syntax

which I'd appreciate it if you could fill out based on your existing
research and notes. I don't think it makes sense for me to write those
when you've already done the required study/note taking and just need to
transfer it over.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPSERT

2007-03-04 Thread Hannu Krosing
Ühel kenal päeval, R, 2007-03-02 kell 10:13, kirjutas Tom Lane:
 Andrew Dunstan [EMAIL PROTECTED] writes:
  My instinct would be to follow your first strategy, i.e. detect which 
  path is needed rather than try one and then if it fails do the other.
 
 The very first thing you need to think about is how to solve the race
 condition problem, ie, two backends concurrently trying to insert
 identical data.  

Then one of them will update the data inserted by whoeved got the insert
first.

 Until you have a plausible mechanism for that, the
 whole thing is pie-in-the-sky.

Is'nt the standard way of doing it thus:

UPDATE
IF NOT FOUND THEN 
  INSERT
  IF DUPLICATE KEY THEN
UPDATE
  END IF
END IF

At least this is how UPSERT is usually done in plpgsql


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] UPSERT

2007-03-04 Thread Bruno Wolff III
On Sun, Mar 04, 2007 at 14:55:47 +0200,
  Hannu Krosing [EMAIL PROTECTED] wrote:
 
 UPDATE
 IF NOT FOUND THEN 
   INSERT
   IF DUPLICATE KEY THEN
 UPDATE
   END IF
 END IF

I believe it is possible for the above to fail. For example another
transaction could create a matching record between the update and insert
and then another transaction could delete it between the insert and the
second update.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] UPSERT

2007-03-04 Thread Hannu Krosing
Ühel kenal päeval, P, 2007-03-04 kell 07:46, kirjutas Bruno Wolff III:
 On Sun, Mar 04, 2007 at 14:55:47 +0200,
   Hannu Krosing [EMAIL PROTECTED] wrote:
  
  UPDATE
  IF NOT FOUND THEN 
INSERT
IF DUPLICATE KEY THEN
  UPDATE
END IF
  END IF
 
 I believe it is possible for the above to fail. For example another
 transaction could create a matching record between the update and insert
 and then another transaction could delete it between the insert and the
 second update.

Then we may do the second part as a loop and hope that eventually we hit
the right point with either INSERT or UPDATE:

 UPDATE
 WHILE NOT FOUND THEN 
   INSERT
   IF DUPLICATE KEY THEN
 UPDATE
   END IF
 END WHILE

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] UPSERT

2007-03-04 Thread Petr Jelinek

Bruno Wolff III wrote:

On Sun, Mar 04, 2007 at 14:55:47 +0200,
  Hannu Krosing [EMAIL PROTECTED] wrote:

UPDATE
IF NOT FOUND THEN
  INSERT
  IF DUPLICATE KEY THEN
UPDATE
  END IF
END IF


I believe it is possible for the above to fail. For example another
transaction could create a matching record between the update and insert
and then another transaction could delete it between the insert and the
second update.


You know we have example in manual right ?
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE

:)

--
Regards
Petr Jelinek (PJMODOS)



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] UPSERT

2007-03-04 Thread Martijn van Oosterhout
On Sun, Mar 04, 2007 at 02:55:47PM +0200, Hannu Krosing wrote:
 Is'nt the standard way of doing it thus:
 
 UPDATE
 IF NOT FOUND THEN 
   INSERT
   IF DUPLICATE KEY THEN
 UPDATE
   END IF
 END IF
 
 At least this is how UPSERT is usually done in plpgsql

Well, you need to loop, because that last UPDATE can get a not-found
again, so you have to keep trying both until they work.

I think MERGE would still be cool, because then it's only one command
that has to be repeated, rather than two.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


[HACKERS] UPSERT

2007-03-02 Thread Jonathan Scher

Hello,

I'd like to work on TODO item:
 Add REPLACE or UPSERT command that does UPDATE, or on failure, INSERT

could you please tell me if I'm going in the right way?

There are some different syntaxes possible, but MySQL has an interesting 
one here:

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1;
This allow to make an insert, and if the key is already there to modify 
the value depending on the current one.


Then I have two choices possible:
- Search for existing tuples among key or unique constraint, then if 
nothing is found, insert it.
- Try to insert a new row, catch if there is any error, and then search 
for all tuple matching.


As it would be a new command, I have no idea on what the data could be.
Does syntax meet your needs? Which choice should I implement?

Regards
Jonathan Scher

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] UPSERT

2007-03-02 Thread Andrew Dunstan

Jonathan Scher wrote:

Hello,

I'd like to work on TODO item:
 Add REPLACE or UPSERT command that does UPDATE, or on failure, INSERT

could you please tell me if I'm going in the right way?

There are some different syntaxes possible, but MySQL has an 
interesting one here:

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1;
This allow to make an insert, and if the key is already there to 
modify the value depending on the current one.


Then I have two choices possible:
- Search for existing tuples among key or unique constraint, then if 
nothing is found, insert it.
- Try to insert a new row, catch if there is any error, and then 
search for all tuple matching.


As it would be a new command, I have no idea on what the data could be.
Does syntax meet your needs? Which choice should I implement?


Good. Some thoughts from the top of the head:

Is insert or on failure update semantically equivalent to update or 
on failure insert? If not the former seems more desirable to me anyway.


What are the syntax alternatives? This one from MySQL doesn't seem too 
bad, but it would be good to have them all on the table.


My instinct would be to follow your first strategy, i.e. detect which 
path is needed rather than try one and then if it fails do the other.


cheers

andrew


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] UPSERT

2007-03-02 Thread Florian G. Pflug

Andrew Dunstan wrote:

Jonathan Scher wrote:

Hello,

I'd like to work on TODO item:
 Add REPLACE or UPSERT command that does UPDATE, or on failure, INSERT

could you please tell me if I'm going in the right way?

There are some different syntaxes possible, but MySQL has an 
interesting one here:

http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html

INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1;
This allow to make an insert, and if the key is already there to 
modify the value depending on the current one.


May this could be generalized to a generic stmt on error do stmt?
You could then write
update table set c=c+1 on not_found do insert into table (a,b,c) values 
(1,2,3)

Just an idea I just had...

greetings, Florian Pflug


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] UPSERT

2007-03-02 Thread Gregory Stark
Florian G. Pflug [EMAIL PROTECTED] writes:

 INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1;
 This allow to make an insert, and if the key is already there to modify the
 value depending on the current one.

 May this could be generalized to a generic stmt on error do stmt?
 You could then write
 update table set c=c+1 on not_found do insert into table (a,b,c) values 
 (1,2,3)

 Just an idea I just had...

We have such a thing, subtransactions.

The reason UPSERT or ON DUPLICATE is interesting is because it provides a way
to do it atomically. That is, you keep the locks acquired from the duplicate
key check and if it fails you update the same records you just found violating
the duplicate key.

If the user tries to do the same thing he has to repeat the search after the
duplicate key check has released the locks so it's possible they've been
deleted or updated since. So the user has to loop in case the update fails to
find any records and he has to start over trying to insert. The same problem
plagues you if you do it the other way around too.

The tricky part is avoiding race conditions. The way the unique index code
avoids having someone else come along and insert at the same time is by
holding a lock on an index page. I'm not sure if you can keep that lock while
you go lock the tuples for the update.



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] UPSERT

2007-03-02 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 My instinct would be to follow your first strategy, i.e. detect which 
 path is needed rather than try one and then if it fails do the other.

The very first thing you need to think about is how to solve the race
condition problem, ie, two backends concurrently trying to insert
identical data.  Until you have a plausible mechanism for that, the
whole thing is pie-in-the-sky.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] UPSERT

2007-03-02 Thread Heikki Linnakangas

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
My instinct would be to follow your first strategy, i.e. detect which 
path is needed rather than try one and then if it fails do the other.


The very first thing you need to think about is how to solve the race
condition problem, ie, two backends concurrently trying to insert
identical data.  Until you have a plausible mechanism for that, the
whole thing is pie-in-the-sky.


How about:

1. Insert new heap tuple
2. Try to insert the index tuple. If there's a duplicate tuple, lock the 
existing tuple instead of throwing an error.

3. If there was no duplicate, we're done.

4. Otherwise, kill the new tuple inserted in step 1, by setting it's 
xmin to InvalidTransactionId.

5. Perform the UPDATE on the existing tuple.

This requires one change to the indexam api: a duplicate key violation 
needs to lock the existing tuple instead of throwing an error.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] UPSERT

2007-03-02 Thread Florian G. Pflug

Gregory Stark wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:


INSERT INTO table (a,b,c) VALUES (1,2,3) ON DUPLICATE KEY UPDATE c=c+1;
This allow to make an insert, and if the key is already there to modify the
value depending on the current one.

May this could be generalized to a generic stmt on error do stmt?
You could then write
update table set c=c+1 on not_found do insert into table (a,b,c) values 
(1,2,3)

Just an idea I just had...


We have such a thing, subtransactions.


Yeah, I know - but the syntax above would provide a way to write that inline
instead of doing it at the application (or plpgsql) level.


The reason UPSERT or ON DUPLICATE is interesting is because it provides a way
to do it atomically. That is, you keep the locks acquired from the duplicate
key check and if it fails you update the same records you just found violating
the duplicate key.

If the user tries to do the same thing he has to repeat the search after the
duplicate key check has released the locks so it's possible they've been
deleted or updated since. So the user has to loop in case the update fails to
find any records and he has to start over trying to insert. The same problem
plagues you if you do it the other way around too.

I agree - my generic syntax seems to be too generic, and doesn't take
locking into account.. :-(


The tricky part is avoiding race conditions. The way the unique index code
avoids having someone else come along and insert at the same time is by
holding a lock on an index page. I'm not sure if you can keep that lock while
you go lock the tuples for the update.


Maybe doing the following would work:
start:
do_index_lookup
if (found_row) {
  lock_row
  if (acquired_lock) {
do_update
return
  }
  //Row was deleted
}
create_row_on_heap
create_index_entry
if (success)
  return
else {
  mark_row_as_deleted //or remove row?
  goto start
}

It seems like this would work without creating a subtransaction, but
I'm not really sure..

greetings, Florian Pflug

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] UPSERT

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 15:41 +, Heikki Linnakangas wrote:
 Tom Lane wrote:
  Andrew Dunstan [EMAIL PROTECTED] writes:
  My instinct would be to follow your first strategy, i.e. detect which 
  path is needed rather than try one and then if it fails do the other.
  
  The very first thing you need to think about is how to solve the race
  condition problem, ie, two backends concurrently trying to insert
  identical data.  Until you have a plausible mechanism for that, the
  whole thing is pie-in-the-sky.
 
 How about:
 
 1. Insert new heap tuple
 2. Try to insert the index tuple. If there's a duplicate tuple, lock the 
 existing tuple instead of throwing an error.
 3. If there was no duplicate, we're done.
 
 4. Otherwise, kill the new tuple inserted in step 1, by setting it's 
 xmin to InvalidTransactionId.
 5. Perform the UPDATE on the existing tuple.
 
 This requires one change to the indexam api: a duplicate key violation 
 needs to lock the existing tuple instead of throwing an error.

So if the INSERT fails we will leave two dead copies of the tuples? Hmm.

Seems like we should try to locate a row first, then INSERT if we cannot
find one. That's slower on INSERT but more balanced overall - sometimes
the input will generate all UPDATEs, sometimes all INSERTs we'll never
know.


I'm a bit surprised the TODO didn't mention the MERGE statement, which
is the SQL:2003 syntax for specifying this as an atomic statement. There
are lots of other syntaxes, the most simple of which are the MySQL
REPLACE and Teradata's UPDATE ... ELSE INSERT. As seductive as they are,
I'd say that's all the more reason to go with the single approved
syntax. If MySQL are standards compliant, they will support that also,
so we get MySQL compatibility either way.

Another thought that really ought to be on the TODO is a MERGE FROM
(pick your syntax) that allows MERGE to act like a COPY, reading data
from an external data file. That would save effort, since the only way
of doing this currently is to do a COPY then an UPDATE and then an
INSERT. So the MERGE FROM would reduce three operations to just a single
command. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] UPSERT

2007-03-02 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Seems like we should try to locate a row first, then INSERT if we cannot
 find one. That's slower on INSERT but more balanced overall

Except it still has the race condition.

 I'm a bit surprised the TODO didn't mention the MERGE statement, which
 is the SQL:2003 syntax for specifying this as an atomic statement.

I believe we concluded that MERGE doesn't actually do quite what people
want/expect.  Please go back and read the archives.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] UPSERT

2007-03-02 Thread Bricklen Anderson

Simon Riggs wrote:

I'm a bit surprised the TODO didn't mention the MERGE statement, which
is the SQL:2003 syntax for specifying this as an atomic statement.


http://archives.postgresql.org/pgsql-hackers/2004-05/thrd5.php#00497

There is a thread there entitled Adding MERGE to the TODO list

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] UPSERT

2007-03-02 Thread Tom Lane
Bricklen Anderson [EMAIL PROTECTED] writes:
 http://archives.postgresql.org/pgsql-hackers/2004-05/thrd5.php#00497
 There is a thread there entitled Adding MERGE to the TODO list

The more interesting discussion is the one that got it taken off TODO again,
from Nov 2005.  Try these threads:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg00501.php
http://archives.postgresql.org/pgsql-hackers/2005-11/msg00536.php

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] UPSERT

2007-03-02 Thread Simon Riggs
On Fri, 2007-03-02 at 13:19 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Seems like we should try to locate a row first, then INSERT if we cannot
  find one. That's slower on INSERT but more balanced overall
 
 Except it still has the race condition.

I'm not saying it didn't; but dropping in two dead copies of a tuple
isn't acceptable either.

  I'm a bit surprised the TODO didn't mention the MERGE statement, which
  is the SQL:2003 syntax for specifying this as an atomic statement.
 
 I believe we concluded that MERGE doesn't actually do quite what people
 want/expect.  Please go back and read the archives.

Yes, it was my thread. I recall that there wasn't any acceptable answer
to how it could be done with reasonable efficiency.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] UPSERT

2007-03-02 Thread Bricklen Anderson

Tom Lane wrote:

Bricklen Anderson [EMAIL PROTECTED] writes:

http://archives.postgresql.org/pgsql-hackers/2004-05/thrd5.php#00497
There is a thread there entitled Adding MERGE to the TODO list


The more interesting discussion is the one that got it taken off TODO again,
from Nov 2005.  Try these threads:
http://archives.postgresql.org/pgsql-hackers/2005-11/msg00501.php
http://archives.postgresql.org/pgsql-hackers/2005-11/msg00536.php

regards, tom lane


Yeah, that's a better set of threads.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] UPSERT

2007-03-02 Thread Josh Berkus
Couple notes:

(1) Upsert is not just a desire of MySQL users.  I just spec'd a major 
proprietary-database replacement project at a fortune 500 where they want an 
Upsert and are unhappy that PostgreSQL doesn't have it.  Unfortunately, they 
don't want to spring for development funds :-(

(2) Doing upsert by checking for a unique key violaton error leads to horrible 
performance in addition to the previously mentioned race conditions.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster