Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - visibility semantics

2013-09-27 Thread Peter Geoghegan
On Tue, Sep 24, 2013 at 2:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 Various messages are discussing semantics around visibility. I by now
 have a hard time keeping track. So let's keep the discussion of the
 desired semantics to this thread.

Yes, it's pretty complicated.

I meant to comment on this here, but ended up saying some stuff to
Robert about this in the main thread, so I should probably direct you
to that. You were probably right to start a new thread, because I
think we can usefully discuss this topic in parallel, but that's just
what ended up happening.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - visibility semantics

2013-09-24 Thread Andres Freund
Hi,

Various messages are discussing semantics around visibility. I by now
have a hard time keeping track. So let's keep the discussion of the
desired semantics to this thread.

There have been some remarks about serialization failures in read
committed transactions. I agree, those shouldn't occur. But I don't
actually think they are so much of a problem if we follow the path set
by existing uses of the EPQ logic. The scenario described seems to be an
UPSERT conflicting with a row it cannot see in the original snapshot of
the query.
In that case I think we just have to follow the example laid by
ExecUpdate, ExecDelete and heap_lock_tuple. Use the EPQ machinery (or an
alternative approach with similar enough semantics) to get a new
snapshot and follow the ctid chain. When we've found the end of the
chain we try to update that tuple.
That surely isn't free of surprising semantics, but it would follows existing
semantics. Which everybody writing concurrent applications in read
committed should (but doesn't) know. Adding a different set of semantics
seems like a bad idea.
Robert seems to have been the primary sceptic around this, what scenario
are you actually concerned about?

There are some scenarios that doesn't trivially answer. But I'd like to
understand the primary concerns first.

Regards,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - visibility semantics

2013-09-24 Thread Robert Haas
On Tue, Sep 24, 2013 at 5:14 AM, Andres Freund and...@2ndquadrant.com wrote:
 Various messages are discussing semantics around visibility. I by now
 have a hard time keeping track. So let's keep the discussion of the
 desired semantics to this thread.

 There have been some remarks about serialization failures in read
 committed transactions. I agree, those shouldn't occur. But I don't
 actually think they are so much of a problem if we follow the path set
 by existing uses of the EPQ logic. The scenario described seems to be an
 UPSERT conflicting with a row it cannot see in the original snapshot of
 the query.
 In that case I think we just have to follow the example laid by
 ExecUpdate, ExecDelete and heap_lock_tuple. Use the EPQ machinery (or an
 alternative approach with similar enough semantics) to get a new
 snapshot and follow the ctid chain. When we've found the end of the
 chain we try to update that tuple.
 That surely isn't free of surprising semantics, but it would follows existing
 semantics. Which everybody writing concurrent applications in read
 committed should (but doesn't) know. Adding a different set of semantics
 seems like a bad idea.
 Robert seems to have been the primary sceptic around this, what scenario
 are you actually concerned about?

I'm not skeptical about offering it as an option; in fact, I just
suggested basically the same thing on the other thread, before reading
this.  Nonetheless it IS an MVCC violation; the chances that someone
will be able to demonstrate serialization anomalies that can't occur
today with this new facility seem very high to me.  I feel it's
perfectly fine to respond to that by saying: yep, we know that's
possible, if it's a concern in your environment then don't use this
feature.  But it should be clearly documented.

I do think that it will be easier to get this to work if we have a
define the operation as REPLACE, bundling all of the magic inside a
single SQL command.  If the user issues an INSERT first and then must
try an UPDATE afterwards if the INSERT doesn't actually insert, then
you're going to have problems if the UPDATE can't see the tuple with
which the INSERT conflicted, and you're going to need some kind of a
loop in case the UPDATE itself fails.  Even if we can work out all the
details, a single command that does insert-or-update seems like it
will be easier to use and more efficient.  You might also want to
insert multiple tuples using INSERT ... VALUES (...), (...), (...);
figuring out which ones were inserted and which ones must now be
updated seems like a chore better avoided.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers