Re: [HACKERS] SQL MERGE is quite distinct from UPSERT

2014-07-21 Thread Peter Geoghegan
On Sun, Jul 20, 2014 at 10:41 PM, Craig Ringer cr...@2ndquadrant.com wrote:
 I'd be very keen to see atomic upsert in Pg. Please Cc me on any patches
 / discussion, I'll be an eager tester.

Great. Thanks Craig.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL MERGE is quite distinct from UPSERT

2014-07-20 Thread Martijn van Oosterhout
On Sat, Jul 19, 2014 at 09:55:19PM -0700, Peter Geoghegan wrote:
 At a high level SQL MERGE is quite distinct from UPSERT, in that it is
 a utility command that performs inserts, updates and deletes while
 avoiding race conditions (e.g. unique constraint violations) on a more
 or less best effort basis. MERGE is conceptually messy. In contrast
 UPSERT is actually atomic, and having its behavior be relatively easy
 to reason about ought to be the top priority. There is a *really* big
 demand for UPSERT from users, not MERGE, although MERGE is certainly
 useful too.

FWIW, I agree. MERGE is hard enough as it is, but trying to guarentee
some kind of atomicity makes it nigh on impossible.  Indeed, after
reading what you wrote I think it may well be impossible to make it
atomic *and* make it perform in the general case.

So, +1 UPSERT.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] SQL MERGE is quite distinct from UPSERT

2014-07-20 Thread Craig Ringer
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 07/21/2014 01:40 AM, Martijn van Oosterhout wrote:
 FWIW, I agree. MERGE is hard enough as it is, but trying to
 guarentee some kind of atomicity makes it nigh on impossible.
 Indeed, after reading what you wrote I think it may well be
 impossible to make it atomic *and* make it perform in the general
 case.
 
 So, +1 UPSERT.

I totally agree. Particularly because MERGE-like behaviour is already
possible with wCTEs and/or table locking. It's not beautiful, but
neither is MERGE.

- -- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
-BEGIN PGP SIGNATURE-
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTzKfbAAoJELBXNkqjr+S2GhIIALAMmpMuQiMqsJ/GHjCfeXYQ
Tb3dO0ocBgpk8CobGEVjDnLOh4Rfqt4XZ9pEGr38XEmmzfjc2nEczk+PFq+bRKki
d9lRk8BDH5fcyIYfCNXbycUBbJ/b+inLdhZI0wp3kGX6V1MWTuOquTp8NTbTzvcL
tJXRyWEqsMuXIA26B31W3AkLAFaFF7fpZiD91SI7ECozg1Qr+Ey5tTjJj1+ErzAC
5MnK4nSwwbFTdS7SaOmzzfGKT7BoSlbAXbF8gshbBA5IPU7FxfBcvAquxpPalF73
/949kneIWDA3Qux73wmr182ph4U8usgODA0Iq6QAHa4IPJWFfCvyRA9vt6P86oM=
=vVI+
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SQL MERGE is quite distinct from UPSERT

2014-07-20 Thread Craig Ringer
On 07/20/2014 12:55 PM, Peter Geoghegan wrote:

 There is a *really* big
 demand for UPSERT from users, not MERGE, although MERGE is certainly
 useful too.

The inability to efficiently say Add this unique-keyed row, or if a row
of the same key already exists replace it atomically is a fundamental
defect in SQL its self. Vendors shouldn't need to be coming up with
their own versions because the standard should really cover this - much
like LIMIT and OFFSET.

It's very high in the most frequently asked questions on Stack Overflow,
right up there with questions about pg_hba.conf, connection issues on OS
X, etc.

I'd be very keen to see atomic upsert in Pg. Please Cc me on any patches
/ discussion, I'll be an eager tester.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] SQL MERGE is quite distinct from UPSERT

2014-07-19 Thread Peter Geoghegan
Someone asked me privately why we weren't immediately pursuing SQL
MERGE, as opposed to yet another non-standard variant of UPSERT
(UPSERT is loosely defined here as an insert-or-update DML command
that goes to update based on would-be unique constraint violations,
and does one or the other *atomically*). I believe at least one other
system (Teradata) that now has both formerly only had the latter. I
think it makes sense to have both. While I believe that it's already
generally accepted that SQL MERGE is distinct from UPSERT in fairly
fundamental ways, in that it solves a somewhat distinct set of
problems, I would like to put my thoughts on the matter on record.
This is mostly for the sake of the archives.

The first reason why the two are distinct is that MERGE, as
implemented within other major systems and as described by the SQL
standard does not require an atomic insert-or-update for even the
simplest possible case of OLTP style insert-or-update. As I went into
during my talk at pgCon, even these simple cases may throw errors on
other systems' MERGE implementations. This leeway to fail may be
valuable for some use cases though, like the bulk loading use case.
Simon Riggs has characterized MERGE as something that exists for the
benefit of those two fairly distinct use cases [1], and I agree with
him. I suspect that the predictability of MERGE's behavior suffers due
to trying to support both at once, and I think that MERGE should be
specifically promoted as a bulk loading command if we ever get around
to adding it to PostgreSQL.

The second reason they differ is that SQL MERGE as implemented within
other systems doesn't require that a unique index be defined on the
column or columns of the MERGE statement's outer join. I am not
enthusiastic about the idea of falling back to table-level locking
when an appropriate index isn't available. That could be a foot-gun
for the large majority of users that just want to solve their basic
insert-or-update problem. I did think to mention this when asked about
my work on UPSERT, but the next question was: So why not just throw
an error when it's not possible to use an available unique index?
This is a fair point, but doesn't address all the problems with trying
to make MERGE perform upserts that meet my requirements for UPSERT
[2](principally the requirement of atomicity in the sense of always
getting an insert or update).

These two points, particularly the first are likely to weigh upon the
implementation of both, which is why we cannot very well just insist
upon having appropriate unique indexes defined. It buys my UPSERT
implementation plenty to have the upsert driven by an insert into a
unique index. With UPSERT, having an insert occur is always an outcome
that the DML statement's author will consider acceptable. SQL MERGE
statements can be written without any WHEN NOT MATCHED THEN INSERT;
SQL MERGE statement authors are not necessarily going to expect an
insert at all.

As I outlined in my pgCon talk, there appears to be a fundamental
trade-off involved when implementing UPSERT. My implementation is
prepared for the possibility that upon finding a duplicate, and upon
subsequently attempting to lock a row, the row may be found to be
deleted. When that happens, we try again without having made any
useful progress in the first iteration. The ideal of a strict queue of
upserters, with totally fair row locking style lock arbitration
appears to be basically impossible, at least provided you're unwilling
to give up insert-or-update atomicity (that is, to get this you have
to be willing to give up with an error even at READ COMMITTED, which
is a cop out, or buy into using something like table level locks,
which is also a cop out). Theoretical lock starvation hazards exist in
my implementation because I don't do this, but all indications are
that this is fine in the real world. In this regard my implementation
is similar to the current looping upsert subtransaction pattern we've
been recommending for years [3]. We don't hold on to value locks on
values within indexes (except to finish staggered index-locks-first
inserting - but not to update/row lock).

This is all fine in the real world because *somebody* has to make
progress when there is highly contended upserting. Some session has to
insert-or-update successfully - otherwise you cannot have a conflict
to begin with. Livelocking is therefore not possible, and you need a
*sustained* perfect storm of conflicts starving out one session in
order to see lock starvation at all. SQL MERGE has an outer join on a
table of proposed values (as opposed to an inner join against already
rejected values). MERGE allows you to do something other than insert
WHEN NOT MATCHED (i.e. to do nothing), and I think that being able to
always insert at that juncture gives us a way to usefully terminate
the loop where there might not be another way.

Suppose you had a similar keep retrying implementation (similar to
what I've proposed for