Re: [HACKERS] MERGE SQL Statement for PG11

Peter Geoghegan Mon, 06 Nov 2017 13:51:06 -0800

Simon Riggs <[email protected]> wrote:

In step 3 we discover that an entry exists in the index for a committed row.


Since we have a unique index we use it to locate the row we know
exists and UPDATE that.

We don't use a new MVCC snapshot, we do what EPQ does. EPQ is already
violating MVCC for UPDATEs, so why does it matter if we do it for
INSERTs also?


Before I go on to say why I think that this approach is problematic, I
want to point out a few things that I think we actually agree on:

* EPQ is fairly arbitrary as a behavior for READ COMMITTED UPDATE
 conflict handling. It has more to do with how VACUUM works than about
 some platonic ideal that everyone agrees on.

* We can imagine other alternatives, such as the behavior in Oracle
 (statement level rollback + optimistic retry).

* Those alternatives are probably better in some ways but worse in other
 ways.

* EPQ violates snapshot consistency, even though that's not inherently
 necessary to avoid "READ COMMITTED serialization errors".

* ON CONFLICT also violates snapshot consistency, in rather a different
 way. (Whether or not this is necessary is more debatable.)

I actually think that other MVCC systems don't actually copy Oracle here,
either, and for similar pragmatic reasons. It's a mixed bag.

Where hides the problem?


The problem is violating MVCC is something that can be done in different
ways, and by meaningful degrees:

* EPQ semantics are believed to be fine because we don't get complaints
 about it. I think that that's because it's specialized to UPDATEs and
 UPDATE-like operations, where we walk an UPDATE chain specifically,
 and only use a dirty snapshot for the chain's newer tuples.

* ON CONFLICT doesn't care about UPDATE chains. Unlike EPQ, it makes no
 distinction between a concurrent UPDATE, and a concurrent DELETE + fresh
 INSERT. It's specialized to CONFLICTs.

This might seem abstract, but it has real, practical implications.
Certain contradictions exist when you start with MVCC semantics, then
fall back to EPQ semantics, then finally fall back to ON CONFLICT
semantics.

Questions about mixing these two things:

* What do we do if someone concurrently UPDATEs in a way that makes the
 qual not pass during EPQ traversal? Should we INSERT when that
 happens?

* If so, what about the case when the MERGE join qual/unique index
 values didn't change (just some other attributes that do not pass the
 additional WHEN MATCHED qual)?

* What about when there was a concurrent DELETE -- should we INSERT then?

ON CONFLICT goes from a CONFLICT, and then applies its own qual. That's
hugely different to doing it the other way around: starting from your
own MVCC snapshot qual, and going to a CONFLICT. This is because
evaluating the DO UPDATE's WHERE clause is just one little extra step
after the one and only latest row for that value has been locked.  You
could theoretically go this way with 2PL, I think, because that's a bit
like locking every row that the predicate touches, but of course that
isn't at all practical.

I should stop trying to make a watertight case against this, even though
I still think that's possible. For now, instead, I'll just say that this
is *extremely* complicated, and still has unresolved questions about
semantics.

--
Peter Geoghegan


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MERGE SQL Statement for PG11

Reply via email to