The attached patch implements INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. This is similar to INSERT...ON DUPLICATE KEY IGNORE (which is also proposed as part of this new revision of the patch), but additionally acquires a row exclusive lock on each existing row that prevents a proposed tuple from being inserted.
This feature offers something that I believe could reasonably be described as upsert. Consider:

postgres=# create table foo(a int4 primary key, b text);
CREATE TABLE
postgres=# with r as (
             insert into foo(a,b)
             values (5, '!'), (6, '@')
             on duplicate key lock for update
             returning rejects *
           )
           update foo set b = r.b from r where foo.a = r.a;
UPDATE 0

Here there are 0 rows affected by the update, because all of the work was done by the insert. If I do it again, 2 rows are affected by the update:

postgres=# with r as (
             insert into foo(a,b)
             values (5, '!'), (6, '@')
             on duplicate key lock for update
             returning rejects *
           )
           update foo set b = r.b from r where foo.a = r.a;
UPDATE 2

Evidently, the rejects were now projected into the wCTE, and the underlying rows were locked. The idea is that we can update the rows, confident that each rejection-causing row will be updated in a race-free fashion.

I personally prefer this to something like MySQL's INSERT...ON DUPLICATE KEY UPDATE, because it's more flexible. For example, we could have deleted the locked rows instead, if that happened to make sense. Making this kind of usage idiomatic feels to me like the Postgres way to do upsert. Others may differ here. I will concede, however, that it will be unfortunate not to have some MySQL compatibility, for the benefit of people porting widely used web frameworks.

I'm not really sure whether I should have done something cleverer here than lock the first duplicate found, or whether it's okay that that's all I do. That's another discussion entirely, though Andres and I did previously cover the question of prioritizing unique indexes, so that the most sensible duplicate for the particular situation would be returned, according to some criteria.

As previously covered, I felt that including a row locking component was essential to reasoning about our requirements for what I've termed "speculative insertion" -- the basic implementation of value locking that is needed to make all this work.
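To illustrate the flexibility point, here is a sketch of the delete variant alluded to above, written against the same proposed syntax (untested; just an example of what becomes possible once the rejection-causing rows are locked):

```sql
-- Hypothetical usage: delete the rows that blocked insertion,
-- rather than updating them, relying on the same row locks
with r as (
  insert into foo(a, b)
  values (5, '!'), (6, '@')
  on duplicate key lock for update
  returning rejects *
)
delete from foo using r where foo.a = r.a;
```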
As I said in that earlier thread, there are many opinions about this, and it isn't obvious which one is right. Any approach needs to have its interactions with row locking considered right up front. Those who consider this a new patch with new functionality, or even a premature expansion on what I've already posted, should consider that carefully. Do we really want to assume that these two things are orthogonal? I think that they're probably not, but even if that turns out not to be the case, it's an unnecessary risk to take.

Row locking
===========

Row locking is implemented with calls to a new function above ExecInsert. We don't bother with the usual EvalPlanQual looping pattern for now, preferring to just re-check from scratch if there is a concurrent update from another session (see comments in ExecLockHeapTupleForUpdateSpec() for details). We might do better here. I haven't considered the row locking functionality in much detail since the last revision, preferring to focus on value locking.

Buffer locking/value locking
============================

Andres raised concerns about the previous patch's use of exclusive buffer locks for extended periods (i.e. during a single heap tuple insertion). These locks served as extended value locks. With this revision, we don't hold exclusive buffer locks for the duration of heap insertion -- we hold shared buffer locks instead. I believe that Andres' principal concern was the impact on concurrent index scans by readers, so I think that all of this will go some way towards alleviating his concerns generally.

This necessitated inventing entirely new LWLock semantics around "weakening" (from exclusive to shared) and "strengthening" (from shared to exclusive) of locks already held. As you'd expect, there are some tricky race hazards surrounding these new functions that clients need to be mindful of. These have been documented within lwlock.c. I looked for a precedent for these semantics, and found a few.
Perhaps the most prominent was Boost, the highly regarded, peer-reviewed C++ library. Boost implements exactly these semantics for some of its thread synchronization/mutex primitives:

http://www.boost.org/doc/libs/1_54_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_concepts.upgrade_lockable

Boost has a concept of upgradable ownership, which is just like shared ownership except, I gather, that the owner reserves the exclusive right to upgrade to an exclusive lock (for them it's not quite an exclusive lock; it's an upgradable/downgradable exclusive lock). My solution is to push that responsibility onto the client -- I admonish something along the lines of "don't let more than one shared locker do this at a time per LWLock". I am of course mindful of this caveat in my modifications to the btree code, where I "weaken" and later "strengthen" an exclusive lock. The trick is that before weakening I acquire a regular exclusive lock, and I only actually weaken after that, when going ahead with insertion. I suspect that this may not be the only place where this trick is helpful. The intended usage is described in the relevant comments added to lwlock.c.

Testing
=======

This time around, in order to build confidence in the new LWLock infrastructure for buffer locking, on debug builds we re-verify during the second phase that the value proposed for insertion is in fact absent from the locked page, as expected, and that our previous insertion point calculation is still considered correct. This is kind of like the way we re-verify the wait-queue-is-in-LSN-order invariant in syncrep.c on debug builds. It's really a fancier assertion -- it doesn't just test the state of scalar variables. This was invaluable during development of the new LWLock infrastructure.

Just as before, but this time with only shared buffer locks held during heap tuple insertion, the patch has resisted considerable brute-force efforts to break it (e.g.
using pgbench to have many sessions speculatively inserting values into a table, with many different INSERT...ON DUPLICATE KEY LOCK FOR UPDATE statements interspersed with UPDATE, DELETE and SELECT statements, while watching for spurious duplicate tuple insertions, deadlocks and assertion failures). As always, isolation tests are included.

Bugs
====

I fixed the bug that Andres reported in relation to multiple unique indexes' interaction with waits for another transaction's end during speculative insertion. I did not get around to fixing the broken ecpg regression tests, as reported by Peter Eisentraut; I was a little puzzled by the problem there. I'll return to it in a while, or perhaps someone else can propose a solution.

Thoughts?
--
Peter Geoghegan
insert_on_dup.v2.2013_09_08.patch.gz