On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas <robertmh...@gmail.com> wrote: > Concurrent readers will block in a non-interruptible wait if they try > to access a buffer, and that's a situation that will be intolerable > if, for example, it can persist across a disk I/O. And I don't see > any way to avoid that.
Then I have some bad news for you - that's already quite possible. _bt_insertonpg() is called with the very same buffer exclusive locked, and is where we do btree page splits. The first thing that _bt_split does is this: /* Acquire a new page to split into */ rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE); (Obviously this may ultimately result in the storage manager extending the index relation). Plus the split is WAL-logged immediately afterwards, which could result in us blocking on someone else's I/O under contention (granted, the XLogInsert scaling patch has now considerably ameliorated that general problem). All the while holding an exclusive lock on the same buffer. Note also that _bt_insertonpg() is called again recursively after a page split. And of course we WAL-log btree index tuple insertion proper all the time. Let me be clear about something, though: I am not at all dismissive of Andres' concerns. I was concerned about many of the same things before I posted the patch. I think that Andres and I ought to re-frame this discussion a little bit. Right now, the presumption we seem to be making, perhaps without even realizing it, is this is about providing functionality equivalent to MySQL's INSERT IGNORE; insert tuples proposed for insertion where possible, otherwise do not. However, Andres and I, not to mention almost every Postgres user, are actually much more interested in something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE. That's what this mechanism has to support, even if it is *technically* possible to commit just INSERT...ON DUPLICATE KEY IGNORE and punt on these other questions. It was silly of me not to do that up-front. So I'm going to try and produce a patch that does this as well for my next revision. Maybe this will enable Andres to refute my position that the buffer locking approach to speculative insertion/value locking may actually be acceptable. If that gets us closer to having the feature committed in some form, then I welcome it. I fully expect to be held to this new standard - it would be insane to do anything less. I don't want to throw out an old IGNORE value locking mechanism and invent a whole new one for upserting a little bit down the line. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers