I spent a lot of time looking at this issue.  Your problem scenario can
be reduced to the following test case:

CREATE TABLE t (a INT);
INSERT INTO t VALUES (1);

The three sessions need to execute the following commands, in this
sequence:

-- Session 1
/* 1 */ BEGIN;
SELECT * FROM t FOR UPDATE;

-- Session 2
/* 2 */ BEGIN;
UPDATE t SET a = 2;             -- blocks waiting on S1

-- Session 3
/* 3 */ BEGIN;
UPDATE t SET a = 3;             -- blocks
-- this is waiting on S1, but since S2 is also waiting on S1 and will acquire a
-- conflicting lock first, this session will only be released after S2 releases
-- its own lock.

Session 1
/* 4 */ UPDATE t SET a = 4;
COMMIT;                         -- releases session 2
BEGIN;
SELECT * FROM t FOR UPDATE;     -- blocks waiting on S2 (*)

Session 2:
/* 5 */ COMMIT;                         -- releases session 1
BEGIN;
UPDATE t SET a = 5;                     -- blocks waiting on S3

Session 1:
/* 6 */ UPDATE t SET a = 6;             -- blocks waiting on S3

At this point, all sessions are blocked on their UPDATE commands, and
deadlock is reported.  (For debugging, it's useful to set a very large
deadlock_timeout and examine pg_locks, etc.)

The point at which things started to go wrong was where the session 1's
SELECT FOR UPDATE with the (*) was allowed to continue after session 2
commits.  In 9.2 that session remains blocked, and instead we release
session 3.

The problem is the handling of following of the update chain during a
lock command; the question is should sessions be blocked waiting for
locks on future versions of the row?  They currently don't block, which
is what leads the SELECT FOR UPDATE to continue; but if we make them
block, other things break.  I initially toyed with the following patch,
which removes the deadlock the above report and also your original test
case:

--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -4832,15 +4832,12 @@ l4:
                xmax = HeapTupleHeaderGetRawXmax(mytup.t_data);
 
                /*
-                * If this tuple is updated and the key has been modified (or
-                * deleted), what we do depends on the status of the updating
-                * transaction: if it's live, we sleep until it finishes; if it 
has
-                * committed, we have to fail (i.e. return HeapTupleUpdated); 
if it
-                * aborted, we ignore it. For updates that didn't touch the 
key, we
-                * can just plough ahead.
+                * If this tuple is updated, what we do depends on the status 
of the
+                * updating transaction: if it's live, we sleep until it 
finishes; if
+                * it has committed, we have to fail (i.e. return 
HeapTupleUpdated); if
+                * it aborted, we ignore it.
                 */
-               if (!(old_infomask & HEAP_XMAX_INVALID) &&
-                       (mytup.t_data->t_infomask2 & HEAP_KEYS_UPDATED))
+               if (!(old_infomask & HEAP_XMAX_INVALID))
                {
                        TransactionId update_xid;
 

However, the differences in behavior this causes (visible by running the
isolation tests) don't look good to me.  It's quite possible that there
are other bugs elsewhere.

Sadly, I have to look at other things now and I won't have time to
research this further until after next week's minor releases; so
whatever we decide to do, if anything, will not be in 9.3.1.

Thanks for the report.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Reply via email to