Andres Freund escribió: > Ok, this is helpful. Do you rather longrunning transactions? The > transaction that does foreign key checks has an xid of 10260613, while > the row that's getting checked has 13514992.
Thanks for the analysis. > #4 0x0000000000635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501 > tag = {locktag_field1 = 13514992, locktag_field2 = 0, locktag_field3 > = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 > '\001'} > #5 0x0000000000482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, > tuple=<value optimized out>, ctid=<value optimized out>, xid=10260613, > mode=LockTupleKeyShare) at heapam.c:4847 > > I am not sure whether that's the origin of the problem but at the very > least it seems to me that heap_lock_updated_tuple_rec() is missing > several important pieces: > a) do the priorXmax==xmin dance to check we're still following the same > ctid chain. Currently we could easily stumble across completely > unrelated tuples if a chain element aborted and got vacuumed. This seems simple to handle by adding the check you propose to the loop. Basically if the xmax doesn't match the xmin, we reached the end, there's nothing more to lock and we can return success without any further work: /* * Check the tuple XMIN against prior XMAX, if any. If we reached * the end of the chain, we're done, so return success. */ if (TransactionIdIsValid(priorXmax) && !TransactionIdEquals(HeapTupleHeaderGetXmin(&mytup), priorXmax)) { UnlockReleaseBuffer(buf); return HeapTupleMayBeUpdated; } > b) Check whether a chain element actually aborted - currently we're > only doing that in the HEAP_KEYS_UPDATED updated case, but that seems > wrong (we can't check for committed tho!). Let me point out that this is exactly the same code that would be affected by my proposed fix for #8434, which would have this check the updateXid in all cases, not only in KEYS_UPDATED as currently. I don't understand your comment about "can't check for committed". I mean, if the updating transaction committed, then we need to return failure to caller, HeapTupleUpdated. This signals to the caller that the future version of the tuple is locked and the whole thing needs to be failed. (In READ COMMITTED isolation level, this would cause EvalPlanQual to fetch the updated version of the tuple and redo the whole thing from there. In REPEATABLE READ and above, the whole operation would be aborted.) In short I propose to fix part this with the simple patch I proposed for bug #8434. > c) (reported separately as well) cope with failure of heap_fetch() to > return anything. Proposed patch for this was posted in the other thread. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers