On Mon, Oct 9, 2017 at 9:19 PM, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
> I investigated this case and it seems that XactLockTableWait() in
> SnapBuildWaitSnapshot() not always work as expected. XactLockTableWait()
> waits on LockAcquire() for xid to be completed and if we finally got this
> lock but transaction is still in progress then such xid assumed to be a
> subxid. However LockAcquire/LockRelease cycle can happen after transaction
> set xid, but before XactLockTableInsert().
>
> Namely following history happened for xid 5225 and lead to crash:
>
> [backend] LOG: AssignTransactionId: XactTopTransactionId = 5225
> [walsender] LOG: LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
> [walsender] LOG: XactLockTableWait: LockAcquire 5225
> [walsender] LOG: XactLockTableWait: LockRelease 5225
> [backend] LOG: AssignTransactionId: LockAcquire ExclusiveLock 5225
> [walsender] LOG: TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
> [backend] LOG: CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
> [backend] LOG: CommitTransaction: ResourceOwnerRelease locks xid=5225
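
The ordering in the quoted log can be pictured with a small toy model. This is only a sketch of the interleaving as described above, not PostgreSQL code: ToyShared and every toy_* function are hypothetical names invented for illustration, and the real lock manager and proc array are far more involved. It assumes the waiter's acquire/release succeeds immediately whenever the backend has not yet taken the lock on its own xid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t ToyXid;

/* Toy shared state (hypothetical): one transaction and one lock slot. */
typedef struct {
    ToyXid running_xid; /* xid currently visible as in progress; 0 = none */
    ToyXid locked_xid;  /* xid whose lock is held exclusively; 0 = none */
} ToyShared;

/* Backend, step 1: the xid becomes visible as running. */
static void toy_publish_xid(ToyShared *s, ToyXid xid)
{
    assert(xid != 0);
    s->running_xid = xid;
}

/* Backend, step 2: the backend takes the exclusive lock on its own xid. */
static void toy_lock_own_xid(ToyShared *s, ToyXid xid)
{
    s->locked_xid = xid;
}

/* Waiter: true if an acquire/release on the xid lock would not block. */
static bool toy_wait_would_return(const ToyShared *s, ToyXid xid)
{
    return s->locked_xid != xid;
}

/* Waiter: is the xid still visible as running? */
static bool toy_is_in_progress(const ToyShared *s, ToyXid xid)
{
    return s->running_xid == xid;
}

/*
 * Replays one interleaving. Returns true if the waiter got through the
 * lock while the transaction was still in progress, which is the state
 * the quoted log shows for xid 5225.
 */
bool toy_waiter_fooled(bool lock_before_wait)
{
    ToyShared s = {0, 0};
    const ToyXid xid = 5225;

    toy_publish_xid(&s, xid);
    if (lock_before_wait)
        toy_lock_own_xid(&s, xid);  /* lock already held when waiter arrives */

    bool returned = toy_wait_would_return(&s, xid);

    if (!lock_before_wait)
        toy_lock_own_xid(&s, xid);  /* racy order: lock taken after the wait */

    return returned && toy_is_in_progress(&s, xid);
}
```

Calling toy_waiter_fooled(false) replays the order from the log (xid visible, waiter's acquire/release, then the backend's lock) and reports that the waiter came back while the xid was still running; toy_waiter_fooled(true) models the lock being held before the waiter arrives, in which case the waiter would block instead.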

Ouch. This seems like a bug that needs to be fixed, but do you think
it's related to Petr's proposed fix to set es_output_cid? That fix
looks reasonable, since we shouldn't try to lock tuples without a
valid CommandId.

Now, having said that, I understand how the lack of that fix could cause:

2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible tuple

But I do not understand how it could cause:

#3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company