Re: [HACKERS] Issues with logical replication

Robert Haas Wed, 15 Nov 2017 12:09:48 -0800

On Mon, Oct 9, 2017 at 9:19 PM, Stas Kelvich <[email protected]> wrote:
> I investigated this case and it seems that XactLockTableWait() in
> SnapBuildWaitSnapshot() does not always work as expected.
> XactLockTableWait() waits on LockAcquire() for the xid to complete, and
> if we finally get this lock but the transaction is still in progress,
> then the xid is assumed to be a subxid. However, a LockAcquire/LockRelease
> cycle can happen after the transaction has set its xid, but before
> XactLockTableInsert().
>
> Namely, the following history happened for xid 5225 and led to a crash:
>
> [backend] LOG:  AssignTransactionId: XactTopTransactionId = 5225
>    [walsender] LOG:  LogCurrentRunningXacts: Wrote RUNNING_XACTS xctn=1, xid[0]=5225
>    [walsender] LOG:  XactLockTableWait: LockAcquire 5225
>    [walsender] LOG:  XactLockTableWait: LockRelease 5225
> [backend] LOG:  AssignTransactionId: LockAcquire ExclusiveLock 5225
>    [walsender] LOG:  TransactionIdIsInProgress: SVC->latestCompletedXid=5224 < xid=5225 => true
> [backend] LOG:  CommitTransaction: ProcArrayEndTransaction xid=5225, ipw=0
> [backend] LOG:  CommitTransaction: ResourceOwnerRelease locks xid=5225
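
The ordering in the quoted log can be replayed with a small toy model. This
is only a sketch and not PostgreSQL code: ToyServer, replay() and the
lock_before_wait flag are hypothetical names invented for illustration, the
"lock table" and "proc array" are plain Python sets, and the only thing
taken from the report is the order of events (xid assigned, waiter acquires
and releases the lock, owner takes the lock, waiter still sees the xid as
in progress).

```python
class ToyServer:
    """Hypothetical stand-in for the shared state touched in the log."""

    def __init__(self):
        self.running_xids = set()  # xids that look "in progress" to others
        self.xid_locks = set()     # xids whose owner currently holds the lock

    # Backend side.
    def assign_xid(self, xid):
        # In this model the xid becomes visible before the lock is taken.
        self.running_xids.add(xid)

    def xact_lock_table_insert(self, xid):
        # The owning backend takes the exclusive lock on its own xid.
        self.xid_locks.add(xid)

    # Waiter (walsender) side.
    def lock_is_free(self, xid):
        # True means a LockAcquire/LockRelease cycle would succeed at once.
        return xid not in self.xid_locks

    def is_in_progress(self, xid):
        return xid in self.running_xids


def replay(xid=5225, lock_before_wait=False):
    """Return (waiter_got_lock, xid_still_in_progress) for one ordering."""
    server = ToyServer()
    server.assign_xid(xid)
    if lock_before_wait:
        # Ordering without the window: owner locks before the waiter looks.
        server.xact_lock_table_insert(xid)
    waiter_got_lock = server.lock_is_free(xid)
    if not lock_before_wait:
        # Ordering from the log: owner locks only after the waiter's cycle.
        server.xact_lock_table_insert(xid)
    return waiter_got_lock, server.is_in_progress(xid)
```

With the ordering from the log, replay() reports that the waiter got the
lock while the xid is still in progress, which is the state the waiter
would interpret as "this must be a subxid". With lock_before_wait=True the
waiter does not get the lock immediately, so that state does not arise.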


Ouch.  This seems like a bug that needs to be fixed, but do you think
it's related to Petr's proposed fix to set es_output_cid?  That fix
looks reasonable, since we shouldn't try to lock tuples without a
valid CommandId.

Now, having said that, I understand how the lack of that fix could cause:

2017-10-02 18:40:26.101 MSK [2954] ERROR:  attempted to lock invisible tuple

But I do not understand how it could cause:

#3  0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:582

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
