Stas Kelvich wrote:

> Seems that having busy loop is the best idea out of several discussed.
> 
> I thought about small sleep at the bottom of that loop if we reached topmost
> transaction, but taking into account low probability of that event may be
> it is faster to do just busy wait.

In other locations where we do similar things we have 1ms sleeps.  I
agree with the need for a comment here.

Proposed patch attached.  I tried your reload.pgb test case in 9.4
(after changing pgoutput to test_decoding and removing the 3rd arg to a
function call) and the crash takes about 3 seconds without patch in my
machine.  No crash with this patch.

Will push this shortly after lunch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From abca9e98d1ddf86e892bd23bd24878516225a2d1 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvhe...@alvh.no-ip.org>
Date: Wed, 3 Jan 2018 12:36:47 -0300
Subject: [PATCH] Make XactLockTableWait work for transactions that are not yet
 self-locked
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

XactLockTableWait assumed that its xid argument has already added itself
to the lock table.  That assumption led to another assumption that if
locking the xid has succeeded but the xid is reported as still in
progress, then the input xid must have been a subtransaction.

These assumptions hold true for the original uses of this code in
locking related to on-disk tuples, but they break down in logical
replication slot snapshot building -- in particular, when a standby
snapshot logged contains an xid that's already in ProcArray but not yet
in the lock table.  This leads to assertion failures that can be
reproduced all the way back to 9.4, when logical decoding was
introduced.

The new coding does not assume that the xid is a subtransaction anymore,
and will retry the lock with same toplevel xid it was given before
(courtesy of SubTransGetTopmostTransaction()).

For consistency, change ConditionalXactLockTableWait the same way.

Author: Petr Jelínek
Discussion: 
https://postgr.es/m/1b3e32d8-fcf4-40b4-aef9-5c0e3ac57...@postgrespro.ru
Reported-by: Konstantin Knizhnik
Diagnosed-by: Stas Kelvich, Petr Jelínek
Reviewed-by: Andres Freund, Robert Haas
---
 src/backend/storage/lmgr/lmgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 3754283d2b..c184f1f956 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -590,7 +590,7 @@ XactLockTableWait(TransactionId xid, Relation rel, 
ItemPointer ctid,
 
                if (!TransactionIdIsInProgress(xid))
                        break;
-               xid = SubTransGetParent(xid);
+               xid = SubTransGetTopmostTransaction(xid);
        }
 
        if (oper != XLTW_None)
@@ -622,7 +622,7 @@ ConditionalXactLockTableWait(TransactionId xid)
 
                if (!TransactionIdIsInProgress(xid))
                        break;
-               xid = SubTransGetParent(xid);
+               xid = SubTransGetTopmostTransaction(xid);
        }
 
        return true;
-- 
2.11.0

Reply via email to