Re: Deadlock between backend and recovery may not be detected

Fujii Masao Tue, 22 Dec 2020 06:58:56 -0800


On 2020/12/22 20:42, Fujii Masao wrote:



On 2020/12/22 10:25, Masahiko Sawada wrote:

On Fri, Dec 18, 2020 at 6:36 PM Fujii Masao <[email protected]> wrote:




On 2020/12/17 2:15, Fujii Masao wrote:



On 2020/12/16 23:28, Drouvot, Bertrand wrote:

Hi,

On 12/16/20 2:36 PM, Victor Yegorov wrote:


*CAUTION*: This email originated from outside of the organization. Do not click 
links or open attachments unless you can confirm the sender and know the 
content is safe.


ср, 16 дек. 2020 г. в 13:49, Fujii Masao <[email protected] 
<mailto:[email protected]>>:

     After doing this procedure, you can see the startup process and backend
     wait for the table lock each other, i.e., deadlock. But this deadlock 
remains
     even after deadlock_timeout passes.

     This seems a bug to me.

+1


     > * Deadlocks involving the Startup process and an ordinary backend process
     > * will be detected by the deadlock detector within the ordinary backend.

     The cause of this issue seems that ResolveRecoveryConflictWithLock() that
     the startup process calls when recovery conflict on lock happens doesn't
     take care of deadlock case at all. You can see this fact by reading the 
above
     source code comment for ResolveRecoveryConflictWithLock().

     To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
     timer in ResolveRecoveryConflictWithLock() so that the startup process can
     send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
     Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
     the backend should check whether the deadlock actually happens or not.
     Attached is the POC patch implimenting this.

good catch!

I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be 
set in ResolveRecoveryConflictWithLock() too (it is already set in 
ResolveRecoveryConflictWithBufferPin()).

So + 1 to consider this as a bug and for the way the patch proposes to fix it.


Thanks Victor and Bertrand for agreeing!
Attached is the updated version of the patch.


Attached is v3 of the patch. Could you review this version?

While the startup process is waiting for recovery conflict on buffer pin,
it repeats sending the request for deadlock check to all the backends
every deadlock_timeout. This may increase the workload in the startup
process and backends, but since this is the original behavior, the patch
doesn't change that. Also in practice this may not be so harmful because
the period that the buffer is kept pinned is basically not so long.


@@ -529,6 +603,26 @@ ResolveRecoveryConflictWithBufferPin(void)
      */
     ProcWaitForSignal(PG_WAIT_BUFFER_PIN);

+   if (got_standby_deadlock_timeout)
+   {
+       /*
+        * Send out a request for hot-standby backends to check themselves for
+        * deadlocks.
+        *
+        * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
+        * to be signaled by UnpinBuffer() again and send a request for
+        * deadlocks check if deadlock_timeout happens. This causes the
+        * request to continue to be sent every deadlock_timeout until the
+        * buffer is unpinned or ltime is reached. This would increase the
+        * workload in the startup process and backends. In practice it may
+        * not be so harmful because the period that the buffer is kept pinned
+        * is basically no so long. But we should fix this?
+        */
+       SendRecoveryConflictWithBufferPin(
+
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+       got_standby_deadlock_timeout = false;
+   }
+

Since SendRecoveryConflictWithBufferPin() sends the signal to all
backends every backend who is waiting on a lock at ProcSleep() and not
holding a buffer pin blocking the startup process will end up doing a
deadlock check, which seems expensive. What is worse is that the
deadlock will not be detected because the deadlock involving a buffer
pin isn't detected by CheckDeadLock(). I thought we can replace
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK with
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN but it’s not good because the
backend who has a buffer pin blocking the startup process and not
waiting on a lock is also canceled after deadlock_timeout. We can have
the backend return from RecoveryConflictInterrupt() when it received
PROCSIG_RECOVERY_CONFLICT_BUFFERPIN and is not waiting on any lock,
but it’s also not good because we cannot cancel the backend after
max_standby_streaming_delay that has a buffer pin blocking the startup
process. So I wonder if we can have a new signal. When the backend
received this signal it returns from RecoveryConflictInterrupt()
without deadlock check either if it’s not waiting on any lock or if it
doesn’t have a buffer pin blocking the startup process. Otherwise it's
cancelled.


Thanks for pointing out that issue! Using new signal is an idea. Another idea
is to make a backend skip check the deadlock if GetStartupBufferPinWaitBufId()
returns -1, i.e., the startup process is not waiting for buffer pin. So,
what I'm thinkins is;

In RecoveryConflictInterrupt(), when a backend receive
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,

1. If a backend isn't waiting for a lock, it does nothing .
2. If a backend is waiting for a lock and also holding a buffer pin that
    delays recovery, it may be canceled.
3. If a backend is waiting for a lock and the startup process is not waiting
    for buffer pin (i.e., the startup process is also waiting for a lock),
    it checks for the deadlocks.
4. If a backend is waiting for a lock and isn't holding a buffer pin that
    delays recovery though the startup process is waiting for buffer pin,
    it does nothing.


I implemented this. Patch attached.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/src/backend/storage/ipc/procarray.c 
b/src/backend/storage/ipc/procarray.c
index ee912b9d5e..7c1a3f8b3f 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -3324,6 +3324,13 @@ GetConflictingVirtualXIDs(TransactionId limitXmin, Oid 
dbOid)
  */
 pid_t
 CancelVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode)
+{
+       return SignalVirtualTransaction(vxid, sigmode, true);
+}
+
+pid_t
+SignalVirtualTransaction(VirtualTransactionId vxid, ProcSignalReason sigmode,
+                                                bool conflictPending)
 {
        ProcArrayStruct *arrayP = procArray;
        int                     index;
@@ -3342,7 +3349,7 @@ CancelVirtualTransaction(VirtualTransactionId vxid, 
ProcSignalReason sigmode)
                if (procvxid.backendId == vxid.backendId &&
                        procvxid.localTransactionId == vxid.localTransactionId)
                {
-                       proc->recoveryConflictPending = true;
+                       proc->recoveryConflictPending = conflictPending;
                        pid = proc->pid;
                        if (pid != 0)
                        {
diff --git a/src/backend/storage/ipc/standby.c 
b/src/backend/storage/ipc/standby.c
index 92d9027776..3164aed423 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -42,6 +42,10 @@ int                  max_standby_streaming_delay = 30 * 1000;
 
 static HTAB *RecoveryLockLists;
 
+/* Flags set by timeout handlers */
+static volatile sig_atomic_t got_standby_deadlock_timeout = false;
+static volatile sig_atomic_t got_standby_lock_timeout = false;
+
 static void ResolveRecoveryConflictWithVirtualXIDs(VirtualTransactionId 
*waitlist,
                                                                                
                   ProcSignalReason reason,
                                                                                
                   uint32 wait_event_info,
@@ -395,8 +399,10 @@ ResolveRecoveryConflictWithDatabase(Oid dbid)
  * lock.  As we are already queued to be granted the lock, no new lock
  * requests conflicting with ours will be granted in the meantime.
  *
- * Deadlocks involving the Startup process and an ordinary backend process
- * will be detected by the deadlock detector within the ordinary backend.
+ * We also must check for deadlocks involving the Startup process and
+ * hot-standby backend processes. If deadlock_timeout is reached in
+ * this function, all the backends holding the conflicting locks are
+ * requested to check themselves for deadlocks.
  */
 void
 ResolveRecoveryConflictWithLock(LOCKTAG locktag)
@@ -407,7 +413,7 @@ ResolveRecoveryConflictWithLock(LOCKTAG locktag)
 
        ltime = GetStandbyLimitTime();
 
-       if (GetCurrentTimestamp() >= ltime)
+       if (GetCurrentTimestamp() >= ltime && ltime != 0)
        {
                /*
                 * We're already behind, so clear a path as quickly as possible.
@@ -429,19 +435,76 @@ ResolveRecoveryConflictWithLock(LOCKTAG locktag)
        else
        {
                /*
-                * Wait (or wait again) until ltime
+                * Wait (or wait again) until ltime, and check for deadlocks as 
well
+                * if we will be waiting longer than deadlock_timeout
                 */
-               EnableTimeoutParams timeouts[1];
+               EnableTimeoutParams timeouts[2];
+               int                     cnt = 0;
 
-               timeouts[0].id = STANDBY_LOCK_TIMEOUT;
-               timeouts[0].type = TMPARAM_AT;
-               timeouts[0].fin_time = ltime;
-               enable_timeouts(timeouts, 1);
+               if (ltime != 0)
+               {
+                       got_standby_lock_timeout = false;
+                       timeouts[cnt].id = STANDBY_LOCK_TIMEOUT;
+                       timeouts[cnt].type = TMPARAM_AT;
+                       timeouts[cnt].fin_time = ltime;
+                       cnt++;
+               }
+
+               got_standby_deadlock_timeout = false;
+               timeouts[cnt].id = STANDBY_DEADLOCK_TIMEOUT;
+               timeouts[cnt].type = TMPARAM_AFTER;
+               timeouts[cnt].delay_ms = DeadlockTimeout;
+               cnt++;
+
+               enable_timeouts(timeouts, cnt);
        }
 
        /* Wait to be signaled by the release of the Relation Lock */
        ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
 
+       /*
+        * Exit if ltime is reached. Then all the backends holding conflicting
+        * locks will be canceled in the next ResolveRecoveryConflictWithLock()
+        * call.
+        */
+       if (got_standby_lock_timeout)
+               goto cleanup;
+
+       if (got_standby_deadlock_timeout)
+       {
+               VirtualTransactionId *backends;
+
+               backends = GetLockConflicts(&locktag, AccessExclusiveLock, 
NULL);
+
+               /* Quick exit if there's no work to be done */
+               if (!VirtualTransactionIdIsValid(*backends))
+                       goto cleanup;
+
+               /*
+                * Send signals to all the backends holding the conflicting 
locks, to
+                * ask them to check themselves for deadlocks.
+                */
+               while (VirtualTransactionIdIsValid(*backends))
+               {
+                       SignalVirtualTransaction(*backends,
+                                                                        
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK,
+                                                                        false);
+                       backends++;
+               }
+
+               /*
+                * Wait again here to be signaled by the release of the 
Relation Lock,
+                * to prevent the subsequent RecoveryConflictWithLock() from 
causing
+                * deadlock_timeout and sending a request for deadlocks check 
again.
+                * Otherwise the request continues to be sent every 
deadlock_timeout
+                * until the relation locks are released or ltime is reached.
+                */
+               got_standby_deadlock_timeout = false;
+               ProcWaitForSignal(PG_WAIT_LOCK | locktag.locktag_type);
+       }
+
+cleanup:
+
        /*
         * Clear any timeout requests established above.  We assume here that 
the
         * Startup process doesn't have any other outstanding timeouts than 
those
@@ -449,6 +512,8 @@ ResolveRecoveryConflictWithLock(LOCKTAG locktag)
         * timeouts individually, but that'd be slower.
         */
        disable_all_timeouts(false);
+       got_standby_lock_timeout = false;
+       got_standby_deadlock_timeout = false;
 }
 
 /*
@@ -513,9 +578,12 @@ ResolveRecoveryConflictWithBufferPin(void)
                timeouts[0].id = STANDBY_TIMEOUT;
                timeouts[0].type = TMPARAM_AT;
                timeouts[0].fin_time = ltime;
+
+               got_standby_deadlock_timeout = false;
                timeouts[1].id = STANDBY_DEADLOCK_TIMEOUT;
                timeouts[1].type = TMPARAM_AFTER;
                timeouts[1].delay_ms = DeadlockTimeout;
+
                enable_timeouts(timeouts, 2);
        }
 
@@ -529,6 +597,26 @@ ResolveRecoveryConflictWithBufferPin(void)
         */
        ProcWaitForSignal(PG_WAIT_BUFFER_PIN);
 
+       if (got_standby_deadlock_timeout)
+       {
+               /*
+                * Send out a request for hot-standby backends to check 
themselves for
+                * deadlocks.
+                *
+                * XXX The subsequent ResolveRecoveryConflictWithBufferPin() 
will wait
+                * to be signaled by UnpinBuffer() again and send a request for
+                * deadlocks check if deadlock_timeout happens. This causes the
+                * request to continue to be sent every deadlock_timeout until 
the
+                * buffer is unpinned or ltime is reached. This would increase 
the
+                * workload in the startup process and backends. In practice it 
may
+                * not be so harmful because the period that the buffer is kept 
pinned
+                * is basically no so long. But we should fix this?
+                */
+               SendRecoveryConflictWithBufferPin(
+                                                                               
  PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+               got_standby_deadlock_timeout = false;
+       }
+
        /*
         * Clear any timeout requests established above.  We assume here that 
the
         * Startup process doesn't have any other timeouts than what this 
function
@@ -595,13 +683,12 @@ CheckRecoveryConflictDeadlock(void)
 
 /*
  * StandbyDeadLockHandler() will be called if STANDBY_DEADLOCK_TIMEOUT
- * occurs before STANDBY_TIMEOUT.  Send out a request for hot-standby
- * backends to check themselves for deadlocks.
+ * occurs before STANDBY_TIMEOUT.
  */
 void
 StandbyDeadLockHandler(void)
 {
-       
SendRecoveryConflictWithBufferPin(PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK);
+       got_standby_deadlock_timeout = true;
 }
 
 /*
@@ -620,11 +707,11 @@ StandbyTimeoutHandler(void)
 
 /*
  * StandbyLockTimeoutHandler() will be called if STANDBY_LOCK_TIMEOUT is 
exceeded.
- * This doesn't need to do anything, simply waking up is enough.
  */
 void
 StandbyLockTimeoutHandler(void)
 {
+       got_standby_lock_timeout = true;
 }
 
 /*
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 7dc3911590..45a99ac0a4 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1793,6 +1793,9 @@ CheckDeadLockAlert(void)
         * Have to set the latch again, even if handle_sig_alarm already did. 
Back
         * then got_deadlock_timeout wasn't yet set... It's unlikely that this
         * ever would be a problem, but setting a set latch again is cheap.
+        *
+        * Note that, when this function runs inside 
procsignal_sigusr1_handler(),
+        * the handler function sets the latch again after the latch is set 
here.
         */
        SetLatch(MyLatch);
        errno = save_errno;
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3679799e50..708cd8b9e5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2919,11 +2919,23 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
                        case PROCSIG_RECOVERY_CONFLICT_BUFFERPIN:
 
                                /*
-                                * If we aren't blocking the Startup process 
there is nothing
-                                * more to do.
+                                * If PROCSIG_RECOVERY_CONFLICT_BUFFERPIN is 
requested but we
+                                * aren't blocking the Startup process there is 
nothing more
+                                * to do.
+                                *
+                                * When 
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK is
+                                * requested, if we're waiting for locks and 
the startup
+                                * process is not waiting for buffer pin (i.e., 
also waiting
+                                * for locks), we set the flag so that 
ProcSleep() will check
+                                * for deadlocks.
                                 */
                                if (!HoldingBufferPinThatDelaysRecovery())
+                               {
+                                       if (reason == 
PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK &&
+                                               GetStartupBufferPinWaitBufId() 
< 0)
+                                               CheckDeadLockAlert();
                                        return;
+                               }
 
                                MyProc->recoveryConflictPending = true;
 
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index ea8a876ca4..4d272c7287 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -72,6 +72,8 @@ extern VirtualTransactionId 
*GetCurrentVirtualXIDs(TransactionId limitXmin,
                                                                                
                   int *nvxids);
 extern VirtualTransactionId *GetConflictingVirtualXIDs(TransactionId 
limitXmin, Oid dbOid);
 extern pid_t CancelVirtualTransaction(VirtualTransactionId vxid, 
ProcSignalReason sigmode);
+extern pid_t SignalVirtualTransaction(VirtualTransactionId vxid, 
ProcSignalReason sigmode,
+                                                                         bool 
conflictPending);
 
 extern bool MinimumActiveBackends(int min);
 extern int     CountDBBackends(Oid databaseid);

Re: Deadlock between backend and recovery may not be detected

Reply via email to