Re: Latches vs lwlock contention

Aleksander Alekseev Tue, 11 Jul 2023 08:12:05 -0700

Hi,

> Maybe this is all ok, but it would be good to make the assumptions more
> explicit.


Here are my two cents.

```
static void
SetLatchV(Latch **latches, int nlatches)
{
    /* Flush any other changes out to main memory just once. */
    pg_memory_barrier();

    /* Keep only latches that are not already set, and set them. */
    for (int i = 0; i < nlatches; ++i)
    {
        Latch       *latch = latches[i];

        if (!latch->is_set)
            latch->is_set = true;
        else
            latches[i] = NULL;
    }

    pg_memory_barrier();

[...]

void
SetLatches(LatchGroup *group)
{
    if (group->size > 0)
    {
        SetLatchV(group->latches, group->size);

[...]
```

I suspect this API may be error-prone without some additional
comments. The caller (which may be an extension author too) may rely
on the implementation details of SetLatches() / SetLatchV() and use
the returned group->latches[] values e.g. to figure out whether he
attempted to change the state of the given latch. Even worse, one can
mistakenly assume that the result says exactly if the caller was the
one who changed the state of the latch. This being said I see why this
particular implementation was chosen.

I added corresponding comments to SetLatchV() and SetLatches(). Also
the patchset needed a rebase. PFA v4.

It passes `make installcheck-world` on 3 machines of mine: MacOS x64,
Linux x64 and Linux RISC-V.


--
Best regards,
Aleksander Alekseev

From 006499ee28230e8c1b5c8d2f047276c5661f50ad Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH v4 2/6] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at a
later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks can
  be released faster; this is especially interesting in cases where the
  target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-opimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* prepare for future IPC mechanisms which might allow us to wake
  multiple backends in a single system call

Individual users of this facility will follow in separate patches.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c  | 221 ++++++++++++++++++++-----------
 src/include/storage/latch.h      |  13 ++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 161 insertions(+), 74 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index db889385b7..5b7113dac8 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -589,6 +589,88 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Sets multiple 'latches' at the same time.
+ *
+ * Modifies the input array. The nature of this modification is
+ * an implementation detail of SetLatchV(). The caller shouldn't try
+ * interpreting it and/or using the input array after the call.
+ */
+static void
+SetLatchV(Latch **latches, int nlatches)
+{
+	/* Flush any other changes out to main memory just once. */
+	pg_memory_barrier();
+
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
+
+	pg_memory_barrier();
+
+	/* Wake the remaining latches that might be sleeping. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current
+		 * process if we're in a signal handler. We use the self-pipe or
+		 * SIGURG to ourselves to wake up WaitEventSetWaitBlock() without
+		 * races in that case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently
+		 * getting owned or disowned. XXX: This assumes that pid_t is atomic,
+		 * which isn't guaranteed to be true! In practice, the effective range
+		 * of pid_t fits in a 32 bit integer, and so should be atomic. In the
+		 * worst case, we might end up signaling the wrong process. Even then,
+		 * you're very unlucky if a process with that bogus pid exists and
+		 * belongs to Postgres; and PG database processes should handle excess
+		 * SIGURG interrupts without a problem anyhow.
+		 *
+		 * Another sort of race condition that's possible here is for a new
+		 * process to own the latch immediately after we look, so we don't
+		 * signal it. This is okay so long as all callers of
+		 * ResetLatch/WaitLatch follow the standard coding convention of
+		 * waiting at the bottom of their loops, not the top, so that they'll
+		 * correctly process latch-setting events that happen before they
+		 * enter the loop.
+		 */
+		if (latch && latch->maybe_sleeping)
+		{
+#ifndef WIN32
+			pid_t		owner_pid = latch->owner_pid;
+
+			if (owner_pid == MyProcPid)
+			{
+				if (waiting)
+				{
+#if defined(WAIT_USE_SELF_PIPE)
+					sendSelfPipeByte();
+#else
+					kill(MyProcPid, SIGURG);
+#endif
+				}
+			}
+			else
+				kill(owner_pid, SIGURG);
+#else
+			HANDLE		event = latch->event;
+
+			if (event)
+				SetEvent(event);
+#endif
+		}
+	}
+}
+
 /*
  * Sets a latch and wakes up anyone waiting on it.
  *
@@ -604,89 +686,80 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 void
 SetLatch(Latch *latch)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
-	/*
-	 * The memory barrier has to be placed here to ensure that any flag
-	 * variables possibly changed by this process have been flushed to main
-	 * memory, before we check/set is_set.
-	 */
-	pg_memory_barrier();
-
-	/* Quick exit if already set */
-	if (latch->is_set)
-		return;
-
-	latch->is_set = true;
-
-	pg_memory_barrier();
-	if (!latch->maybe_sleeping)
-		return;
+	SetLatchV(&latch, 1);
+}
 
-#ifndef WIN32
+/*
+ * Add a latch to a group, to be set later.
+ */
+void
+AddLatch(LatchGroup *group, Latch *latch)
+{
+	/* First time.  Set up the in-place buffer. */
+	if (!group->latches)
+	{
+		group->latches = group->in_place_buffer;
+		group->capacity = lengthof(group->in_place_buffer);
+		Assert(group->size == 0);
+	}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler. We use the self-pipe or SIGURG to ourselves
-	 * to wake up WaitEventSetWaitBlock() without races in that case. If it's
-	 * another process, send a signal.
-	 *
-	 * Fetch owner_pid only once, in case the latch is concurrently getting
-	 * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
-	 * guaranteed to be true! In practice, the effective range of pid_t fits
-	 * in a 32 bit integer, and so should be atomic. In the worst case, we
-	 * might end up signaling the wrong process. Even then, you're very
-	 * unlucky if a process with that bogus pid exists and belongs to
-	 * Postgres; and PG database processes should handle excess SIGUSR1
-	 * interrupts without a problem anyhow.
-	 *
-	 * Another sort of race condition that's possible here is for a new
-	 * process to own the latch immediately after we look, so we don't signal
-	 * it. This is okay so long as all callers of ResetLatch/WaitLatch follow
-	 * the standard coding convention of waiting at the bottom of their loops,
-	 * not the top, so that they'll correctly process latch-setting events
-	 * that happen before they enter the loop.
-	 */
-	owner_pid = latch->owner_pid;
-	if (owner_pid == 0)
-		return;
-	else if (owner_pid == MyProcPid)
+	/* Are we full? */
+	if (group->size == group->capacity)
 	{
-#if defined(WAIT_USE_SELF_PIPE)
-		if (waiting)
-			sendSelfPipeByte();
-#else
-		if (waiting)
-			kill(MyProcPid, SIGURG);
-#endif
+		Latch	  **new_latches;
+		int			new_capacity;
+
+		/* Try to allocate more space. */
+		new_capacity = group->capacity * 2;
+		new_latches = palloc_extended(sizeof(latch) * new_capacity,
+									  MCXT_ALLOC_NO_OOM);
+		if (!new_latches)
+		{
+			/*
+			 * Allocation failed.  This is very unlikely to happen, but it
+			 * might be useful to be able to use this function in critical
+			 * sections, so we handle allocation failure by flushing instead
+			 * of throwing.
+			 */
+			SetLatches(group);
+		}
+		else
+		{
+			memcpy(new_latches, group->latches, sizeof(latch) * group->size);
+			if (group->latches != group->in_place_buffer)
+				pfree(group->latches);
+			group->latches = new_latches;
+			group->capacity = new_capacity;
+		}
 	}
-	else
-		kill(owner_pid, SIGURG);
 
-#else
+	Assert(group->size < group->capacity);
+	group->latches[group->size++] = latch;
+}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler.
-	 *
-	 * Use a local variable here just in case somebody changes the event field
-	 * concurrently (which really should not happen).
-	 */
-	handle = latch->event;
-	if (handle)
+/*
+ * Sets all the latches accumulated in the 'group'.
+ *
+ * Modifies the 'group'. The nature of this modification is
+ * an implementation detail of SetLatches(). The caller shouldn't try
+ * interpreting it and/or using the 'group' again after the call.
+ */
+void
+SetLatches(LatchGroup *group)
+{
+	if (group->size > 0)
 	{
-		SetEvent(handle);
+		SetLatchV(group->latches, group->size);
+		group->size = 0;
 
-		/*
-		 * Note that we silently ignore any errors. We might be in a signal
-		 * handler or other critical path where it's not safe to call elog().
-		 */
+		/* If we allocated a large buffer, it's time to free it. */
+		if (group->latches != group->in_place_buffer)
+		{
+			pfree(group->latches);
+			group->latches = group->in_place_buffer;
+			group->capacity = lengthof(group->in_place_buffer);
+		}
 	}
-#endif
 }
 
 /*
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 99cc47874a..7b81806825 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -118,6 +118,17 @@ typedef struct Latch
 #endif
 } Latch;
 
+/*
+ * A buffer for setting multiple latches at a time.
+ */
+typedef struct LatchGroup
+{
+	int			size;
+	int			capacity;
+	Latch	  **latches;
+	Latch	   *in_place_buffer[64];
+} LatchGroup;
+
 /*
  * Bitmasks for events that may wake-up WaitLatch(), WaitLatchOrSocket(), or
  * WaitEventSetWait().
@@ -170,6 +181,8 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void AddLatch(LatchGroup *group, Latch *latch);
+extern void SetLatches(LatchGroup *group);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e941fb6c82..34f9b32b29 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1409,6 +1409,7 @@ LabelProvider
 LagTracker
 LargeObjectDesc
 Latch
+LatchGroup
 LauncherLastStartTimesEntry
 LerpFunc
 LexDescr
-- 
2.41.0

From c4c30181ef039107d0e03068a114032eb883fd05 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:59:45 +1300
Subject: [PATCH v4 3/6] Use SetLatches() for synchronous replication wakeups.

Don't issue SetLatch() system calls while holding SyncRepLock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/replication/syncrep.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 0ea71b5c43..f954c39b38 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -100,7 +100,7 @@ static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
 
 static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
-static int	SyncRepWakeQueue(bool all, int mode);
+static int	SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups);
 
 static bool SyncRepGetSyncRecPtr(XLogRecPtr *writePtr,
 								 XLogRecPtr *flushPtr,
@@ -440,6 +440,7 @@ SyncRepReleaseWaiters(void)
 	int			numwrite = 0;
 	int			numflush = 0;
 	int			numapply = 0;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -509,21 +510,23 @@ SyncRepReleaseWaiters(void)
 	if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
-		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
-		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
-		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
+		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY, &wakeups);
 	}
 
 	LWLockRelease(SyncRepLock);
 
+	SetLatches(&wakeups);
+
 	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X, %d procs up to apply %X/%X",
 		 numwrite, LSN_FORMAT_ARGS(writePtr),
 		 numflush, LSN_FORMAT_ARGS(flushPtr),
@@ -867,7 +870,7 @@ SyncRepGetStandbyPriority(void)
  * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
-SyncRepWakeQueue(bool all, int mode)
+SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	int			numprocs = 0;
@@ -908,7 +911,7 @@ SyncRepWakeQueue(bool all, int mode)
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(proc->procLatch));
+		AddLatch(wakeups, &proc->procLatch);
 
 		numprocs++;
 	}
@@ -930,6 +933,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
+		LatchGroup		wakeups = {0};
+
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/*
@@ -942,7 +947,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 			int			i;
 
 			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-				SyncRepWakeQueue(true, i);
+				SyncRepWakeQueue(true, i, &wakeups);
 		}
 
 		/*
@@ -955,6 +960,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
 
 		LWLockRelease(SyncRepLock);
+
+		SetLatches(&wakeups);
 	}
 }
 
-- 
2.41.0

From 1dd76ab9bfadd070958a7c077ebffb5dde0afbdc Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 12:40:12 +1300
Subject: [PATCH v4 4/6] Use SetLatches() for SERIALIZABLE DEFERRABLE wakeups.

Don't issue SetLatch()'s system call while holding a highly contended
LWLock.  Collect them in a LatchGroup to be set after the lock is
released.  Once woken, other backends will immediately try to acquire
that lock anyway so it's better to wake after releasing.

Take this opportunity to retire the confusingly named ProcSendSignal()
and replace it with something that gives the latch pointer we need.
There was only one other caller, in bufmgr.c, which is easily changed.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/procsignal.c |  2 --
 src/backend/storage/lmgr/predicate.c |  5 ++++-
 src/backend/storage/lmgr/proc.c      | 12 ------------
 src/include/storage/proc.h           |  4 +++-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index a7e3b9bb1d..de5908876d 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -2449,7 +2449,7 @@ UnpinBuffer(BufferDesc *buf)
 
 				buf_state &= ~BM_PIN_COUNT_WAITER;
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pgprocno);
+				SetLatch(GetProcLatchByNumber(wait_backend_pgprocno));
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index c85cb5cc18..ec4c3888b0 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -255,8 +255,6 @@ CleanupProcSignalState(int status, Datum arg)
  *
  * On success (a signal was sent), zero is returned.
  * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
- *
- * Not to be confused with ProcSendSignal
  */
 int
 SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 1af41213b4..4d95ca94df 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3249,6 +3249,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 	bool		needToClear;
 	SERIALIZABLEXACT *roXact;
 	dlist_mutable_iter iter;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3553,7 +3554,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			 */
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
-				ProcSendSignal(roXact->pgprocno);
+				AddLatch(&wakeups, GetProcLatchByNumber(roXact->pgprocno));
 		}
 	}
 
@@ -3583,6 +3584,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	LWLockRelease(SerializableXactHashLock);
 
+	SetLatches(&wakeups);
+
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
 
 	/* Add this to the list of transactions to check for later cleanup. */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 5b663a2997..a71b34ba0b 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1802,18 +1802,6 @@ ProcWaitForSignal(uint32 wait_event_info)
 	CHECK_FOR_INTERRUPTS();
 }
 
-/*
- * ProcSendSignal - set the latch of a backend identified by pgprocno
- */
-void
-ProcSendSignal(int pgprocno)
-{
-	if (pgprocno < 0 || pgprocno >= ProcGlobal->allProcCount)
-		elog(ERROR, "pgprocno out of range");
-
-	SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch);
-}
-
 /*
  * BecomeLockGroupLeader - designate process as lock group leader
  *
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ef74f32693..56a31d2a89 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -413,6 +413,9 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
 /* Accessor for PGPROC given a pgprocno. */
 #define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
 
+/* Accessor for procLatch given a pgprocno. */
+#define GetProcLatchByNumber(n) (&(GetPGProcByNumber(n))->procLatch)
+
 /*
  * We set aside some extra PGPROC structures for auxiliary processes,
  * ie things that aren't full-fledged backends but need shmem access.
@@ -456,7 +459,6 @@ extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-extern void ProcSendSignal(int pgprocno);
 
 extern PGPROC *AuxiliaryPidGetProc(int pid);
 
-- 
2.41.0

From f488319ab14d7698793b1b7e1b9c8098063b2294 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 26 Oct 2022 16:43:31 +1300
Subject: [PATCH v4 5/6] Use SetLatches() for heavyweight locks.

Collect wakeups into a LatchGroup to be set at once after the
LockManager's internal partition lock is released.  This avoids holding
busy locks while issuing system calls.

Currently, waiters immediately try to acquire that lock themselves, so
deferring until it's released makes sense to avoid contention (a later
patch may remove that lock acquisition though).

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/deadlock.c |  4 ++--
 src/backend/storage/lmgr/lock.c     | 36 +++++++++++++++++++----------
 src/backend/storage/lmgr/proc.c     | 29 ++++++++++++++---------
 src/include/storage/lock.h          |  5 ++--
 src/include/storage/proc.h          |  6 ++---
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index 2bdd20b1fe..3657186d9f 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -214,7 +214,7 @@ InitDeadLockChecking(void)
  * and (b) we are typically invoked inside a signal handler.
  */
 DeadLockState
-DeadLockCheck(PGPROC *proc)
+DeadLockCheck(PGPROC *proc, LatchGroup *wakeups)
 {
 	/* Initialize to "no constraints" */
 	nCurConstraints = 0;
@@ -266,7 +266,7 @@ DeadLockCheck(PGPROC *proc)
 #endif
 
 		/* See if any waiters for the lock can be woken up now */
-		ProcLockWakeup(GetLocksMethodTable(lock), lock);
+		ProcLockWakeup(GetLocksMethodTable(lock), lock, wakeups);
 	}
 
 	/* Return code tells caller if we had to escape a deadlock or not */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index f595bce31b..126073a0d3 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -362,14 +362,14 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 						PROCLOCK *proclock, LockMethod lockMethodTable);
 static void CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 						LockMethod lockMethodTable, uint32 hashcode,
-						bool wakeupNeeded);
+						bool wakeupNeeded, LatchGroup *wakeups);
 static void LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 								 LOCKTAG *locktag, LOCKMODE lockmode,
 								 bool decrement_strong_lock_count);
@@ -775,6 +775,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -1079,7 +1080,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
-		WaitOnLock(locallock, owner);
+		WaitOnLock(locallock, owner, &wakeups);
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1119,6 +1120,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
@@ -1605,7 +1608,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 static void
 CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 			LockMethod lockMethodTable, uint32 hashcode,
-			bool wakeupNeeded)
+			bool wakeupNeeded, LatchGroup *wakeups)
 {
 	/*
 	 * If this was my last hold on this lock, delete my entry in the proclock
@@ -1645,7 +1648,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 	else if (wakeupNeeded)
 	{
 		/* There are waiters on this lock, so wake them up. */
-		ProcLockWakeup(lockMethodTable, lock);
+		ProcLockWakeup(lockMethodTable, lock, wakeups);
 	}
 }
 
@@ -1782,7 +1785,7 @@ MarkLockClear(LOCALLOCK *locallock)
  * The appropriate partition lock must be held at entry.
  */
 static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 {
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
@@ -1815,7 +1818,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 	 */
 	PG_TRY();
 	{
-		if (ProcSleep(locallock, lockMethodTable) != PROC_WAIT_STATUS_OK)
+		if (ProcSleep(locallock, lockMethodTable, wakeups) != PROC_WAIT_STATUS_OK)
 		{
 			/*
 			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
@@ -1866,7 +1869,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
  * NB: this does not clean up any locallock object that may exist for the lock.
  */
 void
-RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
+RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups)
 {
 	LOCK	   *waitLock = proc->waitLock;
 	PROCLOCK   *proclock = proc->waitProcLock;
@@ -1907,7 +1910,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	 */
 	CleanUpLock(waitLock, proclock,
 				LockMethods[lockmethodid], hashcode,
-				true);
+				true, wakeups);
 }
 
 /*
@@ -1932,6 +1935,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2110,10 +2114,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, locallock->hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	RemoveLocalLock(locallock);
 	return true;
 }
@@ -2137,6 +2143,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	LOCK	   *lock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2375,10 +2382,12 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			CleanUpLock(lock, proclock,
 						lockMethodTable,
 						LockTagHashCode(&lock->tag),
-						wakeupNeeded);
+						wakeupNeeded, &wakeups);
 		}						/* loop over PROCLOCKs within this partition */
 
 		LWLockRelease(partitionLock);
+
+		SetLatches(&wakeups);
 	}							/* loop over partitions */
 
 #ifdef LOCK_DEBUG
@@ -3071,6 +3080,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	uint32		proclock_hashcode;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	hashcode = LockTagHashCode(locktag);
 	partitionLock = LockHashPartitionLock(hashcode);
@@ -3124,10 +3134,12 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Decrement strong lock count.  This logic is needed only for 2PC.
 	 */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index a71b34ba0b..8821517897 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -699,6 +699,7 @@ LockErrorCleanup(void)
 {
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
+	LatchGroup	wakeups = {0};
 
 	HOLD_INTERRUPTS();
 
@@ -732,7 +733,7 @@ LockErrorCleanup(void)
 	if (!dlist_node_is_detached(&MyProc->links))
 	{
 		/* We could not have been granted the lock yet */
-		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
+		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode, &wakeups);
 	}
 	else
 	{
@@ -749,6 +750,7 @@ LockErrorCleanup(void)
 	lockAwaited = NULL;
 
 	LWLockRelease(partitionLock);
+	SetLatches(&wakeups);
 
 	RESUME_INTERRUPTS();
 }
@@ -1001,7 +1003,7 @@ AuxiliaryPidGetProc(int pid)
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
+ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1137,7 +1139,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (early_deadlock)
 	{
-		RemoveFromWaitQueue(MyProc, hashcode);
+		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1153,6 +1155,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * LockErrorCleanup will clean up if cancel/die happens.
 	 */
 	LWLockRelease(partitionLock);
+	SetLatches(wakeups);
 
 	/*
 	 * Also, now that we will successfully clean up after an ereport, it's
@@ -1594,7 +1597,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 
 
 /*
- * ProcWakeup -- wake up a process by setting its latch.
+ * ProcWakeup -- prepare to wake up a process by setting its latch.
  *
  *	 Also remove the process from the wait queue and set its links invalid.
  *
@@ -1606,7 +1609,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * Hence, in practice the waitStatus parameter must be PROC_WAIT_STATUS_OK.
  */
 void
-ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
+ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups)
 {
 	if (dlist_node_is_detached(&proc->links))
 		return;
@@ -1622,8 +1625,8 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	proc->waitStatus = waitStatus;
 	pg_atomic_write_u64(&MyProc->waitStart, 0);
 
-	/* And awaken it */
-	SetLatch(&proc->procLatch);
+	/* Schedule it to be awoken */
+	AddLatch(wakeups, &proc->procLatch);
 }
 
 /*
@@ -1634,7 +1637,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
  * The appropriate lock partition lock must be held by caller.
  */
 void
-ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
+ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups)
 {
 	dclist_head *waitQueue = &lock->waitProcs;
 	LOCKMASK	aheadRequests = 0;
@@ -1659,7 +1662,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
 			/* removes proc from the lock's waiting process queue */
-			ProcWakeup(proc, PROC_WAIT_STATUS_OK);
+			ProcWakeup(proc, PROC_WAIT_STATUS_OK, wakeups);
 		}
 		else
 		{
@@ -1685,6 +1688,7 @@ static void
 CheckDeadLock(void)
 {
 	int			i;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * Acquire exclusive lock on the entire shared lock data structures. Must
@@ -1719,7 +1723,7 @@ CheckDeadLock(void)
 #endif
 
 	/* Run the deadlock check, and set deadlock_state for use by ProcSleep */
-	deadlock_state = DeadLockCheck(MyProc);
+	deadlock_state = DeadLockCheck(MyProc, &wakeups);
 
 	if (deadlock_state == DS_HARD_DEADLOCK)
 	{
@@ -1736,7 +1740,8 @@ CheckDeadLock(void)
 		 * return from the signal handler.
 		 */
 		Assert(MyProc->waitLock != NULL);
-		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
+		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)),
+							&wakeups);
 
 		/*
 		 * We're done here.  Transaction abort caused by the error that
@@ -1760,6 +1765,8 @@ CheckDeadLock(void)
 check_done:
 	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
 		LWLockRelease(LockHashPartitionLockByIndex(i));
+
+	SetLatches(&wakeups);
 }
 
 /*
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 8575bea25c..b089bf341e 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -20,6 +20,7 @@
 
 #include "lib/ilist.h"
 #include "storage/backendid.h"
+#include "storage/latch.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
@@ -583,7 +584,7 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
-extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
+extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
@@ -600,7 +601,7 @@ extern void lock_twophase_postabort(TransactionId xid, uint16 info,
 extern void lock_twophase_standby_recover(TransactionId xid, uint16 info,
 										  void *recdata, uint32 len);
 
-extern DeadLockState DeadLockCheck(PGPROC *proc);
+extern DeadLockState DeadLockCheck(PGPROC *proc, LatchGroup *wakeups);
 extern PGPROC *GetBlockingAutoVacuumPgproc(void);
 extern void DeadLockReport(void) pg_attribute_noreturn();
 extern void RememberSimpleDeadLock(PGPROC *proc1,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 56a31d2a89..6944927222 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -451,9 +451,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
-extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups);
+extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups);
+extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
-- 
2.41.0

From fd907df99388e3694d7ea31d3ebe0dea8acbc742 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Tue, 1 Nov 2022 23:29:38 +1300
Subject: [PATCH v4 1/6] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections, because an
allocation failure would produce a panic.  Make an exception for allocation
with NULL on failure, for code that has a backup plan.

Discussion: https://postgr.es/m/CA%2BhUKGJHudZMG_vh6GiPB61pE%2BGgiBk5jxzd7inijqx5nEZLCw%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 9fc83f11f6..440c8e684e 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1139,7 +1139,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	void	   *ret;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1294,7 +1295,8 @@ palloc_extended(Size size, int flags)
 	MemoryContext context = CurrentMemoryContext;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1529,7 +1531,8 @@ repalloc_extended(void *pointer, Size size, int flags)
 		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	/* isReset must be false already */
 	Assert(!context->isReset);
-- 
2.41.0

From d43e82290028ef61e62d55627cc5e602dd20c27d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Thu, 27 Oct 2022 10:56:39 +1300
Subject: [PATCH v4 6/6] Don't re-acquire LockManager partition lock after
 wakeup.

Change the contract of WaitOnLock() and ProcSleep() to return with the
partition lock released.  ProcSleep() re-acquired the lock to prevent
interrupts, which was a problem at the time the ancestor of that code
was committed in fe548629c50, because then we had signal handlers that
longjmp'd out of there.  Now, that can happen only at points where we
explicitly run CHECK_FOR_INTERRUPTS(), so we can remove the lock, which
leads to the idea of making the function always release.

While an earlier commit fixed the problem that the backend woke us up
before it had even released the lock itself, this lock reacquisition
point was still contended when multiple backends woke at the same time.

XXX Right?  Or what am I missing?

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 18 ++++++++++++------
 src/backend/storage/lmgr/proc.c | 20 ++++++++++++--------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 126073a0d3..207abf93d0 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -775,6 +775,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	bool		partitionLocked = false;
 	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -976,6 +977,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	partitionLock = LockHashPartitionLock(hashcode);
 
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+	partitionLocked = true;
 
 	/*
 	 * Find or create lock and proclock entries with this tag
@@ -1080,7 +1082,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
+		Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
 		WaitOnLock(locallock, owner, &wakeups);
+		Assert(!LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+		partitionLocked = false;
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1105,7 +1110,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
 			LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
 			/* Should we retry ? */
-			LWLockRelease(partitionLock);
 			elog(ERROR, "LockAcquire failed");
 		}
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
@@ -1118,9 +1122,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
-	SetLatches(&wakeups);
+	if (partitionLocked)
+	{
+		LWLockRelease(partitionLock);
+		SetLatches(&wakeups);
+	}
 
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
@@ -1782,7 +1788,8 @@ MarkLockClear(LOCALLOCK *locallock)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process.
  *
- * The appropriate partition lock must be held at entry.
+ * The appropriate partition lock must be held at entry.  It is not held on
+ * exit.
  */
 static void
 WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
@@ -1827,7 +1834,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 			awaitedLock = NULL;
 			LOCK_PRINT("WaitOnLock: aborting on lock",
 					   locallock->lock, locallock->tag.mode);
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
 			 * Now that we aren't holding the partition lock, we can give an
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 8821517897..2c5a6ebe83 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -992,7 +992,7 @@ AuxiliaryPidGetProc(int pid)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process (under all XIDs).
  *
- * The lock table's partition lock must be held at entry, and will be held
+ * The lock table's partition lock must be held at entry, and will be released
  * at exit.
  *
  * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR if not (deadlock).
@@ -1020,6 +1020,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Every way out of this function will release the partition lock and send
+	 * buffered latch wakeups.
+	 */
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
@@ -1104,6 +1110,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
 					GrantAwaitedLock();
+					LWLockRelease(partitionLock);
+					SetLatches(wakeups);
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1140,6 +1148,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	if (early_deadlock)
 	{
 		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
+		LWLockRelease(partitionLock);
+		SetLatches(wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1469,6 +1479,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 			}
 
 			LWLockRelease(partitionLock);
+			SetLatches(wakeups);
 
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
@@ -1570,13 +1581,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We no longer want LockErrorCleanup to do anything.
 	 */
-- 
2.41.0

Re: Latches vs lwlock contention

Reply via email to