On Sat, Jan 28, 2023 at 3:39 AM vignesh C <vignes...@gmail.com> wrote:
> On Tue, 1 Nov 2022 at 16:40, Thomas Munro <thomas.mu...@gmail.com> wrote:
> > Here's an attempt at that.  There aren't actually any cases of uses of
> > this stuff in critical sections here, so perhaps I shouldn't bother
> > with that part.  The part I'd most like some feedback on is the
> > heavyweight lock bits.  I'll add this to the commitfest.
>
> The patch does not apply on top of HEAD as in [1]; please post a rebased
> patch:

Rebased.  I dropped the CV patch for now.
From d678e740c61ed5408ad254d2ca5f8e2fff7d2cd1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 1 Nov 2022 23:29:38 +1300
Subject: [PATCH v3 1/6] Allow palloc_extended(NO_OOM) in critical sections.

Commit 4a170ee9e0e banned palloc() and similar in critical sections, because an
allocation failure would produce a panic.  Make an exception for allocations
that return NULL on failure instead, for code that has a backup plan.

Discussion: https://postgr.es/m/CA%2BhUKGJHudZMG_vh6GiPB61pE%2BGgiBk5jxzd7inijqx5nEZLCw%40mail.gmail.com
---
 src/backend/utils/mmgr/mcxt.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c
index 0b00802df7..eb18b89368 100644
--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -1123,7 +1123,8 @@ MemoryContextAllocExtended(MemoryContext context, Size size, int flags)
 	void	   *ret;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1278,7 +1279,8 @@ palloc_extended(Size size, int flags)
 	MemoryContext context = CurrentMemoryContext;
 
 	Assert(MemoryContextIsValid(context));
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	if (!((flags & MCXT_ALLOC_HUGE) != 0 ? AllocHugeSizeIsValid(size) :
 		  AllocSizeIsValid(size)))
@@ -1509,7 +1511,8 @@ repalloc_extended(void *pointer, Size size, int flags)
 		  AllocSizeIsValid(size)))
 		elog(ERROR, "invalid memory alloc request size %zu", size);
 
-	AssertNotInCriticalSection(context);
+	if ((flags & MCXT_ALLOC_NO_OOM) == 0)
+		AssertNotInCriticalSection(context);
 
 	/* isReset must be false already */
 	Assert(!context->isReset);
-- 
2.39.2

From a3917505d7bd7cec7e98010d05bc016ddcd0dbac Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Wed, 26 Oct 2022 15:51:45 +1300
Subject: [PATCH v3 2/6] Provide SetLatches() for batched deferred latches.

If we have a way to buffer a set of wakeup targets and process them at a
later time, we can:

* move SetLatch() system calls out from under LWLocks, so that locks can
  be released faster; this is especially interesting in cases where the
  target backends will immediately try to acquire the same lock, or
  generally when the lock is heavily contended

* possibly gain some micro-optimization from issuing only two memory
  barriers for the whole batch of latches, not two for each latch to be
  set

* prepare for future IPC mechanisms which might allow us to wake
  multiple backends in a single system call

Individual users of this facility will follow in separate patches.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/ipc/latch.c  | 214 ++++++++++++++++++++-----------
 src/include/storage/latch.h      |  13 ++
 src/tools/pgindent/typedefs.list |   1 +
 3 files changed, 154 insertions(+), 74 deletions(-)

diff --git a/src/backend/storage/ipc/latch.c b/src/backend/storage/ipc/latch.c
index f4123e7de7..96dd47575a 100644
--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -591,6 +591,85 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 	return ret;
 }
 
+/*
+ * Set multiple latches at the same time.
+ * Note: modifies input array.
+ */
+static void
+SetLatchV(Latch **latches, int nlatches)
+{
+	/* Flush any other changes out to main memory just once. */
+	pg_memory_barrier();
+
+	/* Keep only latches that are not already set, and set them. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		if (!latch->is_set)
+			latch->is_set = true;
+		else
+			latches[i] = NULL;
+	}
+
+	pg_memory_barrier();
+
+	/* Wake the remaining latches that might be sleeping. */
+	for (int i = 0; i < nlatches; ++i)
+	{
+		Latch	   *latch = latches[i];
+
+		/*
+		 * See if anyone's waiting for the latch. It can be the current
+		 * process if we're in a signal handler. We use the self-pipe or
+		 * SIGURG to ourselves to wake up WaitEventSetWaitBlock() without
+		 * races in that case. If it's another process, send a signal.
+		 *
+		 * Fetch owner_pid only once, in case the latch is concurrently
+		 * getting owned or disowned. XXX: This assumes that pid_t is atomic,
+		 * which isn't guaranteed to be true! In practice, the effective range
+		 * of pid_t fits in a 32 bit integer, and so should be atomic. In the
+		 * worst case, we might end up signaling the wrong process. Even then,
+		 * you're very unlucky if a process with that bogus pid exists and
+		 * belongs to Postgres; and PG database processes should handle excess
+		 * SIGURG interrupts without a problem anyhow.
+		 *
+		 * Another sort of race condition that's possible here is for a new
+		 * process to own the latch immediately after we look, so we don't
+		 * signal it. This is okay so long as all callers of
+		 * ResetLatch/WaitLatch follow the standard coding convention of
+		 * waiting at the bottom of their loops, not the top, so that they'll
+		 * correctly process latch-setting events that happen before they
+		 * enter the loop.
+		 */
+		if (latch && latch->maybe_sleeping)
+		{
+#ifndef WIN32
+			pid_t		owner_pid = latch->owner_pid;
+
+			if (owner_pid == MyProcPid)
+			{
+				if (waiting)
+				{
+#if defined(WAIT_USE_SELF_PIPE)
+					sendSelfPipeByte();
+#else
+					kill(MyProcPid, SIGURG);
+#endif
+				}
+			}
+			else
+				kill(owner_pid, SIGURG);
+#else
+			HANDLE		event = latch->event;
+
+			if (event)
+				SetEvent(event);
+#endif
+		}
+	}
+}
+
 /*
  * Sets a latch and wakes up anyone waiting on it.
  *
@@ -606,89 +685,76 @@ WaitLatchOrSocket(Latch *latch, int wakeEvents, pgsocket sock,
 void
 SetLatch(Latch *latch)
 {
-#ifndef WIN32
-	pid_t		owner_pid;
-#else
-	HANDLE		handle;
-#endif
-
-	/*
-	 * The memory barrier has to be placed here to ensure that any flag
-	 * variables possibly changed by this process have been flushed to main
-	 * memory, before we check/set is_set.
-	 */
-	pg_memory_barrier();
-
-	/* Quick exit if already set */
-	if (latch->is_set)
-		return;
-
-	latch->is_set = true;
-
-	pg_memory_barrier();
-	if (!latch->maybe_sleeping)
-		return;
+	SetLatchV(&latch, 1);
+}
 
-#ifndef WIN32
+/*
+ * Add a latch to a group, to be set later.
+ */
+void
+AddLatch(LatchGroup *group, Latch *latch)
+{
+	/* First time.  Set up the in-place buffer. */
+	if (!group->latches)
+	{
+		group->latches = group->in_place_buffer;
+		group->capacity = lengthof(group->in_place_buffer);
+		Assert(group->size == 0);
+	}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler. We use the self-pipe or SIGURG to ourselves
-	 * to wake up WaitEventSetWaitBlock() without races in that case. If it's
-	 * another process, send a signal.
-	 *
-	 * Fetch owner_pid only once, in case the latch is concurrently getting
-	 * owned or disowned. XXX: This assumes that pid_t is atomic, which isn't
-	 * guaranteed to be true! In practice, the effective range of pid_t fits
-	 * in a 32 bit integer, and so should be atomic. In the worst case, we
-	 * might end up signaling the wrong process. Even then, you're very
-	 * unlucky if a process with that bogus pid exists and belongs to
-	 * Postgres; and PG database processes should handle excess SIGUSR1
-	 * interrupts without a problem anyhow.
-	 *
-	 * Another sort of race condition that's possible here is for a new
-	 * process to own the latch immediately after we look, so we don't signal
-	 * it. This is okay so long as all callers of ResetLatch/WaitLatch follow
-	 * the standard coding convention of waiting at the bottom of their loops,
-	 * not the top, so that they'll correctly process latch-setting events
-	 * that happen before they enter the loop.
-	 */
-	owner_pid = latch->owner_pid;
-	if (owner_pid == 0)
-		return;
-	else if (owner_pid == MyProcPid)
+	/* Are we full? */
+	if (group->size == group->capacity)
 	{
-#if defined(WAIT_USE_SELF_PIPE)
-		if (waiting)
-			sendSelfPipeByte();
-#else
-		if (waiting)
-			kill(MyProcPid, SIGURG);
-#endif
+		Latch	  **new_latches;
+		int			new_capacity;
+
+		/* Try to allocate more space. */
+		new_capacity = group->capacity * 2;
+		new_latches = palloc_extended(sizeof(latch) * new_capacity,
+									  MCXT_ALLOC_NO_OOM);
+		if (!new_latches)
+		{
+			/*
+			 * Allocation failed.  This is very unlikely to happen, but it
+			 * might be useful to be able to use this function in critical
+			 * sections, so we handle allocation failure by flushing instead
+			 * of throwing.
+			 */
+			SetLatches(group);
+		}
+		else
+		{
+			memcpy(new_latches, group->latches, sizeof(latch) * group->size);
+			if (group->latches != group->in_place_buffer)
+				pfree(group->latches);
+			group->latches = new_latches;
+			group->capacity = new_capacity;
+		}
 	}
-	else
-		kill(owner_pid, SIGURG);
 
-#else
+	Assert(group->size < group->capacity);
+	group->latches[group->size++] = latch;
+}
 
-	/*
-	 * See if anyone's waiting for the latch. It can be the current process if
-	 * we're in a signal handler.
-	 *
-	 * Use a local variable here just in case somebody changes the event field
-	 * concurrently (which really should not happen).
-	 */
-	handle = latch->event;
-	if (handle)
+/*
+ * Set all the latches accumulated in 'group'.
+ */
+void
+SetLatches(LatchGroup *group)
+{
+	if (group->size > 0)
 	{
-		SetEvent(handle);
+		SetLatchV(group->latches, group->size);
+		group->size = 0;
 
-		/*
-		 * Note that we silently ignore any errors. We might be in a signal
-		 * handler or other critical path where it's not safe to call elog().
-		 */
+		/* If we allocated a large buffer, it's time to free it. */
+		if (group->latches != group->in_place_buffer)
+		{
+			pfree(group->latches);
+			group->latches = group->in_place_buffer;
+			group->capacity = lengthof(group->in_place_buffer);
+		}
 	}
-#endif
 }
 
 /*
diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index 99cc47874a..7b81806825 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -118,6 +118,17 @@ typedef struct Latch
 #endif
 } Latch;
 
+/*
+ * A buffer for setting multiple latches at a time.
+ */
+typedef struct LatchGroup
+{
+	int			size;
+	int			capacity;
+	Latch	  **latches;
+	Latch	   *in_place_buffer[64];
+} LatchGroup;
+
 /*
  * Bitmasks for events that may wake-up WaitLatch(), WaitLatchOrSocket(), or
  * WaitEventSetWait().
@@ -170,6 +181,8 @@ extern void InitSharedLatch(Latch *latch);
 extern void OwnLatch(Latch *latch);
 extern void DisownLatch(Latch *latch);
 extern void SetLatch(Latch *latch);
+extern void AddLatch(LatchGroup *group, Latch *latch);
+extern void SetLatches(LatchGroup *group);
 extern void ResetLatch(Latch *latch);
 extern void ShutdownLatchSupport(void);
 
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 86a9303bf5..a5bf7d699d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1399,6 +1399,7 @@ LagTracker
 LargeObjectDesc
 LastAttnumInfo
 Latch
+LatchGroup
 LerpFunc
 LexDescr
 LexemeEntry
-- 
2.39.2

From 631100e021206afe23a6801737fe57fae3a356e0 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 27 Oct 2022 12:59:45 +1300
Subject: [PATCH v3 3/6] Use SetLatches() for synchronous replication wakeups.

Don't issue SetLatch() system calls while holding SyncRepLock.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/replication/syncrep.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 889e20b5dd..446a9e1239 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -100,7 +100,7 @@ static int	SyncRepWaitMode = SYNC_REP_NO_WAIT;
 
 static void SyncRepQueueInsert(int mode);
 static void SyncRepCancelWait(void);
-static int	SyncRepWakeQueue(bool all, int mode);
+static int	SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups);
 
 static bool SyncRepGetSyncRecPtr(XLogRecPtr *writePtr,
 								 XLogRecPtr *flushPtr,
@@ -440,6 +440,7 @@ SyncRepReleaseWaiters(void)
 	int			numwrite = 0;
 	int			numflush = 0;
 	int			numapply = 0;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -509,21 +510,23 @@ SyncRepReleaseWaiters(void)
 	if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
-		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
+		numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
-		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
+		numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH, &wakeups);
 	}
 	if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
 	{
 		walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
-		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
+		numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY, &wakeups);
 	}
 
 	LWLockRelease(SyncRepLock);
 
+	SetLatches(&wakeups);
+
 	elog(DEBUG3, "released %d procs up to write %X/%X, %d procs up to flush %X/%X, %d procs up to apply %X/%X",
 		 numwrite, LSN_FORMAT_ARGS(writePtr),
 		 numflush, LSN_FORMAT_ARGS(flushPtr),
@@ -867,7 +870,7 @@ SyncRepGetStandbyPriority(void)
  * The caller must hold SyncRepLock in exclusive mode.
  */
 static int
-SyncRepWakeQueue(bool all, int mode)
+SyncRepWakeQueue(bool all, int mode, LatchGroup *wakeups)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	int			numprocs = 0;
@@ -908,7 +911,7 @@ SyncRepWakeQueue(bool all, int mode)
 		/*
 		 * Wake only when we have set state and removed from queue.
 		 */
-		SetLatch(&(proc->procLatch));
+		AddLatch(wakeups, &proc->procLatch);
 
 		numprocs++;
 	}
@@ -930,6 +933,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 
 	if (sync_standbys_defined != WalSndCtl->sync_standbys_defined)
 	{
+		LatchGroup		wakeups = {0};
+
 		LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
 
 		/*
@@ -942,7 +947,7 @@ SyncRepUpdateSyncStandbysDefined(void)
 			int			i;
 
 			for (i = 0; i < NUM_SYNC_REP_WAIT_MODE; i++)
-				SyncRepWakeQueue(true, i);
+				SyncRepWakeQueue(true, i, &wakeups);
 		}
 
 		/*
@@ -955,6 +960,8 @@ SyncRepUpdateSyncStandbysDefined(void)
 		WalSndCtl->sync_standbys_defined = sync_standbys_defined;
 
 		LWLockRelease(SyncRepLock);
+
+		SetLatches(&wakeups);
 	}
 }
 
-- 
2.39.2

From 0c31264ae68785095d7678e5ee60f0378dae5b93 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 27 Oct 2022 12:40:12 +1300
Subject: [PATCH v3 4/6] Use SetLatches() for SERIALIZABLE DEFERRABLE wakeups.

Don't issue SetLatch() system calls while holding a highly contended
LWLock.  Instead, collect the latches in a LatchGroup and set them after
the lock is released.  Once woken, other backends will immediately try
to acquire that lock anyway, so it's better to wake them after releasing.

Take this opportunity to retire the confusingly named ProcSendSignal()
and replace it with something that gives the latch pointer we need.
There was only one other caller, in bufmgr.c, which is easily changed.

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/buffer/bufmgr.c  |  2 +-
 src/backend/storage/ipc/procsignal.c |  2 --
 src/backend/storage/lmgr/predicate.c |  5 ++++-
 src/backend/storage/lmgr/proc.c      | 12 ------------
 src/include/storage/proc.h           |  4 +++-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 0a05577b68..f297958f1b 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -1966,7 +1966,7 @@ UnpinBuffer(BufferDesc *buf)
 
 				buf_state &= ~BM_PIN_COUNT_WAITER;
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pgprocno);
+				SetLatch(GetProcLatchByNumber(wait_backend_pgprocno));
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
diff --git a/src/backend/storage/ipc/procsignal.c b/src/backend/storage/ipc/procsignal.c
index 395b2cf690..13ec2317fc 100644
--- a/src/backend/storage/ipc/procsignal.c
+++ b/src/backend/storage/ipc/procsignal.c
@@ -255,8 +255,6 @@ CleanupProcSignalState(int status, Datum arg)
  *
  * On success (a signal was sent), zero is returned.
  * On error, -1 is returned, and errno is set (typically to ESRCH or EPERM).
- *
- * Not to be confused with ProcSendSignal
  */
 int
 SendProcSignal(pid_t pid, ProcSignalReason reason, BackendId backendId)
diff --git a/src/backend/storage/lmgr/predicate.c b/src/backend/storage/lmgr/predicate.c
index 0f7f7ba5f1..86b046999b 100644
--- a/src/backend/storage/lmgr/predicate.c
+++ b/src/backend/storage/lmgr/predicate.c
@@ -3235,6 +3235,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 	bool		needToClear;
 	SERIALIZABLEXACT *roXact;
 	dlist_mutable_iter iter;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * We can't trust XactReadOnly here, because a transaction which started
@@ -3538,7 +3539,7 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 			 */
 			if (SxactIsDeferrableWaiting(roXact) &&
 				(SxactIsROUnsafe(roXact) || SxactIsROSafe(roXact)))
-				ProcSendSignal(roXact->pgprocno);
+				AddLatch(&wakeups, GetProcLatchByNumber(roXact->pgprocno));
 		}
 	}
 
@@ -3562,6 +3563,8 @@ ReleasePredicateLocks(bool isCommit, bool isReadOnlySafe)
 
 	LWLockRelease(SerializableXactHashLock);
 
+	SetLatches(&wakeups);
+
 	LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
 
 	/* Add this to the list of transactions to check for later cleanup. */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 22b4278610..09eda65fd1 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -1802,18 +1802,6 @@ ProcWaitForSignal(uint32 wait_event_info)
 	CHECK_FOR_INTERRUPTS();
 }
 
-/*
- * ProcSendSignal - set the latch of a backend identified by pgprocno
- */
-void
-ProcSendSignal(int pgprocno)
-{
-	if (pgprocno < 0 || pgprocno >= ProcGlobal->allProcCount)
-		elog(ERROR, "pgprocno out of range");
-
-	SetLatch(&ProcGlobal->allProcs[pgprocno].procLatch);
-}
-
 /*
  * BecomeLockGroupLeader - designate process as lock group leader
  *
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 4258cd92c9..ca39ae9486 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -413,6 +413,9 @@ extern PGDLLIMPORT PGPROC *PreparedXactProcs;
 /* Accessor for PGPROC given a pgprocno. */
 #define GetPGProcByNumber(n) (&ProcGlobal->allProcs[(n)])
 
+/* Accessor for procLatch given a pgprocno. */
+#define GetProcLatchByNumber(n) (&(GetPGProcByNumber(n))->procLatch)
+
 /*
  * We set aside some extra PGPROC structures for auxiliary processes,
  * ie things that aren't full-fledged backends but need shmem access.
@@ -456,7 +459,6 @@ extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
 
 extern void ProcWaitForSignal(uint32 wait_event_info);
-extern void ProcSendSignal(int pgprocno);
 
 extern PGPROC *AuxiliaryPidGetProc(int pid);
 
-- 
2.39.2

From 4fea49d303dfc55edef7dc61350a6ae4f160c6a7 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Wed, 26 Oct 2022 16:43:31 +1300
Subject: [PATCH v3 5/6] Use SetLatches() for heavyweight locks.

Collect wakeups into a LatchGroup to be set at once after the
LockManager's internal partition lock is released.  This avoids holding
busy locks while issuing system calls.

Currently, waiters immediately try to acquire that lock themselves, so
deferring until it's released makes sense to avoid contention (a later
patch may remove that lock acquisition though).

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/deadlock.c |  4 ++--
 src/backend/storage/lmgr/lock.c     | 36 +++++++++++++++++++----------
 src/backend/storage/lmgr/proc.c     | 29 ++++++++++++++---------
 src/include/storage/lock.h          |  5 ++--
 src/include/storage/proc.h          |  6 ++---
 5 files changed, 50 insertions(+), 30 deletions(-)

diff --git a/src/backend/storage/lmgr/deadlock.c b/src/backend/storage/lmgr/deadlock.c
index 617ebacdd4..63192647c8 100644
--- a/src/backend/storage/lmgr/deadlock.c
+++ b/src/backend/storage/lmgr/deadlock.c
@@ -214,7 +214,7 @@ InitDeadLockChecking(void)
  * and (b) we are typically invoked inside a signal handler.
  */
 DeadLockState
-DeadLockCheck(PGPROC *proc)
+DeadLockCheck(PGPROC *proc, LatchGroup *wakeups)
 {
 	/* Initialize to "no constraints" */
 	nCurConstraints = 0;
@@ -266,7 +266,7 @@ DeadLockCheck(PGPROC *proc)
 #endif
 
 		/* See if any waiters for the lock can be woken up now */
-		ProcLockWakeup(GetLocksMethodTable(lock), lock);
+		ProcLockWakeup(GetLocksMethodTable(lock), lock, wakeups);
 	}
 
 	/* Return code tells caller if we had to escape a deadlock or not */
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 42595b38b2..fec6e52145 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -374,14 +374,14 @@ static PROCLOCK *SetupLockInTable(LockMethod lockMethodTable, PGPROC *proc,
 static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner);
 static void BeginStrongLockAcquire(LOCALLOCK *locallock, uint32 fasthashcode);
 static void FinishStrongLockAcquire(void);
-static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner);
+static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups);
 static void ReleaseLockIfHeld(LOCALLOCK *locallock, bool sessionLock);
 static void LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent);
 static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 						PROCLOCK *proclock, LockMethod lockMethodTable);
 static void CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 						LockMethod lockMethodTable, uint32 hashcode,
-						bool wakeupNeeded);
+						bool wakeupNeeded, LatchGroup *wakeups);
 static void LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 								 LOCKTAG *locktag, LOCKMODE lockmode,
 								 bool decrement_strong_lock_count);
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -1098,7 +1099,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
-		WaitOnLock(locallock, owner);
+		WaitOnLock(locallock, owner, &wakeups);
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1138,6 +1139,8 @@ LockAcquireExtended(const LOCKTAG *locktag,
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
 	 * standby server.
@@ -1629,7 +1632,7 @@ UnGrantLock(LOCK *lock, LOCKMODE lockmode,
 static void
 CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 			LockMethod lockMethodTable, uint32 hashcode,
-			bool wakeupNeeded)
+			bool wakeupNeeded, LatchGroup *wakeups)
 {
 	/*
 	 * If this was my last hold on this lock, delete my entry in the proclock
@@ -1669,7 +1672,7 @@ CleanUpLock(LOCK *lock, PROCLOCK *proclock,
 	else if (wakeupNeeded)
 	{
 		/* There are waiters on this lock, so wake them up. */
-		ProcLockWakeup(lockMethodTable, lock);
+		ProcLockWakeup(lockMethodTable, lock, wakeups);
 	}
 }
 
@@ -1806,7 +1809,7 @@ MarkLockClear(LOCALLOCK *locallock)
  * The appropriate partition lock must be held at entry.
  */
 static void
-WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
+WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 {
 	LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock);
 	LockMethod	lockMethodTable = LockMethods[lockmethodid];
@@ -1839,7 +1842,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
 	 */
 	PG_TRY();
 	{
-		if (ProcSleep(locallock, lockMethodTable) != PROC_WAIT_STATUS_OK)
+		if (ProcSleep(locallock, lockMethodTable, wakeups) != PROC_WAIT_STATUS_OK)
 		{
 			/*
 			 * We failed as a result of a deadlock, see CheckDeadLock(). Quit
@@ -1890,7 +1893,7 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner)
  * NB: this does not clean up any locallock object that may exist for the lock.
  */
 void
-RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
+RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups)
 {
 	LOCK	   *waitLock = proc->waitLock;
 	PROCLOCK   *proclock = proc->waitProcLock;
@@ -1931,7 +1934,7 @@ RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode)
 	 */
 	CleanUpLock(waitLock, proclock,
 				LockMethods[lockmethodid], hashcode,
-				true);
+				true, wakeups);
 }
 
 /*
@@ -1956,6 +1959,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2134,10 +2138,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, locallock->hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	RemoveLocalLock(locallock);
 	return true;
 }
@@ -2161,6 +2167,7 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 	LOCK	   *lock;
 	int			partition;
 	bool		have_fast_path_lwlock = false;
+	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2399,10 +2406,12 @@ LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks)
 			CleanUpLock(lock, proclock,
 						lockMethodTable,
 						LockTagHashCode(&lock->tag),
-						wakeupNeeded);
+						wakeupNeeded, &wakeups);
 		}						/* loop over PROCLOCKs within this partition */
 
 		LWLockRelease(partitionLock);
+
+		SetLatches(&wakeups);
 	}							/* loop over partitions */
 
 #ifdef LOCK_DEBUG
@@ -3095,6 +3104,7 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 	uint32		proclock_hashcode;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	LatchGroup	wakeups = {0};
 
 	hashcode = LockTagHashCode(locktag);
 	partitionLock = LockHashPartitionLock(hashcode);
@@ -3148,10 +3158,12 @@ LockRefindAndRelease(LockMethod lockMethodTable, PGPROC *proc,
 
 	CleanUpLock(lock, proclock,
 				lockMethodTable, hashcode,
-				wakeupNeeded);
+				wakeupNeeded, &wakeups);
 
 	LWLockRelease(partitionLock);
 
+	SetLatches(&wakeups);
+
 	/*
 	 * Decrement strong lock count.  This logic is needed only for 2PC.
 	 */
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 09eda65fd1..156d293210 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -699,6 +699,7 @@ LockErrorCleanup(void)
 {
 	LWLock	   *partitionLock;
 	DisableTimeoutParams timeouts[2];
+	LatchGroup	wakeups = {0};
 
 	HOLD_INTERRUPTS();
 
@@ -732,7 +733,7 @@ LockErrorCleanup(void)
 	if (!dlist_node_is_detached(&MyProc->links))
 	{
 		/* We could not have been granted the lock yet */
-		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
+		RemoveFromWaitQueue(MyProc, lockAwaited->hashcode, &wakeups);
 	}
 	else
 	{
@@ -749,6 +750,7 @@ LockErrorCleanup(void)
 	lockAwaited = NULL;
 
 	LWLockRelease(partitionLock);
+	SetLatches(&wakeups);
 
 	RESUME_INTERRUPTS();
 }
@@ -1001,7 +1003,7 @@ AuxiliaryPidGetProc(int pid)
  * NOTES: The process queue is now a priority queue for locking.
  */
 ProcWaitStatus
-ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
+ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 {
 	LOCKMODE	lockmode = locallock->tag.mode;
 	LOCK	   *lock = locallock->lock;
@@ -1137,7 +1139,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 */
 	if (early_deadlock)
 	{
-		RemoveFromWaitQueue(MyProc, hashcode);
+		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1153,6 +1155,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 	 * LockErrorCleanup will clean up if cancel/die happens.
 	 */
 	LWLockRelease(partitionLock);
+	SetLatches(wakeups);
 
 	/*
 	 * Also, now that we will successfully clean up after an ereport, it's
@@ -1594,7 +1597,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
 
 
 /*
- * ProcWakeup -- wake up a process by setting its latch.
+ * ProcWakeup -- prepare to wake up a process by buffering its latch.
  *
  *	 Also remove the process from the wait queue and set its links invalid.
  *
@@ -1606,7 +1609,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable)
  * Hence, in practice the waitStatus parameter must be PROC_WAIT_STATUS_OK.
  */
 void
-ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
+ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups)
 {
 	if (dlist_node_is_detached(&proc->links))
 		return;
@@ -1622,8 +1625,8 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
 	proc->waitStatus = waitStatus;
 	pg_atomic_write_u64(&MyProc->waitStart, 0);
 
-	/* And awaken it */
-	SetLatch(&proc->procLatch);
+	/* Schedule it to be awoken */
+	AddLatch(wakeups, &proc->procLatch);
 }
 
 /*
@@ -1634,7 +1637,7 @@ ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus)
  * The appropriate lock partition lock must be held by caller.
  */
 void
-ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
+ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups)
 {
 	dclist_head *waitQueue = &lock->waitProcs;
 	LOCKMASK	aheadRequests = 0;
@@ -1659,7 +1662,7 @@ ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock)
 			/* OK to waken */
 			GrantLock(lock, proc->waitProcLock, lockmode);
 			/* removes proc from the lock's waiting process queue */
-			ProcWakeup(proc, PROC_WAIT_STATUS_OK);
+			ProcWakeup(proc, PROC_WAIT_STATUS_OK, wakeups);
 		}
 		else
 		{
@@ -1685,6 +1688,7 @@ static void
 CheckDeadLock(void)
 {
 	int			i;
+	LatchGroup	wakeups = {0};
 
 	/*
 	 * Acquire exclusive lock on the entire shared lock data structures. Must
@@ -1719,7 +1723,7 @@ CheckDeadLock(void)
 #endif
 
 	/* Run the deadlock check, and set deadlock_state for use by ProcSleep */
-	deadlock_state = DeadLockCheck(MyProc);
+	deadlock_state = DeadLockCheck(MyProc, &wakeups);
 
 	if (deadlock_state == DS_HARD_DEADLOCK)
 	{
@@ -1736,7 +1740,8 @@ CheckDeadLock(void)
 		 * return from the signal handler.
 		 */
 		Assert(MyProc->waitLock != NULL);
-		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)));
+		RemoveFromWaitQueue(MyProc, LockTagHashCode(&(MyProc->waitLock->tag)),
+							&wakeups);
 
 		/*
 		 * We're done here.  Transaction abort caused by the error that
@@ -1760,6 +1765,8 @@ CheckDeadLock(void)
 check_done:
 	for (i = NUM_LOCK_PARTITIONS; --i >= 0;)
 		LWLockRelease(LockHashPartitionLockByIndex(i));
+
+	SetLatches(&wakeups);
 }
 
 /*
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 6ae434596a..8ee2f3e855 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -20,6 +20,7 @@
 
 #include "lib/ilist.h"
 #include "storage/backendid.h"
+#include "storage/latch.h"
 #include "storage/lockdefs.h"
 #include "storage/lwlock.h"
 #include "storage/shmem.h"
@@ -583,7 +584,7 @@ extern bool LockCheckConflicts(LockMethod lockMethodTable,
 							   LOCK *lock, PROCLOCK *proclock);
 extern void GrantLock(LOCK *lock, PROCLOCK *proclock, LOCKMODE lockmode);
 extern void GrantAwaitedLock(void);
-extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode);
+extern void RemoveFromWaitQueue(PGPROC *proc, uint32 hashcode, LatchGroup *wakeups);
 extern Size LockShmemSize(void);
 extern LockData *GetLockStatusData(void);
 extern BlockedProcsData *GetBlockerStatusData(int blocked_pid);
@@ -600,7 +601,7 @@ extern void lock_twophase_postabort(TransactionId xid, uint16 info,
 extern void lock_twophase_standby_recover(TransactionId xid, uint16 info,
 										  void *recdata, uint32 len);
 
-extern DeadLockState DeadLockCheck(PGPROC *proc);
+extern DeadLockState DeadLockCheck(PGPROC *proc, LatchGroup *wakeups);
 extern PGPROC *GetBlockingAutoVacuumPgproc(void);
 extern void DeadLockReport(void) pg_attribute_noreturn();
 extern void RememberSimpleDeadLock(PGPROC *proc1,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index ca39ae9486..799db535c9 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -451,9 +451,9 @@ extern int	GetStartupBufferPinWaitBufId(void);
 extern bool HaveNFreeProcs(int n, int *nfree);
 extern void ProcReleaseLocks(bool isCommit);
 
-extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
-extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus);
-extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
+extern ProcWaitStatus ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups);
+extern void ProcWakeup(PGPROC *proc, ProcWaitStatus waitStatus, LatchGroup *wakeups);
+extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock, LatchGroup *wakeups);
 extern void CheckDeadLockAlert(void);
 extern bool IsWaitingForLock(void);
 extern void LockErrorCleanup(void);
-- 
2.39.2

From 77c6f747011c0ff8771414ded1a23306e6e953c9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 27 Oct 2022 10:56:39 +1300
Subject: [PATCH v3 6/6] Don't re-acquire LockManager partition lock after
 wakeup.

Change the contract of WaitOnLock() and ProcSleep() to return with the
partition lock released.  ProcSleep() used to re-acquire the lock to hold
off cancel/die interrupts, which was necessary at the time the ancestor of
that code was committed in fe548629c50, because back then signal handlers
could longjmp out of there.  Now that can happen only at points where we
explicitly run CHECK_FOR_INTERRUPTS(), so the re-acquisition can be
removed, which leads to the idea of making the function always release.

While an earlier commit fixed the problem that the backend woke us up
before it had even released the lock itself, this lock reacquisition
point was still contended when multiple backends woke at the same time.

XXX Right?  Or what am I missing?

Discussion: https://postgr.es/m/CA%2BhUKGKmO7ze0Z6WXKdrLxmvYa%3DzVGGXOO30MMktufofVwEm1A%40mail.gmail.com
---
 src/backend/storage/lmgr/lock.c | 18 ++++++++++++------
 src/backend/storage/lmgr/proc.c | 20 ++++++++++++--------
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index fec6e52145..97be94dbd3 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -787,6 +787,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	LWLock	   *partitionLock;
 	bool		found_conflict;
 	bool		log_lock = false;
+	bool		partitionLocked = false;
 	LatchGroup	wakeups = {0};
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
@@ -995,6 +996,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	partitionLock = LockHashPartitionLock(hashcode);
 
 	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
+	partitionLocked = true;
 
 	/*
 	 * Find or create lock and proclock entries with this tag
@@ -1099,7 +1101,10 @@ LockAcquireExtended(const LOCKTAG *locktag,
 										 locktag->locktag_type,
 										 lockmode);
 
+		Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
 		WaitOnLock(locallock, owner, &wakeups);
+		Assert(!LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+		partitionLocked = false;
 
 		TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1,
 										locktag->locktag_field2,
@@ -1124,7 +1129,6 @@ LockAcquireExtended(const LOCKTAG *locktag,
 			PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock);
 			LOCK_PRINT("LockAcquire: INCONSISTENT", lock, lockmode);
 			/* Should we retry ? */
-			LWLockRelease(partitionLock);
 			elog(ERROR, "LockAcquire failed");
 		}
 		PROCLOCK_PRINT("LockAcquire: granted", proclock);
@@ -1137,9 +1141,11 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 */
 	FinishStrongLockAcquire();
 
-	LWLockRelease(partitionLock);
-
-	SetLatches(&wakeups);
+	if (partitionLocked)
+	{
+		LWLockRelease(partitionLock);
+		SetLatches(&wakeups);
+	}
 
 	/*
 	 * Emit a WAL record if acquisition of this lock needs to be replayed in a
@@ -1806,7 +1812,8 @@ MarkLockClear(LOCALLOCK *locallock)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process.
  *
- * The appropriate partition lock must be held at entry.
+ * The appropriate partition lock must be held at entry.  It is not held on
+ * exit.
  */
 static void
 WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
@@ -1851,7 +1858,6 @@ WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, LatchGroup *wakeups)
 			awaitedLock = NULL;
 			LOCK_PRINT("WaitOnLock: aborting on lock",
 					   locallock->lock, locallock->tag.mode);
-			LWLockRelease(LockHashPartitionLock(locallock->hashcode));
 
 			/*
 			 * Now that we aren't holding the partition lock, we can give an
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 156d293210..35720346f2 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -992,7 +992,7 @@ AuxiliaryPidGetProc(int pid)
  * Caller must have set MyProc->heldLocks to reflect locks already held
  * on the lockable object by this process (under all XIDs).
  *
- * The lock table's partition lock must be held at entry, and will be held
+ * The lock table's partition lock must be held at entry, and will be released
  * at exit.
  *
  * Result: PROC_WAIT_STATUS_OK if we acquired the lock, PROC_WAIT_STATUS_ERROR if not (deadlock).
@@ -1020,6 +1020,12 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	ProcWaitStatus myWaitStatus;
 	PGPROC	   *leader = MyProc->lockGroupLeader;
 
+	/*
+	 * Every way out of this function will release the partition lock and send
+	 * buffered latch wakeups.
+	 */
+	Assert(LWLockHeldByMeInMode(partitionLock, LW_EXCLUSIVE));
+
 	/*
 	 * If group locking is in use, locks held by members of my locking group
 	 * need to be included in myHeldLocks.  This is not required for relation
@@ -1104,6 +1110,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 					/* Skip the wait and just grant myself the lock. */
 					GrantLock(lock, proclock, lockmode);
 					GrantAwaitedLock();
+					LWLockRelease(partitionLock);
+					SetLatches(wakeups);
 					return PROC_WAIT_STATUS_OK;
 				}
 
@@ -1140,6 +1148,8 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 	if (early_deadlock)
 	{
 		RemoveFromWaitQueue(MyProc, hashcode, wakeups);
+		LWLockRelease(partitionLock);
+		SetLatches(wakeups);
 		return PROC_WAIT_STATUS_ERROR;
 	}
 
@@ -1469,6 +1479,7 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 			}
 
 			LWLockRelease(partitionLock);
+			SetLatches(wakeups);
 
 			if (deadlock_state == DS_SOFT_DEADLOCK)
 				ereport(LOG,
@@ -1570,13 +1581,6 @@ ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, LatchGroup *wakeups)
 							standbyWaitStart, GetCurrentTimestamp(),
 							NULL, false);
 
-	/*
-	 * Re-acquire the lock table's partition lock.  We have to do this to hold
-	 * off cancel/die interrupts before we can mess with lockAwaited (else we
-	 * might have a missed or duplicated locallock update).
-	 */
-	LWLockAcquire(partitionLock, LW_EXCLUSIVE);
-
 	/*
 	 * We no longer want LockErrorCleanup to do anything.
 	 */
-- 
2.39.2
