Turns out there was a bug in EXEC_BACKEND mode, causing failures on the
Windows machine in CI. AFAIK the reason is pretty simple - the backends
don't see the number of fast-path groups the postmaster calculated from
max_locks_per_transaction.

Fixed that by calculating it again in AttachSharedMemoryStructs, which
seems to have done the trick. With this the CI builds pass just fine,
but I'm not sure whether EXEC_BACKEND has other issues with the
PGPROC changes. Could the shared memory get mapped at a different
address, in which case the pointers might need to be adjusted?


regards

-- 
Tomas Vondra
From 45a2111c51016386a31d766700b23d9d88ff6c0b Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Thu, 12 Sep 2024 23:09:41 +0200
Subject: [PATCH v20240913 1/2] Increase the number of fast-path lock slots

The fast-path locking introduced in 9.2 allowed each backend to acquire
up to 16 relation locks cheaply, provided the lock level allows it.
If a backend needs to hold more locks, it has to insert them into the
regular lock table in shared memory. This is considerably more
expensive, and on many-core systems may be subject to contention.

The limit of 16 entries was always rather low, even with simple queries
and schemas with only a few tables. We have to lock all relations - not
just tables, but also indexes, views, etc. Moreover, for planning we
need to lock all relations that might be used in the plan, not just
those that actually get used in the final plan. It only takes a couple
of tables with multiple indexes to need more than 16 locks, so it was
quite common to fill all fast-path slots.

As partitioning gets used more widely, with more and more partitions,
this limit is trivial to hit, with complex queries easily using hundreds
or even thousands of locks. For workloads doing a lot of I/O this is not
noticeable, but on large machines with enough RAM to keep the data in
memory, access to the shared lock table may become a serious bottleneck.

This patch addresses this by increasing the number of fast-path slots
from 16 to 1024. The slots remain in PGPROC, and are organized as an
array of 16-slot groups (each group being effectively a clone of the
original fast-path approach). Instead of accessing this as a big hash
table with open addressing, we treat this as a 16-way set associative
cache. Each relation (identified by a "relid" OID) is mapped to a
particular 16-slot group by calculating a hash

    h(relid) = ((relid * P) mod N)

where P is a hard-coded prime and N is the number of groups. This is
not a great hash function, but it works well enough - the main purpose
is to prevent "hot groups", i.e. runs of consecutive OIDs that might
fill some of the fast-path groups. The multiplication by P spreads such
runs across groups; OIDs that are already spread out stay spread out.
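
For example, with N = 64 groups, consecutive OIDs land 5 groups apart,
because 49157 mod 64 = 5:

    h(16384) = 0,   h(16385) = 5,   h(16386) = 10,  ...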

The groups are processed by linear search. With only 16 entries this is
cheap, and the groups have very good locality.
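
As an illustration, here is a minimal standalone sketch of the lookup
pattern (the names are made up for this example; the patch itself uses
the FAST_PATH_* macros and MyProc->fpRelId):

    #include <stdint.h>

    #define GROUPS_PER_BACKEND 64       /* FP_LOCK_GROUPS_PER_BACKEND */
    #define SLOTS_PER_GROUP    16       /* FP_LOCK_SLOTS_PER_GROUP */

    /* Return the fast-path slot holding relid, or -1 if not found
     * (meaning the lock lives in the shared lock table, if anywhere). */
    static int
    fp_find_slot(const uint32_t *fpRelId, uint32_t relid)
    {
        unsigned group = (unsigned) (((uint64_t) relid * 49157)
                                     % GROUPS_PER_BACKEND);

        /* linear search within the 16-slot group */
        for (unsigned i = 0; i < SLOTS_PER_GROUP; i++)
        {
            unsigned f = group * SLOTS_PER_GROUP + i;  /* whole-array index */

            if (fpRelId[f] == relid)
                return (int) f;
        }
        return -1;
    }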

Treating this as a simple hash table with open addressing would not be
efficient, especially as the table gets close to full. The
usual solution is to grow the table, but for hash tables in shared
memory that's not trivial. It would also have worse locality, due to
more random access.

Luckily, fast-path locking already has a simple fallback for a full
group: the lock is simply inserted into the shared lock table, just
like before. Of course, if this happens too often, that
reduces the benefit of fast-path locking.

This patch hard-codes the number of groups to 64, which means 1024
fast-path locks. As all the information is still stored in PGPROC, this
grows PGPROC by about 4.5kB (from ~840B to ~5kB). This is a trade-off,
exchanging memory for cheaper locking.
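
For reference, the ~4.5kB works out from the new PGPROC arrays:

    fpLockBits: 64 groups * sizeof(uint64)   =  512 B
    fpRelId:    1024 slots * sizeof(Oid)     = 4096 B
    total:                                     4608 B (~4.5 kB)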

Ultimately, the number of fast-path slots should not be hard-coded, but
adjustable based on what the workload does, perhaps using a GUC. That
however means it can't be stored in PGPROC directly.
---
 src/backend/storage/lmgr/lock.c | 118 ++++++++++++++++++++++++++------
 src/include/storage/proc.h      |   8 ++-
 2 files changed, 102 insertions(+), 24 deletions(-)

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index 83b99a98f08..d053ae0c409 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -167,7 +167,7 @@ typedef struct TwoPhaseLockRecord
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
  */
-static int	FastPathLocalUseCount = 0;
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,23 +184,53 @@ static int	FastPathLocalUseCount = 0;
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Macros to calculate the group and index for a relation.
+ *
+ * The formula is a simple hash function, designed to spread the OIDs a bit,
+ * so that even contiguous values end up in different groups. In most cases
+ * there will be gaps anyway, but the multiplication should help a bit.
+ *
+ * The selected value (49157) is a prime not too close to 2^k, and it's
+ * small enough to not cause overflows (in 64-bit).
+ */
+#define FAST_PATH_LOCK_REL_GROUP(rel) \
+	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+
+/* Calculate index in the whole per-backend array of lock slots. */
+#define FP_LOCK_SLOT_INDEX(group, index) \
+	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
+	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
+
+/*
+ * Given a lock index (into the per-backend array), calculated using the
+ * FP_LOCK_SLOT_INDEX macro, calculate group and index (within the group).
+ */
+#define FAST_PATH_LOCK_GROUP(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) / FP_LOCK_SLOTS_PER_GROUP))
+#define FAST_PATH_LOCK_INDEX(index)	\
+	(AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_BACKEND)), \
+	 ((index) % FP_LOCK_SLOTS_PER_GROUP))
+
 /* Macros for manipulating proc->fpLockBits */
 #define FAST_PATH_BITS_PER_SLOT			3
 #define FAST_PATH_LOCKNUMBER_OFFSET		1
 #define FAST_PATH_MASK					((1 << FAST_PATH_BITS_PER_SLOT) - 1)
 #define FAST_PATH_GET_BITS(proc, n) \
-	(((proc)->fpLockBits >> (FAST_PATH_BITS_PER_SLOT * n)) & FAST_PATH_MASK)
+	(((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] >> (FAST_PATH_BITS_PER_SLOT * FAST_PATH_LOCK_INDEX(n))) & FAST_PATH_MASK)
 #define FAST_PATH_BIT_POSITION(n, l) \
 	(AssertMacro((l) >= FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((l) < FAST_PATH_BITS_PER_SLOT+FAST_PATH_LOCKNUMBER_OFFSET), \
 	 AssertMacro((n) < FP_LOCK_SLOTS_PER_BACKEND), \
-	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (n)))
+	 ((l) - FAST_PATH_LOCKNUMBER_OFFSET + FAST_PATH_BITS_PER_SLOT * (FAST_PATH_LOCK_INDEX(n))))
 #define FAST_PATH_SET_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] |= UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)
 #define FAST_PATH_CLEAR_LOCKMODE(proc, n, l) \
-	 (proc)->fpLockBits &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
+	 (proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] &= ~(UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l))
 #define FAST_PATH_CHECK_LOCKMODE(proc, n, l) \
-	 ((proc)->fpLockBits & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
+	 ((proc)->fpLockBits[FAST_PATH_LOCK_GROUP(n)] & (UINT64CONST(1) << FAST_PATH_BIT_POSITION(n, l)))
 
 /*
  * The fast-path lock mechanism is concerned only with relation locks on
@@ -926,7 +956,7 @@ LockAcquireExtended(const LOCKTAG *locktag,
 	 * for now we don't worry about that case either.
 	 */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount < FP_LOCK_SLOTS_PER_BACKEND)
+		FastPathLocalUseCounts[FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2)] < FP_LOCK_SLOTS_PER_GROUP)
 	{
 		uint32		fasthashcode = FastPathStrongLockHashPartition(hashcode);
 		bool		acquired;
@@ -1970,6 +2000,7 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	PROCLOCK   *proclock;
 	LWLock	   *partitionLock;
 	bool		wakeupNeeded;
+	int			group;
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2063,9 +2094,12 @@ LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock)
 	 */
 	locallock->lockCleared = false;
 
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
 	/* Attempt fast release of any lock eligible for the fast path. */
 	if (EligibleForRelationFastPath(locktag, lockmode) &&
-		FastPathLocalUseCount > 0)
+		FastPathLocalUseCounts[group] > 0)
 	{
 		bool		released;
 
@@ -2633,12 +2667,21 @@ LockReassignOwner(LOCALLOCK *locallock, ResourceOwner parent)
 static bool
 FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	uint32		unused_slot = FP_LOCK_SLOTS_PER_BACKEND;
+	uint32		i,
+				group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
 	/* Scan for existing entry for this relid, remembering empty slot. */
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
 		if (FAST_PATH_GET_BITS(MyProc, f) == 0)
 			unused_slot = f;
 		else if (MyProc->fpRelId[f] == relid)
@@ -2654,7 +2697,7 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 	{
 		MyProc->fpRelId[unused_slot] = relid;
 		FAST_PATH_SET_LOCKMODE(MyProc, unused_slot, lockmode);
-		++FastPathLocalUseCount;
+		++FastPathLocalUseCounts[group];
 		return true;
 	}
 
@@ -2670,12 +2713,21 @@ FastPathGrantRelationLock(Oid relid, LOCKMODE lockmode)
 static bool
 FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
 {
-	uint32		f;
 	bool		result = false;
+	uint32		i,
+				group;
 
-	FastPathLocalUseCount = 0;
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+	FastPathLocalUseCounts[group] = 0;
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
+
 		if (MyProc->fpRelId[f] == relid
 			&& FAST_PATH_CHECK_LOCKMODE(MyProc, f, lockmode))
 		{
@@ -2685,7 +2737,7 @@ FastPathUnGrantRelationLock(Oid relid, LOCKMODE lockmode)
-			/* we continue iterating so as to update FastPathLocalUseCount */
+			/* we continue iterating so as to update FastPathLocalUseCounts */
 		}
 		if (FAST_PATH_GET_BITS(MyProc, f) != 0)
-			++FastPathLocalUseCount;
+			++FastPathLocalUseCounts[group];
 	}
 	return result;
 }
@@ -2714,7 +2766,8 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 	for (i = 0; i < ProcGlobal->allProcCount; i++)
 	{
 		PGPROC	   *proc = &ProcGlobal->allProcs[i];
-		uint32		f;
+		uint32		j,
+					group;
 
 		LWLockAcquire(&proc->fpInfoLock, LW_EXCLUSIVE);
 
@@ -2739,9 +2792,16 @@ FastPathTransferRelationLocks(LockMethod lockMethodTable, const LOCKTAG *locktag
 			continue;
 		}
 
-		for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+		/* fast-path group the lock belongs to */
+		group = FAST_PATH_LOCK_REL_GROUP(relid);
+
+		for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 		{
 			uint32		lockmode;
+			uint32		f;
+
+			/* index into the whole per-backend array */
+			f = FP_LOCK_SLOT_INDEX(group, j);
 
 			/* Look for an allocated slot matching the given relid. */
 			if (relid != proc->fpRelId[f] || FAST_PATH_GET_BITS(proc, f) == 0)
@@ -2793,13 +2853,21 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
 	PROCLOCK   *proclock = NULL;
 	LWLock	   *partitionLock = LockHashPartitionLock(locallock->hashcode);
 	Oid			relid = locktag->locktag_field2;
-	uint32		f;
+	uint32		i,
+				group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(relid);
 
 	LWLockAcquire(&MyProc->fpInfoLock, LW_EXCLUSIVE);
 
-	for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+	for (i = 0; i < FP_LOCK_SLOTS_PER_GROUP; i++)
 	{
 		uint32		lockmode;
+		uint32		f;
+
+		/* index into the whole per-backend array */
+		f = FP_LOCK_SLOT_INDEX(group, i);
 
 		/* Look for an allocated slot matching the given relid. */
 		if (relid != MyProc->fpRelId[f] || FAST_PATH_GET_BITS(MyProc, f) == 0)
@@ -2903,6 +2971,10 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	LWLock	   *partitionLock;
 	int			count = 0;
 	int			fast_count = 0;
+	uint32		group;
+
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
 
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
@@ -2957,7 +3029,7 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 		for (i = 0; i < ProcGlobal->allProcCount; i++)
 		{
 			PGPROC	   *proc = &ProcGlobal->allProcs[i];
-			uint32		f;
+			uint32		j;
 
 			/* A backend never blocks itself */
 			if (proc == MyProc)
@@ -2979,9 +3051,13 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 				continue;
 			}
 
-			for (f = 0; f < FP_LOCK_SLOTS_PER_BACKEND; f++)
+			for (j = 0; j < FP_LOCK_SLOTS_PER_GROUP; j++)
 			{
 				uint32		lockmask;
+				uint32		f;
+
+				/* index into the whole per-backend array */
+				f = FP_LOCK_SLOT_INDEX(group, j);
 
 				/* Look for an allocated slot matching the given relid. */
 				if (relid != proc->fpRelId[f])
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index deeb06c9e01..845058da9fa 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,8 +83,9 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_SLOTS_PER_BACKEND 16
-
+#define		FP_LOCK_GROUPS_PER_BACKEND	64
+#define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -292,7 +293,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits;		/* lock modes held for each fast-path slot */
+	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
+														 * each fast-path slot */
 	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
-- 
2.46.0

From 46c2ec00017821a32b0f9fba8e56b2ad46d9d239 Mon Sep 17 00:00:00 2001
From: Tomas Vondra <to...@vondra.me>
Date: Thu, 12 Sep 2024 23:09:50 +0200
Subject: [PATCH v20240913 2/2] Set fast-path slots using
 max_locks_per_transaction

Instead of using a hard-coded value of 64 groups (1024 fast-path slots),
determine the value based on the max_locks_per_transaction GUC. The
size is calculated once at startup, before allocating shared memory.

The default max_locks_per_transaction value of 64 translates to 4
groups of fast-path locks (64 slots).
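
For illustration, the power-of-two sizing in InitializeFastPathLocks
(see below) maps the GUC to group counts like this:

    max_locks_per_transaction    groups    fast-path slots
                           64         4                 64
                          128         8                128
                         1000        64               1024
                     >= 16384      1024              16384 (max)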

The purpose of the max_locks_per_transaction GUC is to size the shared
lock table, but it's also the best information we have about the
expected number of locks per backend. It is often set to the average
number of locks needed by a backend, but some backends may need
substantially fewer or more locks.

This means the fast-path capacity calculated from
max_locks_per_transaction may not be sufficient for some backends,
forcing them to use the shared lock table. The assumption is that this
is not a major issue - there can't be too many such backends, otherwise
max_locks_per_transaction would need to be higher anyway (which would
resolve the fast-path issue too).

If that happens to be a problem, the only solution is to increase the
GUC, even if the shared lock table had sufficient capacity. That is not
free, because each lock in the shared lock table requires about 500B.
With many backends this may be a substantial amount of memory, but then
again, that should only happen on machines with plenty of memory.

In the future we can consider a separate GUC for the number of fast-path
slots, but let's try without one first.

An alternative solution might be to size the fast-path arrays for a
multiple of max_locks_per_transaction. The cost of adding a fast-path
slot is much lower (only ~5B compared to ~500B per entry), so this would
be cheaper than increasing max_locks_per_transaction. But it's not clear
what multiple of max_locks_per_transaction to use.
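
(For reference, the ~5B per-slot figure follows from the PGPROC arrays:
each 16-slot group adds one uint64 of lock-mode bits plus one Oid per
slot, i.e. 8/16 + 4 = 4.5 B per slot.)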
---
 src/backend/bootstrap/bootstrap.c   |  2 ++
 src/backend/postmaster/postmaster.c |  5 +++
 src/backend/storage/ipc/ipci.c      |  6 ++++
 src/backend/storage/lmgr/lock.c     | 28 +++++++++++++----
 src/backend/storage/lmgr/proc.c     | 47 +++++++++++++++++++++++++++++
 src/backend/tcop/postgres.c         |  3 ++
 src/backend/utils/init/postinit.c   | 34 +++++++++++++++++++++
 src/include/miscadmin.h             |  1 +
 src/include/storage/proc.h          | 11 ++++---
 9 files changed, 126 insertions(+), 11 deletions(-)

diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 7637581a184..ed59dfce893 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -309,6 +309,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
 
 	InitializeMaxBackends();
 
+	InitializeFastPathLocks();
+
 	CreateSharedMemoryAndSemaphores();
 
 	/*
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 96bc1d1cfed..f4a16595d7f 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -903,6 +903,11 @@ PostmasterMain(int argc, char *argv[])
 	 */
 	InitializeMaxBackends();
 
+	/*
+	 * Also calculate the size of the fast-path lock arrays in PGPROC.
+	 */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 6caeca3a8e6..10fc18f2529 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -178,6 +178,12 @@ AttachSharedMemoryStructs(void)
 	Assert(MyProc != NULL);
 	Assert(IsUnderPostmaster);
 
+	/*
+	 * In EXEC_BACKEND mode, backends don't inherit the number of fast-path
+	 * groups we calculated before setting up shared memory, so recalculate it.
+	 */
+	InitializeFastPathLocks();
+
 	CreateOrAttachShmemStructs();
 
 	/*
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index d053ae0c409..505aa52668e 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -166,8 +166,13 @@ typedef struct TwoPhaseLockRecord
  * might be higher than the real number if another backend has transferred
  * our locks to the primary lock table, but it can never be lower than the
  * real value, since only we can acquire locks on our own behalf.
+ *
+ * XXX Allocate a static array of the maximum size. We could have a pointer
+ * and allocate just the right size to save a couple of kB, but that does
+ * not seem worth the extra complexity of having to initialize it etc. This
+ * way it gets initialized automatically.
  */
-static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
+static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND_MAX];
 
 /*
  * Flag to indicate if the relation extension lock is held by this backend.
@@ -184,6 +189,17 @@ static int	FastPathLocalUseCounts[FP_LOCK_GROUPS_PER_BACKEND];
  */
 static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
 
+/*
+ * Number of fast-path locks per backend - size of the arrays in PGPROC.
+ * This is set only once during startup, before initializing shared memory,
+ * and remains constant after that.
+ *
+ * We set the limit based on the max_locks_per_transaction GUC, because it's
+ * the best information we have about the expected number of locks per backend.
+ * See InitializeFastPathLocks for details.
+ */
+int			FastPathLockGroupsPerBackend = 0;
+
 /*
  * Macros to calculate the group and index for a relation.
  *
@@ -195,11 +211,11 @@ static bool IsRelationExtensionLockHeld PG_USED_FOR_ASSERTS_ONLY = false;
  * small enough to not cause overflows (in 64-bit).
  */
 #define FAST_PATH_LOCK_REL_GROUP(rel) \
-	(((uint64) (rel) * 49157) % FP_LOCK_GROUPS_PER_BACKEND)
+	(((uint64) (rel) * 49157) % FastPathLockGroupsPerBackend)
 
 /* Calculate index in the whole per-backend array of lock slots. */
 #define FP_LOCK_SLOT_INDEX(group, index) \
-	(AssertMacro(((group) >= 0) && ((group) < FP_LOCK_GROUPS_PER_BACKEND)), \
+	(AssertMacro(((group) >= 0) && ((group) < FastPathLockGroupsPerBackend)), \
 	 AssertMacro(((index) >= 0) && ((index) < FP_LOCK_SLOTS_PER_GROUP)), \
 	 ((group) * FP_LOCK_SLOTS_PER_GROUP + (index)))
 
@@ -2973,9 +2989,6 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	int			fast_count = 0;
 	uint32		group;
 
-	/* fast-path group the lock belongs to */
-	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
-
 	if (lockmethodid <= 0 || lockmethodid >= lengthof(LockMethods))
 		elog(ERROR, "unrecognized lock method: %d", lockmethodid);
 	lockMethodTable = LockMethods[lockmethodid];
@@ -3005,6 +3018,9 @@ GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *countp)
 	partitionLock = LockHashPartitionLock(hashcode);
 	conflictMask = lockMethodTable->conflictTab[lockmode];
 
+	/* fast-path group the lock belongs to */
+	group = FAST_PATH_LOCK_REL_GROUP(locktag->locktag_field2);
+
 	/*
 	 * Fast path locks might not have been entered in the primary lock table.
 	 * If the lock we're dealing with could conflict with such a lock, we must
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index ac66da8638f..a91b6f8a6c0 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -103,6 +103,8 @@ ProcGlobalShmemSize(void)
 	Size		size = 0;
 	Size		TotalProcs =
 		add_size(MaxBackends, add_size(NUM_AUXILIARY_PROCS, max_prepared_xacts));
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* ProcGlobal */
 	size = add_size(size, sizeof(PROC_HDR));
@@ -113,6 +115,18 @@ ProcGlobalShmemSize(void)
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->subxidStates)));
 	size = add_size(size, mul_size(TotalProcs, sizeof(*ProcGlobal->statusFlags)));
 
+	/*
+	 * fast-path lock arrays
+	 *
+	 * XXX The explicit alignment may not be strictly necessary, as both
+	 * values are already multiples of 8 bytes, which is what MAXALIGN does.
+	 * But better to make that obvious.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	size = add_size(size, mul_size(TotalProcs, (fpLockBitsSize + fpRelIdSize)));
+
 	return size;
 }
 
@@ -162,6 +176,10 @@ InitProcGlobal(void)
 				j;
 	bool		found;
 	uint32		TotalProcs = MaxBackends + NUM_AUXILIARY_PROCS + max_prepared_xacts;
+	char	   *fpPtr,
+			   *fpEndPtr PG_USED_FOR_ASSERTS_ONLY;
+	Size		fpLockBitsSize,
+				fpRelIdSize;
 
 	/* Create the ProcGlobal shared structure */
 	ProcGlobal = (PROC_HDR *)
@@ -211,12 +229,38 @@ InitProcGlobal(void)
 	ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->statusFlags));
 	MemSet(ProcGlobal->statusFlags, 0, TotalProcs * sizeof(*ProcGlobal->statusFlags));
 
+	/*
+	 * Allocate arrays for fast-path locks. Those are variable-length, so
+	 * can't be included in PGPROC. We allocate a separate piece of shared
+	 * memory and then divide that between backends.
+	 */
+	fpLockBitsSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(uint64));
+	fpRelIdSize = MAXALIGN(FastPathLockGroupsPerBackend * sizeof(Oid) * FP_LOCK_SLOTS_PER_GROUP);
+
+	fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
+	MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
+	/* For asserts checking that we did not overflow the allocation. */
+	fpEndPtr = fpPtr + (TotalProcs * (fpLockBitsSize + fpRelIdSize));
+
 	for (i = 0; i < TotalProcs; i++)
 	{
 		PGPROC	   *proc = &procs[i];
 
 		/* Common initialization for all PGPROCs, regardless of type. */
 
+		/*
+		 * Set the fast-path lock arrays, and move the pointer. We interleave
+		 * the two arrays, to keep at least some locality.
+		 */
+		proc->fpLockBits = (uint64 *) fpPtr;
+		fpPtr += fpLockBitsSize;
+
+		proc->fpRelId = (Oid *) fpPtr;
+		fpPtr += fpRelIdSize;
+
+		Assert(fpPtr <= fpEndPtr);
+
 		/*
 		 * Set up per-PGPROC semaphore, latch, and fpInfoLock.  Prepared xact
 		 * dummy PGPROCs don't need these though - they're never associated
@@ -278,6 +322,9 @@ InitProcGlobal(void)
 		pg_atomic_init_u64(&(proc->waitStart), 0);
 	}
 
+	/* We should have consumed exactly the expected amount of memory. */
+	Assert(fpPtr == fpEndPtr);
+
 	/*
 	 * Save pointers to the blocks of PGPROC structures reserved for auxiliary
 	 * processes and prepared transactions.
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8bc6bea1135..f54ae00abca 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -4166,6 +4166,9 @@ PostgresSingleUserMain(int argc, char *argv[],
 	/* Initialize MaxBackends */
 	InitializeMaxBackends();
 
+	/* Initialize size of fast-path lock cache. */
+	InitializeFastPathLocks();
+
 	/*
 	 * Give preloaded libraries a chance to request additional shared memory.
 	 */
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 3b50ce19a2c..1faf756c8d8 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -557,6 +557,40 @@ InitializeMaxBackends(void)
 						   MAX_BACKENDS)));
 }
 
+/*
+ * Initialize the number of fast-path lock slots in PGPROC.
+ *
+ * This must be called after modules have had the chance to alter GUCs in
+ * shared_preload_libraries and before shared memory size is determined.
+ *
+ * The default max_locks_per_xact=64 means 4 groups.
+ *
+ * We allow anything between 1 and 1024 groups, rounding up to a power of 2.
+ * A single group matches the "old" behavior before multiple groups were
+ * allowed; 1024 is an arbitrary limit (matching max_locks_per_xact = 16k).
+ * Values over 1024 are unlikely to be beneficial - we'd hit other
+ * bottlenecks long before that.
+ */
+void
+InitializeFastPathLocks(void)
+{
+	Assert(FastPathLockGroupsPerBackend == 0);
+
+	/* we need at least one group */
+	FastPathLockGroupsPerBackend = 1;
+
+	while (FastPathLockGroupsPerBackend < FP_LOCK_GROUPS_PER_BACKEND_MAX)
+	{
+		/* stop once we exceed max_locks_per_xact */
+		if (FastPathLockGroupsPerBackend * FP_LOCK_SLOTS_PER_GROUP >= max_locks_per_xact)
+			break;
+
+		FastPathLockGroupsPerBackend *= 2;
+	}
+
+	Assert(FastPathLockGroupsPerBackend <= FP_LOCK_GROUPS_PER_BACKEND_MAX);
+}
+
 /*
  * Early initialization of a backend (either standalone or under postmaster).
  * This happens even before InitPostgres.
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 25348e71eb9..e26d108a470 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -475,6 +475,7 @@ extern PGDLLIMPORT ProcessingMode Mode;
 #define INIT_PG_OVERRIDE_ROLE_LOGIN		0x0004
 extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
 extern void InitializeMaxBackends(void);
+extern void InitializeFastPathLocks(void);
 extern void InitPostgres(const char *in_dbname, Oid dboid,
 						 const char *username, Oid useroid,
 						 bits32 flags,
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 845058da9fa..0e55c166529 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -83,9 +83,11 @@ struct XidCache
  * rather than the main lock table.  This eases contention on the lock
  * manager LWLocks.  See storage/lmgr/README for additional details.
  */
-#define		FP_LOCK_GROUPS_PER_BACKEND	64
+extern PGDLLIMPORT int FastPathLockGroupsPerBackend;
+#define		FP_LOCK_GROUPS_PER_BACKEND_MAX	1024
 #define		FP_LOCK_SLOTS_PER_GROUP		16	/* don't change */
-#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FP_LOCK_GROUPS_PER_BACKEND)
+#define		FP_LOCK_SLOTS_PER_BACKEND	(FP_LOCK_SLOTS_PER_GROUP * FastPathLockGroupsPerBackend)
+
 /*
  * Flags for PGPROC.delayChkptFlags
  *
@@ -293,9 +295,8 @@ struct PGPROC
 
 	/* Lock manager data, recording fast-path locks taken by this backend. */
 	LWLock		fpInfoLock;		/* protects per-backend fast-path state */
-	uint64		fpLockBits[FP_LOCK_GROUPS_PER_BACKEND]; /* lock modes held for
-														 * each fast-path slot */
-	Oid			fpRelId[FP_LOCK_SLOTS_PER_BACKEND]; /* slots for rel oids */
+	uint64	   *fpLockBits;		/* lock modes held for each fast-path slot */
+	Oid		   *fpRelId;		/* slots for rel oids */
 	bool		fpVXIDLock;		/* are we holding a fast-path VXID lock? */
 	LocalTransactionId fpLocalTransactionId;	/* lxid for fast-path VXID
 												 * lock */
-- 
2.46.0
