Hello, everyone.

I have tried to put it all together.

> In the absence of that approach, falling back to a counter that
> compresses every N xids would be best, in addition to the two new
> forced compression events.

Done.

> Also, if we add more forced compressions, it seems like we should have
> a short-circuit for a forced compression where there's nothing to do.

Done.

> I'm also wondering why there's not an
>
>     Assert(compress_index == pArray->numKnownAssignedXids);
>
> after the loop, to make sure our numKnownAssignedXids tracking
> is sane.

Done.

> * when idle - since we have time to do it when that happens, which
> happens often since most workloads are bursty

I have added acquisition of ProcArrayLock for this case.
I have also capped the frequency at once per second, to avoid
contention with a heavy read load in the case of small but regular
episodic WAL traffic (a WakeupRecovery() every 100ms, for example).
Or is that cap unnecessary?
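For reference, the throttling idea boils down to something like this standalone sketch (hypothetical names and a plain millisecond clock; the actual patch uses TimestampTz, GetCurrentTimestamp(), and runs under ProcArrayLock):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IDLE_COMPRESS_INTERVAL_MS 1000	/* at most one idle compression per second */

static int64_t last_compress_ms = 0;	/* set at startup, updated after each compression */

/*
 * Return true when enough time has passed since the last compression to
 * allow another idle one; record the new timestamp when allowed.
 */
static bool
idle_compress_allowed(int64_t now_ms)
{
	if (now_ms < last_compress_ms + IDLE_COMPRESS_INTERVAL_MS)
		return false;
	last_compress_ms = now_ms;
	return true;
}
```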

> It'd be more reliable
> to use a static counter to skip all but every N'th compress attempt
> (something we could do inside KnownAssignedXidsCompress itself, instead
> of adding warts at the call sites).

Done. I have added a “reason” enum for calls to KnownAssignedXidsCompress,
to keep the call sites as clean as possible,
but I am not sure I was successful here.
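In isolation, the TRANSACTION_END branch of that heuristic amounts to roughly the following (standalone names are hypothetical; in the patch the counter is a static in procarray.c and the valid count comes from pArray->numKnownAssignedXids):

```c
#include <assert.h>
#include <stdbool.h>

#define KAX_COMPRESS_FREQUENCY 64	/* consider only every 64th transaction end */

static int transaction_ends_counter = 0;

/*
 * Decide whether a transaction-end call should actually compress: only
 * every 64th call, and only when less than 50% of the elements between
 * tail and head are still valid.
 */
static bool
should_compress_at_transaction_end(int nelements, int num_valid)
{
	if ((transaction_ends_counter++) % KAX_COMPRESS_FREQUENCY != 0)
		return false;
	return nelements >= 2 * num_valid;
}
```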

Also, while we are still in this context, I think it would be good to add:
* Simon's optimization [1] for KnownAssignedXidsRemoveTree (it is
simple and effective in some situations, without any known drawbacks)
* maybe my patch [2] replacing known_assigned_xids_lck with a memory barrier?

The new version is attached; benchmarks are running now.
Preliminary results are in the attachments (16 CPUs, 5000 max_connections,
now 64 active connections instead of 16).
Another interesting observation: with 64 connections, the vanilla version
is unable to recover its performance after a 30-second transaction on the
primary.

[1]: 
https://www.postgresql.org/message-id/flat/CANbhV-Ey8HRYPvnvQnsZAteCfzN3VHVhZVKfWMYcnjMnSzs4dQ%40mail.gmail.com#05993cf2bc87e35e0dff38fec26b9805
[2]: 
https://www.postgresql.org/message-id/flat/CANtu0oiPoSdQsjRd6Red5WMHi1E83d2%2B-bM9J6dtWR3c5Tap9g%40mail.gmail.com#cc4827dee902978f93278732435e8521

--
Michail Nikolaev
From 926403598e9506edfa32c9534be9a4e48f756002 Mon Sep 17 00:00:00 2001
From: Michail Nikolaev <michail.nikol...@gmail.com>
Date: Wed, 23 Nov 2022 00:13:32 +0300
Subject: [PATCH vnext] Currently, KnownAssignedXidsGetAndSetXmin requires an
 iterative loop through the KnownAssignedXids array, including xids marked
 as invalid. The performance impact is especially noticeable in the presence
 of long (a few seconds) transactions on the primary, a high value (a few
 thousand) of max_connections, and a heavy read workload on the standby.
 Most of the CPU is spent looping through KnownAssignedXids even though
 almost all xids are invalid. KnownAssignedXidsCompress removes invalid
 xids from time to time, but performance is still affected.

To increase performance, the frequency of running KnownAssignedXidsCompress is increased: it is now called for each xid % 64 == 0 (the number was selected by running benchmarks). Also, the minimum bound on the number of elements to compress (4 * PROCARRAY_MAXPROCS) is removed.

Additionally, compression is now called on RecoveryInfo records and when the startup process is idle.

Simon Riggs and Michail Nikolaev, with help from Tom Lane and
Andres Freund.
---
 src/backend/access/transam/xlogrecovery.c |   3 +
 src/backend/storage/ipc/procarray.c       | 103 +++++++++++++++++-----
 src/include/storage/standby.h             |   1 +
 3 files changed, 85 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index cb07694aea..2cc9581cb6 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -3836,6 +3836,9 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 						streaming_reply_sent = true;
 					}
 
+					/* Do any background tasks that might benefit us later */
+					StartupProcessIdleMaintenance();
+
 					/* Update pg_stat_recovery_prefetch before sleeping. */
 					XLogPrefetcherComputeStats(xlogprefetcher);
 
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 283517d956..fde748519e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -274,6 +274,10 @@ static TransactionId *KnownAssignedXids;
 static bool *KnownAssignedXidsValid;
 static TransactionId latestObservedXid = InvalidTransactionId;
 
+/* State for the KnownAssignedXids compression heuristics */
+static int transactionEndsCounter;
+static TimestampTz lastCompressTs;
+
 /*
  * If we're in STANDBY_SNAPSHOT_PENDING state, standbySnapshotPendingXmin is
  * the highest xid that might still be running that we don't have in
@@ -335,8 +339,16 @@ static void DisplayXidCache(void);
 #define xc_slow_answer_inc()		((void) 0)
 #endif							/* XIDCACHE_DEBUG */
 
+typedef enum KnownAssignedXidsCompressReason
+{
+	NO_SPACE,
+	RECOVERY_INFO,
+	TRANSACTION_END,
+	STARTUP_PROCESS_IDLE,
+} KnownAssignedXidsCompressReason;
+
 /* Primitives for KnownAssignedXids array handling for standby */
-static void KnownAssignedXidsCompress(bool force);
+static void KnownAssignedXidsCompress(KnownAssignedXidsCompressReason reason);
 static void KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 								 bool exclusive_lock);
 static bool KnownAssignedXidsSearch(TransactionId xid, bool remove);
@@ -451,6 +463,8 @@ CreateSharedProcArray(void)
 			ShmemInitStruct("KnownAssignedXidsValid",
 							mul_size(sizeof(bool), TOTAL_MAX_CACHED_SUBXIDS),
 							&found);
+		transactionEndsCounter = 0;
+		lastCompressTs = GetCurrentTimestamp();
 	}
 }
 
@@ -4591,48 +4605,92 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
  * currently valid XIDs in the array.
  */
 
+/*
+ * StartupProcessIdleMaintenance
+ *		Opportunistic maintenance run while the startup process has nothing
+ *		else to do: compress the KnownAssignedXids array.
+ */
+void
+StartupProcessIdleMaintenance(void)
+{
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	KnownAssignedXidsCompress(STARTUP_PROCESS_IDLE);
+	LWLockRelease(ProcArrayLock);
+}
+
+/* Compress only every 64th transaction end (value chosen by benchmarks) */
+#define KAX_COMPRESS_FREQUENCY 64
+
+/* When the startup process is idle, compress at most once per second */
+#define KAX_COMPRESS_IDLE_INTERVAL 1000
 
 /*
  * Compress KnownAssignedXids by shifting valid data down to the start of the
  * array, removing any gaps.
  *
- * A compression step is forced if "force" is true, otherwise we do it
+ * A compression step is forced if "reason" is NO_SPACE, otherwise we do it
  * only if a heuristic indicates it's a good time to do it.
  *
  * Caller must hold ProcArrayLock in exclusive mode.
  */
 static void
-KnownAssignedXidsCompress(bool force)
+KnownAssignedXidsCompress(KnownAssignedXidsCompressReason reason)
 {
 	ProcArrayStruct *pArray = procArray;
 	int			head,
 				tail;
 	int			compress_index;
-	int			i;
+	int			i,
+				nelements;
 
 	/* no spinlock required since we hold ProcArrayLock exclusively */
 	head = pArray->headKnownAssignedXids;
 	tail = pArray->tailKnownAssignedXids;
 
-	if (!force)
+	nelements = head - tail;
+
+	/*
+	 * If we can choose how much to compress, use a heuristic to avoid
+	 * compressing too often or not often enough. "Compress" here means
+	 * simply moving the values to the beginning of the array, so it is
+	 * not as complex or costly as typical data compression algorithms.
+	 */
+
+	/* Skip compression entirely when there are no invalid elements. */
+	if (nelements == pArray->numKnownAssignedXids)
 	{
 		/*
-		 * If we can choose how much to compress, use a heuristic to avoid
-		 * compressing too often or not often enough.
-		 *
-		 * Heuristic is if we have a large enough current spread and less than
-		 * 50% of the elements are currently in use, then compress. This
-		 * should ensure we compress fairly infrequently. We could compress
-		 * less often though the virtual array would spread out more and
-		 * snapshots would become more expensive.
+		 * Even without any gaps we may still need to compress, to move the
+		 * xids to the start of the array and free up space at the end.
 		 */
-		int			nelements = head - tail;
-
-		if (nelements < 4 * PROCARRAY_MAXPROCS ||
-			nelements < 2 * pArray->numKnownAssignedXids)
+		if (reason != NO_SPACE)
 			return;
 	}
 
+	if (reason == TRANSACTION_END)
+	{
+		/*
+		 * Avoid compressing at every commit, so that enough work
+		 * accumulates to be worth doing. Frequency determined by
+		 * benchmarks.
+		 */
+		if ((transactionEndsCounter++) % KAX_COMPRESS_FREQUENCY != 0)
+			return;
+		/*
+		 * The heuristic is: if less than 50% of the elements are currently
+		 * in use, then compress. This ensures the time to take a snapshot
+		 * is bounded at S=2N, using the same notation as earlier comments,
+		 * which is essential to avoid limiting scalability at high N.
+		 * Previously, we prevented compression until S=4M, which made
+		 * snapshot speed slow and scale poorly with many CPUs.
+		 *
+		 * As noted earlier, compression is O(S), so now O(2N), while frequency
+		 * of compression is now O(1/N) so that as N varies, the algorithm
+		 * balances nicely the frequency and cost of compression.
+		 */
+		if (nelements < 2 * pArray->numKnownAssignedXids)
+			return;
+
+	}
+	else if (reason == STARTUP_PROCESS_IDLE)
+	{
+		/*
+		 * Compress the XID array while no WAL is available, but not too
+		 * often, to avoid ProcArrayLock contention with readers.
+		 */
+		TimestampTz compress_after = TimestampTzPlusMilliseconds(lastCompressTs,
+													KAX_COMPRESS_IDLE_INTERVAL);
+
+		if (GetCurrentTimestamp() < compress_after)
+			return;
+	}
+	else
+		Assert(reason == NO_SPACE || reason == RECOVERY_INFO);
+
 	/*
 	 * We compress the array by reading the valid values from tail to head,
 	 * re-aligning data to 0th element.
@@ -4647,9 +4705,11 @@ KnownAssignedXidsCompress(bool force)
 			compress_index++;
 		}
 	}
+	Assert(compress_index == pArray->numKnownAssignedXids);
 
 	pArray->tailKnownAssignedXids = 0;
 	pArray->headKnownAssignedXids = compress_index;
+	lastCompressTs = GetCurrentTimestamp();
 }
 
 /*
@@ -4725,7 +4785,7 @@ KnownAssignedXidsAdd(TransactionId from_xid, TransactionId to_xid,
 		if (!exclusive_lock)
 			LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 
-		KnownAssignedXidsCompress(true);
+		KnownAssignedXidsCompress(NO_SPACE);
 
 		head = pArray->headKnownAssignedXids;
 		/* note: we no longer care about the tail pointer */
@@ -4926,7 +4986,7 @@ KnownAssignedXidsRemoveTree(TransactionId xid, int nsubxids,
 		KnownAssignedXidsRemove(subxids[i]);
 
 	/* Opportunistically compress the array */
-	KnownAssignedXidsCompress(false);
+	KnownAssignedXidsCompress(TRANSACTION_END);
 }
 
 /*
@@ -5000,8 +5060,7 @@ KnownAssignedXidsRemovePreceding(TransactionId removeXid)
 		pArray->tailKnownAssignedXids = i;
 	}
 
-	/* Opportunistically compress the array */
-	KnownAssignedXidsCompress(false);
+	KnownAssignedXidsCompress(RECOVERY_INFO);
 }
 
 /*
diff --git a/src/include/storage/standby.h b/src/include/storage/standby.h
index e46c934c56..32ac569519 100644
--- a/src/include/storage/standby.h
+++ b/src/include/storage/standby.h
@@ -45,6 +45,7 @@ extern void StandbyLockTimeoutHandler(void);
 extern void LogRecoveryConflict(ProcSignalReason reason, TimestampTz wait_start,
 								TimestampTz now, VirtualTransactionId *wait_list,
 								bool still_waiting);
+extern void StartupProcessIdleMaintenance(void);
 
 /*
  * Standby Rmgr (RM_STANDBY_ID)
-- 
2.34.1
