Hi,

With a sorted committed.xip array, we could replace the iteration with
two binary searches to find the interval to keep.

Proposed Optimization
---------------------

Use binary search to locate the boundaries of the XIDs to remove, then
compact with a single memmove. The key insight is how XID precedence
relates to numeric ordering.

XID Precedence Definition
~~~~~~~~~~~~~~~~~~~~~~~~~

PostgreSQL defines XID precedence as:

/* compare two XIDs already known to be normal; this is a macro for speed */
#define NormalTransactionIdPrecedes(id1, id2) \
(AssertMacro(TransactionIdIsNormal(id1) && TransactionIdIsNormal(id2)), \
(int32) ((id1) - (id2)) < 0)

This means: id1 precedes id2 if (int32)(id1 - id2) < 0.

Equivalently, this identifies all XIDs in the modular interval
[id2 - 2^31, id2) on the 32-bit ring as "preceding id2". So XIDs
preceding xmin are exactly those in [xmin - 2^31, xmin).

From Modular Interval to Array Positions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The arrays are sorted in numeric uint32 order (xip[i] <= xip[i+1] in
unsigned sense), which is a total order—not wraparound-aware. Therefore,
the modular interval we want to remove may appear as one or two numeric
blocks in the sorted array.

Let boundary = xmin - 2^31 (mod 2^32). The modular interval [boundary, xmin)
contains all XIDs to remove (half-open: xmin itself is kept, matching
NormalTransactionIdPrecedes). Where does it appear in the numerically sorted
array?

Case A: (uint32)boundary <= (uint32)xmin (no numeric wrap)
  Example: xmin = 3,000,000,000
           boundary = 3,000,000,000 - 2,147,483,648 = 852,516,352

  Here, (uint32)boundary < (uint32)xmin, so the interval does not cross
  0 numerically. In the sorted array, XIDs to remove form one contiguous
  block: [idx_boundary, idx_xmin).

  Array layout:
    [... keep ...][=== remove ===][... keep ...]
    0 ............ idx_boundary ... idx_xmin ...... n

  Action: Keep prefix [0, idx_boundary) and suffix [idx_xmin, n).

Case B: (uint32)boundary > (uint32)xmin (numeric wrap)
  Example: xmin = 100
           boundary = 100 - 2^31 (mod 2^32) = 2,147,483,748

  Since (uint32)boundary > (uint32)xmin, the interval wraps through 0
  numerically. In the sorted array, XIDs to remove form two blocks:
  [0, idx_xmin) and [idx_boundary, n).

  Array layout:
    [= remove =][... keep ...][= remove =]
    0 ......... idx_xmin .... idx_boundary ......... n

  Action: Keep only the middle [idx_xmin, idx_boundary).

Note: Case B often occurs when xmin is "small" (e.g., right after
startup), making xmin - 2^31 wrap numerically. This is purely about
positions in the numeric order; it does not imply the cluster has
"wrapped" XIDs operationally.

In both cases, we locate idx_boundary and idx_xmin with two binary
searches in O(log n) time, then compact the array with a single memmove.

The algorithm:
1. Compute boundary = xmin - 2^31
2. Binary search for idx_boundary (first index with xip[i] >= boundary)
3. Binary search for idx_xmin (first index with xip[i] >= xmin)
4. Use memmove to compact based on case A or B above

Benefits
--------

1. Performance: O(log n) binary search vs O(n) linear scan
2. Memory: No workspace allocation needed
3. Simplicity: One memmove instead of allocate + two copies + free

The same logic is applied to both committed.xip and catchange.xip arrays.

Faster binary search
--------------------

While faster binary search variants exist, the current code already
introduces more complexity than the original. It's unclear whether
further optimization would deliver a meaningful performance gain.

Best,
Xuneng
From 14e2ee95a153726f88649f545c3606a85e848be3 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Thu, 30 Oct 2025 17:57:45 +0800
Subject: [PATCH v1] Optimize SnapBuild by maintaining committed.xip in sorted order

We now maintain committed.xip in xidComparator order using a per-commit batch insertion approach:

Collect all relevant XIDs for each commit
Sort the batch once: O(m log m)
Reverse-merge it into the global array: O(N + m)

With this change, snapshot builds can skip the qsort() step and simply memcpy the sorted array.
While per-commit work increases from O(m) (plain append) to O(m log m + N) (sort-and-merge), 
eliminating repeated O(N log N) sorts at each snapshot build significantly reduces overall steady-state 
cost once CONSISTENT is reached and snapshots are built frequently.
---
 src/backend/replication/logical/snapbuild.c  | 97 +++++++++++++++++---
 src/include/replication/snapbuild_internal.h | 12 +--
 2 files changed, 85 insertions(+), 24 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 98ddee20929..500805f43dc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -407,8 +407,10 @@ SnapBuildBuildSnapshot(SnapBuild *builder)
 		   builder->committed.xip,
 		   builder->committed.xcnt * sizeof(TransactionId));
 
-	/* sort so we can bsearch() */
-	qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);
+#ifdef USE_ASSERT_CHECKING
+	for (int i = 1; i < snapshot->xcnt; i++)
+		Assert(snapshot->xip[i - 1] <= snapshot->xip[i]);
+#endif
 
 	/*
 	 * Initially, subxip is empty, i.e. it's a snapshot to be used by
@@ -826,14 +828,29 @@ SnapBuildDistributeSnapshotAndInval(SnapBuild *builder, XLogRecPtr lsn, Transact
  * Keep track of a new catalog changing transaction that has committed.
  */
 static void
-SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
+SnapBuildAddCommittedTxns(SnapBuild *builder,
+						  const TransactionId *batch_xids,
+						  int batch_cnt)
 {
-	Assert(TransactionIdIsValid(xid));
+	TransactionId *committed_xids;
+	int			old_xcnt;
+	int			old_idx;		/* read position in old array */
+	int			batch_idx;		/* read position in batch */
+	int			write_idx;		/* write position (from end) */
 
-	if (builder->committed.xcnt == builder->committed.xcnt_space)
+	old_xcnt = builder->committed.xcnt;
+
+	/* Ensure we have space for all elements */
+	if (old_xcnt + batch_cnt > builder->committed.xcnt_space)
 	{
 		builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
 
+		/*
+		 * Repeat if we need more than 2x current space.
+		 */
+		while (old_xcnt + batch_cnt > builder->committed.xcnt_space)
+			builder->committed.xcnt_space = builder->committed.xcnt_space * 2 + 1;
+
 		elog(DEBUG1, "increasing space for committed transactions to %u",
 			 (uint32) builder->committed.xcnt_space);
 
@@ -841,12 +858,35 @@ SnapBuildAddCommittedTxn(SnapBuild *builder, TransactionId xid)
 										  builder->committed.xcnt_space * sizeof(TransactionId));
 	}
 
+	committed_xids = builder->committed.xip;
+
 	/*
-	 * TODO: It might make sense to keep the array sorted here instead of
-	 * doing it every time we build a new snapshot. On the other hand this
-	 * gets called repeatedly when a transaction with subtransactions commits.
+	 * Merge from the end to avoid overwriting unread data.  We write the
+	 * merged result starting from position (old_xcnt + batch_cnt - 1) and
+	 * work backwards.
 	 */
-	builder->committed.xip[builder->committed.xcnt++] = xid;
+	old_idx = old_xcnt - 1;
+	batch_idx = batch_cnt - 1;
+	write_idx = old_xcnt + batch_cnt - 1;
+
+	while (old_idx >= 0 && batch_idx >= 0)
+	{
+		if (committed_xids[old_idx] > batch_xids[batch_idx])
+			committed_xids[write_idx--] = committed_xids[old_idx--];
+		else
+		{
+			Assert(TransactionIdIsValid(batch_xids[batch_idx]));
+			committed_xids[write_idx--] = batch_xids[batch_idx--];
+		}
+	}
+
+	/* Copy any remaining batch elements */
+	while (batch_idx >= 0)
+		committed_xids[write_idx--] = batch_xids[batch_idx--];
+
+	/* Old elements that weren't moved are already in the correct position */
+
+	builder->committed.xcnt = old_xcnt + batch_cnt;
 }
 
 /*
@@ -948,6 +988,13 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 
 	TransactionId xmax = xid;
 
+	/*
+	 * Collect XIDs that need tracking into a batch.  We'll sort and merge
+	 * them into committed.xip in one pass at the end.
+	 */
+	 TransactionId *batch_xids = NULL;
+	 int		batch_cnt = 0;
+
 	/*
 	 * Transactions preceding BUILDING_SNAPSHOT will neither be decoded, nor
 	 * will they be part of a snapshot.  So we don't need to record anything.
@@ -978,6 +1025,12 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		}
 	}
 
+	if (nsubxacts > 0 || builder->building_full_snapshot ||
+		SnapBuildXidHasCatalogChanges(builder, xid, xinfo))
+	{
+		batch_xids = palloc((nsubxacts + 1) * sizeof(TransactionId));
+	}
+
 	for (nxact = 0; nxact < nsubxacts; nxact++)
 	{
 		TransactionId subxid = subxacts[nxact];
@@ -994,7 +1047,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			elog(DEBUG1, "found subtransaction %u:%u with catalog changes",
 				 xid, subxid);
 
-			SnapBuildAddCommittedTxn(builder, subxid);
+			batch_xids[batch_cnt++] = subxid;
 
 			if (NormalTransactionIdFollows(subxid, xmax))
 				xmax = subxid;
@@ -1008,7 +1061,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		 */
 		else if (needs_timetravel)
 		{
-			SnapBuildAddCommittedTxn(builder, subxid);
+			batch_xids[batch_cnt++] = subxid;
 			if (NormalTransactionIdFollows(subxid, xmax))
 				xmax = subxid;
 		}
@@ -1021,7 +1074,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			 xid);
 		needs_snapshot = true;
 		needs_timetravel = true;
-		SnapBuildAddCommittedTxn(builder, xid);
+		batch_xids[batch_cnt++] = xid;
 	}
 	else if (sub_needs_timetravel)
 	{
@@ -1029,13 +1082,29 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		elog(DEBUG2, "forced transaction %u to do timetravel due to one of its subtransactions",
 			 xid);
 		needs_timetravel = true;
-		SnapBuildAddCommittedTxn(builder, xid);
+		batch_xids[batch_cnt++] = xid;
 	}
 	else if (needs_timetravel)
 	{
 		elog(DEBUG2, "forced transaction %u to do timetravel", xid);
 
-		SnapBuildAddCommittedTxn(builder, xid);
+		batch_xids[batch_cnt++] = xid;
+	}
+
+	/*
+	 * Sort and merge the batch into committed.xip.  This maintains the
+	 * invariant that committed.xip is globally sorted by raw uint32 order
+	 * (xidComparator).
+	 */
+	if (batch_cnt > 0)
+	{
+		/* Sort by raw uint32 order */
+		qsort(batch_xids, batch_cnt, sizeof(TransactionId), xidComparator);
+
+		/* Merge into global array */
+		SnapBuildAddCommittedTxns(builder, batch_xids, batch_cnt);
+
+		pfree(batch_xids);
 	}
 
 	if (!needs_timetravel)
diff --git a/src/include/replication/snapbuild_internal.h b/src/include/replication/snapbuild_internal.h
index 3b915dc8793..164160e5e5e 100644
--- a/src/include/replication/snapbuild_internal.h
+++ b/src/include/replication/snapbuild_internal.h
@@ -115,16 +115,8 @@ struct SnapBuild
 		/*
 		 * Array of committed transactions that have modified the catalog.
 		 *
-		 * As this array is frequently modified we do *not* keep it in
-		 * xidComparator order. Instead we sort the array when building &
-		 * distributing a snapshot.
-		 *
-		 * TODO: It's unclear whether that reasoning has much merit. Every
-		 * time we add something here after becoming consistent will also
-		 * require distributing a snapshot. Storing them sorted would
-		 * potentially also make it easier to purge (but more complicated wrt
-		 * wraparound?). Should be improved if sorting while building the
-		 * snapshot shows up in profiles.
+		 * Maintained in sorted order (by raw uint32 value) to allow efficient
+		 * snapshot building without repeated sorting overhead.
 		 */
 		TransactionId *xip;
 	}			committed;
-- 
2.51.0

From 40e51baf4b73f2606a7c5f50088e8c9911107044 Mon Sep 17 00:00:00 2001
From: alterego655 <[email protected]>
Date: Mon, 10 Nov 2025 10:31:38 +0800
Subject: [PATCH v1 2/2] Optimize SnapBuildPurgeOlderTxn with binary search

SnapBuildPurgeOlderTxn() removes XIDs that precede xmin from the
committed.xip and catchange.xip arrays. Previously, this used an
O(n) linear scan with workspace allocation. Since these arrays are
maintained in sorted order, we can use binary search for O(log n)
performance.

The optimization exploits the fact that XIDs "preceding xmin" form
the modular interval [xmin - 2^31, xmin) on the 32-bit XID ring.
In a numeric-sorted array, this interval appears as either one or
two contiguous segments depending on numeric positions:

- Case A (boundary <= xmin numerically): One block [idx_boundary, idx_xmin)
- Case B (boundary > xmin numerically): Two blocks [0, idx_xmin) and
  [idx_boundary, n)

We use two binary searches to locate these boundaries, then compact
the array with a single memmove. This eliminates workspace allocation
and reduces the operation from O(n) to O(log n).

The same wraparound-aware two-case logic is applied to both committed.xip
and catchange.xip arrays.
---
 src/backend/replication/logical/snapbuild.c | 185 +++++++++++++++-----
 1 file changed, 137 insertions(+), 48 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 344eb4c2ade..5c3bb8f377f 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -152,6 +152,11 @@ static ResourceOwner SavedResourceOwnerDuringExport = NULL;
 static bool ExportInProgress = false;
 
 /* ->committed and ->catchange manipulation */
+static int xid_bsearch_lower_bound(const TransactionId *xip, int n, TransactionId key);
+
+static int xid_purge_with_boundary(TransactionId *xip, int xcnt,
+									TransactionId boundary, TransactionId xmin);
+
 static void SnapBuildPurgeOlderTxn(SnapBuild *builder);
 
 /* snapshot building/manipulation/distribution functions */
@@ -889,6 +894,106 @@ SnapBuildAddCommittedTxns(SnapBuild *builder,
 	builder->committed.xcnt = old_xcnt + batch_cnt;
 }
 
+/*
+ * Binary search helper: find the first index where xip[i] >= key.
+ * Returns n if all elements are less than key.
+ *
+ * This is a standard lower_bound operation using unsigned comparison to match
+ * xidComparator ordering.
+ */
+static int
+xid_bsearch_lower_bound(const TransactionId *xip, int n, TransactionId key)
+{
+	int			lo = 0;
+	int			hi = n;
+
+	while (lo < hi)
+	{
+		int			mid = (lo + hi) >> 1;
+
+		if (xip[mid] < key)
+			lo = mid + 1;
+		else
+			hi = mid;
+	}
+	return lo;
+}
+
+/*
+ * Remove XIDs in the modular interval [boundary, xmin) from a sorted array.
+ *
+ * The array must be sorted in numeric uint32 order. XIDs in [boundary, xmin)
+ * are those that satisfy NormalTransactionIdPrecedes(xid, xmin).
+ *
+ * Returns the new count after removal. The array is compacted in-place.
+ */
+static int
+xid_purge_with_boundary(TransactionId *xip, int xcnt,
+						TransactionId boundary, TransactionId xmin)
+{
+	int			idx_boundary;
+	int			idx_xmin;
+	int			new_xcnt;
+
+	if (xcnt == 0)
+		return 0;
+
+	/*
+	 * Find where [boundary, xmin) appears in the numeric-sorted array.
+	 * idx_boundary = first index with xip[i] >= boundary
+	 * idx_xmin     = first index with xip[i] >= xmin
+	 */
+	idx_boundary = xid_bsearch_lower_bound(xip, xcnt, boundary);
+	idx_xmin = xid_bsearch_lower_bound(xip, xcnt, xmin);
+
+	/*
+	 * Case split based on numeric comparison (array is uint32-sorted):
+	 *
+	 * Case A: boundary <= xmin numerically
+	 *   Interval forms one block: [idx_boundary, idx_xmin)
+	 *   Keep: [0, idx_boundary) and [idx_xmin, n)
+	 *
+	 * Case B: boundary > xmin numerically
+	 *   Interval straddles numeric zero as two blocks:
+	 *     [0, idx_xmin) and [idx_boundary, n)
+	 *   Keep: [idx_xmin, idx_boundary)
+	 *
+	 * Note: Case B occurs due to ring geometry when the modular interval
+	 * crosses zero in numeric order, not because XIDs have "wrapped"
+	 * operationally.
+	 */
+	if (boundary <= xmin)
+	{
+		/* Case A: interval is contiguous, keep prefix and suffix */
+		int			prefix_len = idx_boundary;
+		int			suffix_len = xcnt - idx_xmin;
+
+		if (suffix_len > 0 && idx_xmin > idx_boundary)
+			memmove(xip + prefix_len,
+					xip + idx_xmin,
+					suffix_len * sizeof(TransactionId));
+
+		new_xcnt = prefix_len + suffix_len;
+	}
+	else
+	{
+		/* Case B: interval straddles zero, keep middle block */
+		int			keep_len;
+
+		Assert(idx_boundary >= idx_xmin);
+		keep_len = idx_boundary - idx_xmin;
+
+		if (keep_len > 0 && idx_xmin > 0)
+			memmove(xip,
+					xip + idx_xmin,
+					keep_len * sizeof(TransactionId));
+
+		new_xcnt = keep_len;
+	}
+
+	return new_xcnt;
+}
+
 /*
  * Remove knowledge about transactions we treat as committed or containing catalog
  * changes that are smaller than ->xmin. Those won't ever get checked via
@@ -902,74 +1007,58 @@ SnapBuildAddCommittedTxns(SnapBuild *builder,
 static void
 SnapBuildPurgeOlderTxn(SnapBuild *builder)
 {
-	int			off;
-	TransactionId *workspace;
-	int			surviving_xids = 0;
+	TransactionId boundary;
 
 	/* not ready yet */
 	if (!TransactionIdIsNormal(builder->xmin))
 		return;
 
-	/* TODO: Neater algorithm than just copying and iterating? */
-	workspace =
-		MemoryContextAlloc(builder->context,
-						   builder->committed.xcnt * sizeof(TransactionId));
+	/*
+	 * Purge committed.xip and catchange.xip using binary search.
+	 *
+	 * XID precedence: NormalTransactionIdPrecedes(a, b) means
+	 * (int32)(a - b) < 0, which identifies XIDs in the modular interval
+	 * [b - 2^31, b) on the 32-bit ring as "preceding b".
+	 *
+	 * Both arrays are sorted in numeric uint32 order: xip[i] <= xip[i+1].
+	 * This is a total order, but the "old" interval [xmin - 2^31, xmin)
+	 * may appear as one or two segments depending on numeric positions.
+	 */
+	boundary = builder->xmin - 0x80000000U;
 
-	/* copy xids that still are interesting to workspace */
-	for (off = 0; off < builder->committed.xcnt; off++)
+	/* Purge committed.xip */
+	if (builder->committed.xcnt > 0)
 	{
-		if (NormalTransactionIdPrecedes(builder->committed.xip[off],
-										builder->xmin))
-			;					/* remove */
-		else
-			workspace[surviving_xids++] = builder->committed.xip[off];
-	}
-
-	/* copy workspace back to persistent state */
-	memcpy(builder->committed.xip, workspace,
-		   surviving_xids * sizeof(TransactionId));
-
-	elog(DEBUG3, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
-		 (uint32) builder->committed.xcnt, (uint32) surviving_xids,
-		 builder->xmin, builder->xmax);
-	builder->committed.xcnt = surviving_xids;
+		int			old_committed_xcnt = builder->committed.xcnt;
 
-	pfree(workspace);
+		builder->committed.xcnt = xid_purge_with_boundary(builder->committed.xip,
+														  old_committed_xcnt,
+														  boundary,
+														  builder->xmin);
 
-	/*
-	 * Purge xids in ->catchange as well. The purged array must also be sorted
-	 * in xidComparator order.
-	 */
+		elog(DEBUG3, "purged committed transactions from %u to %u, xmin: %u, xmax: %u",
+			 (uint32) old_committed_xcnt, (uint32) builder->committed.xcnt,
+			 builder->xmin, builder->xmax);
+	}
+	/* Purge catchange.xip */
 	if (builder->catchange.xcnt > 0)
 	{
-		/*
-		 * Since catchange.xip is sorted, we find the lower bound of xids that
-		 * are still interesting.
-		 */
-		for (off = 0; off < builder->catchange.xcnt; off++)
-		{
-			if (TransactionIdFollowsOrEquals(builder->catchange.xip[off],
-											 builder->xmin))
-				break;
-		}
+		int			old_catchange_xcnt = builder->catchange.xcnt;
 
-		surviving_xids = builder->catchange.xcnt - off;
+		builder->catchange.xcnt = xid_purge_with_boundary(builder->catchange.xip,
+														  old_catchange_xcnt,
+														  boundary,
+														  builder->xmin);
 
-		if (surviving_xids > 0)
-		{
-			memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
-					surviving_xids * sizeof(TransactionId));
-		}
-		else
+		if (builder->catchange.xcnt == 0)
 		{
 			pfree(builder->catchange.xip);
 			builder->catchange.xip = NULL;
 		}
 
 		elog(DEBUG3, "purged catalog modifying transactions from %u to %u, xmin: %u, xmax: %u",
-			 (uint32) builder->catchange.xcnt, (uint32) surviving_xids,
+			 (uint32) old_catchange_xcnt, (uint32) builder->catchange.xcnt,
 			 builder->xmin, builder->xmax);
-		builder->catchange.xcnt = surviving_xids;
 	}
 }
 
-- 
2.51.0
