On Tue, Mar 31, 2026 at 10:21 AM Bharath Rupireddy
<[email protected]> wrote:
>
> Hi,
>
> On Mon, Mar 30, 2026 at 5:13 PM Masahiko Sawada <[email protected]> wrote:
> >
> > I've reviewed the v6 patch. Here are some comments.
>
> Thank you for reviewing the patch.
>
> >  bool
> >  vacuum_get_cutoffs(Relation rel, const VacuumParams params,
> > -                  struct VacuumCutoffs *cutoffs)
> > +                  struct VacuumCutoffs *cutoffs,
> > +                  TransactionId *slot_xmin,
> > +                  TransactionId *slot_catalog_xmin)
> >
> > How about storing both slot_xmin and catalog_xmin into VacuumCutoffs?
>
> Done.
>
> > ---
> > -   if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED |
> > RS_INVAL_IDLE_TIMEOUT,
> > +   possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | 
> > RS_INVAL_IDLE_TIMEOUT |
> > +       RS_INVAL_XID_AGE;
> > +
> > +   if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
> >                                            _logSegNo, InvalidOid,
> > +                                          InvalidTransactionId,
> > +                                          max_slot_xid_age > 0 ?
> > +                                          ReadNextTransactionId() :
> >                                            InvalidTransactionId))
> >
> > It's odd to me that we specify RS_INVAL_XID_AGE while passing
> > InvalidTransactionId. I think we can specify RS_INVAL_XID_AGE along
> > with a valid recentXId only when we'd like to check the slots based on
> > their XIDs.
>
> Done.
>
> > ---
> > +   /* Check if the slot needs to be invalidated due to max_slot_xid_age 
> > GUC */
> > +   if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s))
> > +   {
> > +       TransactionId xidLimit;
> > +
> > +       Assert(TransactionIdIsValid(recentXid));
> > +
> > +       xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
> > +
> >
> > I think we can avoid calculating xidLimit for every slot by
> > calculating it in InvalidatePossiblyObsoleteSlot() and passing it to
> > DetermineSlotInvalidationCause().
>
> Done.
>
> > ---
> >   */
> >  TransactionId
> >  GetOldestNonRemovableTransactionId(Relation rel)
> > +{
> > +   return GetOldestNonRemovableTransactionIdExt(rel, NULL, NULL);
> > +}
> > +
> > +/*
> > + * Same as GetOldestNonRemovableTransactionId(), but also returns the
> > + * replication slot xmin and catalog_xmin from the same 
> > ComputeXidHorizons()
> > + * call.  This avoids a separate ProcArrayLock acquisition when the caller
> > + * needs both values.
> > + */
> > +TransactionId
> > +GetOldestNonRemovableTransactionIdExt(Relation rel,
> > +                                     TransactionId *slot_xmin,
> > +                                     TransactionId *slot_catalog_xmin)
> >  {
> >
> > I understand that the primary reason why the patch introduces another
> > variant of GetOldestNonRemovableTransactionId() is to avoid extra
> > ProcArrayLock acquision to get replication slot xmin and catalog_xmin.
> > While it's not very elegant, I find that it would not be bad because
> > otherwise autovacuum takes extra ProcArrayLock (in shared mode) for
> > every table to vacuum. The ProcArrayLock is already known
> > high-contented lock it would be better to avoid taking it once more.
> > If others think differently, we can just call
> > ProcArrayGetReplicationSlotXmin() separately and compare them to the
> > limit of XID-age based slot invalidation.
>
> I understand the concerns around the ProcArrayLock and I think a new
> function to return the computed slot's xmin and catalog_xmin is good.
>
> > Having said that, I personally don't want to add new instructions to
> > the existing GetOldestNonRemovableTransactionId(). I guess we might
> > want to make both the existing function and new function call a common
> > (inline) function that takes ComputeXidHorizonsResult and returns
> > appropriate transaction id based on the given relation .
>
> Done.
>
> > ---
> > +   # Do some work to advance xids
> > +   $node->safe_psql(
> > +       'postgres', qq[
> > +       do \$\$
> > +       begin
> > +       for i in 1..$nxids loop
> > +           -- use an exception block so that each iteration eats an XID
> > +           begin
> > +           insert into $table_name values (i);
> > +           exception
> > +           when division_by_zero then null;
> > +           end;
> > +       end loop;
> > +       end\$\$;
> > +   ]);
> >
> > I think it's fater to use pg_current_xact_id() instead.
>
> Done. I pulled this from an existing test case in 001_stream_rep.pl.
> Used the pg_current_xact_id approach. Testing times stay the same i.e.
> 9 wallclock secs.
>
> > ---
> > +   else
> > +   {
> > +       $node->safe_psql('postgres', "VACUUM");
> > +   }
> >
> > We don't need to vacuum all tables here.
>
> Fixed.
>
> > ---
> > +# Configure primary with XID age settings. Set autovacuum_naptime high so
> > +# that the checkpointer (not vacuum) triggers the invalidation.
> > +my $max_slot_xid_age = 500;
> > +$primary5->append_conf(
> > +   'postgresql.conf', qq{
> > +max_slot_xid_age = $max_slot_xid_age
> > +autovacuum_naptime = '1h'
> > +});
> >
> > I think that it's better to disable autovacuum than setting a large number.
>
> Done.
>
> > ---
> > +# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
> > +# GUC (via checkpoint).
> >
> > I think that we can say "physical slot" instead of standby's slot to
> > avoid confusion as I thought standby's slot is a slot created on the
> > standby at the first glance.
>
> Fixed.
>
> > ---
> > Do we have tests for invalidating slots on the standbys?
>
> Added a test case for this.
>
> Please find the attached v7 patches for further review. Thank you!

I've reviewed the v7 patch and have some review comments:

+# Advance the given number of XIDs
+sub advance_xids
+{
+   my ($node, $nxids) = @_;
+   my $sql = join(";\n", ("SELECT pg_current_xact_id()") x $nxids);
+   $node->safe_psql('postgres', $sql);
+}

I think we can create a procedure on primary5 instance to consume XIDs
as follow:

$standby5->safe_psql(
   'postgres',
   qq{CREATE PROCEDURE consume_xid(cnt int)
AS \$\$
DECLARE
    i int;
    BEGIN
        FOR i in 1..cnt LOOP
            EXECUTE 'SELECT pg_current_xact_id()';
            COMMIT;
        END LOOP;
    END;
+\$\$
LANGUAGE plpgsql;
});

---
+# Create a subscriber node
+my $subscriber5 = PostgreSQL::Test::Cluster->new('subscriber5');
+$subscriber5->init(allows_streaming => 'logical');
+$subscriber5->start;

Do we really need to create a subscriber for this test? I think we can
simply create a logical slot on the primary5 and test the XID-age
based slot invalidation.

---
I've attached a fixup patch to propose some cleanup and refactoring, including:

- changes to invalidation errdetail message.
- passing xidLimit instead of recentXid to simplify the invalidation logic.
- documentation changes.
- comment changes.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 46aac59cb20..9ad662b8b6f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4772,17 +4772,24 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
       </term>
       <listitem>
        <para>
-        Invalidate replication slots whose <literal>xmin</literal> (the oldest
-        transaction that this slot needs the database to retain) or
-        <literal>catalog_xmin</literal> (the oldest transaction affecting the
-        system catalogs that this slot needs the database to retain) has reached
-        the age specified by this setting. This invalidation check happens
-        during vacuum (both <command>VACUUM</command> command and autovacuum)
-        and during checkpoints.
-        A value of zero (which is default) disables this feature. Users can set
-        this value anywhere from zero to two billion. This parameter can only be
-        set in the <filename>postgresql.conf</filename> file or on the server
-        command line.
+        Invalidate replication slots whose <structfield>xmin</structfield> age
+        or <structfield>catalog_xmin</structfield> age in the
+        <link linkend="view-pg-replication-slots">pg_replication_slots</link>
+        view has exceeded the age specified by this setting. Slot invalidation
+        due to this limit occurs during vacuum (both <command>VACUUM</command>
+        command and autovacuum) and during checkpoint.
+        A value of zero (the default) disables this feature. Users can set
+        this value anywhere from zero to two billion transactions. This parameter
+        can only be set in the <filename>postgresql.conf</filename> file or on
+        the server command line.
+       </para>
+
+       <para>
+        The current age of a slot's <literal>xmin</literal> and
+        <literal>catalog_xmin</literal> can be monitored by applying the
+        <function>age</function> function to the corresponding columns in the
+        <link linkend="view-pg-replication-slots">pg_replication_slots</link>
+        view.
        </para>
 
        <para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 70c1d5c5559..1aec1bcb79e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7019,8 +7019,8 @@ CreateCheckPoint(int flags)
 	VirtualTransactionId *vxids;
 	int			nvxids;
 	int			oldXLogAllowed = 0;
-	uint32		possibleInvalidationCauses;
-	TransactionId recentXid;
+	uint32		slotInvalidationCauses;
+	TransactionId slotXidLimit;
 
 	/*
 	 * An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7450,19 +7450,19 @@ CreateCheckPoint(int flags)
 	XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
 	KeepLogSeg(recptr, &_logSegNo);
 
-	possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
-	recentXid = InvalidTransactionId;
-
+	slotInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
+	slotXidLimit = InvalidTransactionId;
 	if (max_slot_xid_age > 0)
 	{
-		possibleInvalidationCauses |= RS_INVAL_XID_AGE;
-		recentXid = ReadNextTransactionId();
+		slotInvalidationCauses |= RS_INVAL_XID_AGE;
+		slotXidLimit = TransactionIdRetreatedBy(ReadNextTransactionId(),
+												max_slot_xid_age);
 	}
 
-	if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
+	if (InvalidateObsoleteReplicationSlots(slotInvalidationCauses,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId,
-										   recentXid))
+										   slotXidLimit))
 	{
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
@@ -7743,8 +7743,8 @@ CreateRestartPoint(int flags)
 	XLogRecPtr	endptr;
 	XLogSegNo	_logSegNo;
 	TimestampTz xtime;
-	uint32		possibleInvalidationCauses;
-	TransactionId recentXid;
+	uint32		slotInvalidationCauses;
+	TransactionId slotXidLimit;
 
 	/* Concurrent checkpoint/restartpoint cannot happen */
 	Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7919,19 +7919,19 @@ CreateRestartPoint(int flags)
 
 	INJECTION_POINT("restartpoint-before-slot-invalidation", NULL);
 
-	possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
-	recentXid = InvalidTransactionId;
-
+	slotInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
+	xidLimit = InvalidTransactionId;
 	if (max_slot_xid_age > 0)
 	{
-		possibleInvalidationCauses |= RS_INVAL_XID_AGE;
-		recentXid = ReadNextTransactionId();
+		slotInvalidationCauses |= RS_INVAL_XID_AGE;
+		slotXidLimit = TransactionIdRetreatedBy(ReadNextTransactionId(),
+												max_slot_xid_age);
 	}
 
-	if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
+	if (InvalidateObsoleteReplicationSlots(slotInvalidationCauses,
 										   _logSegNo, InvalidOid,
 										   InvalidTransactionId,
-										   recentXid))
+										   slotXidLimit))
 	{
 		/*
 		 * Some slots have been invalidated; recalculate the old-segment
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7f36f795b72..cda86b9d50f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,9 +1133,10 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
 	 * that only one vacuum process can be working on a particular table at
 	 * any time, and that each vacuum is always an independent transaction.
 	 */
-	cutoffs->OldestXmin = GetOldestNonRemovableTransactionIdExt(rel,
-																&cutoffs->slot_xmin,
-																&cutoffs->slot_catalog_xmin);
+	cutoffs->OldestXmin =
+		GetOldestNonRemovableTransactionIdWithSlotXids(rel,
+													   &cutoffs->slot_xmin,
+													   &cutoffs->slot_catalog_xmin);
 
 	Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
 
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 20729d2fb42..e6271e2a519 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1790,7 +1790,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 					   long slot_idle_seconds,
 					   TransactionId xmin,
 					   TransactionId catalog_xmin,
-					   TransactionId recentXid)
+					   TransactionId xidLimit)
 {
 	StringInfoData err_detail;
 	StringInfoData err_hint;
@@ -1838,20 +1838,18 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
 
 		case RS_INVAL_XID_AGE:
 			{
-				Assert(TransactionIdIsValid(xmin) || TransactionIdIsValid(catalog_xmin));
-
-				if (TransactionIdIsValid(xmin))
-				{
-					/* translator: %s is a GUC variable name */
-					appendStringInfo(&err_detail, _("The slot's xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
-									 xmin, (int32) (recentXid - xmin), "max_slot_xid_age", max_slot_xid_age);
-				}
-				else
-				{
-					/* translator: %s is a GUC variable name */
-					appendStringInfo(&err_detail, _("The slot's catalog_xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
-									 catalog_xmin, (int32) (recentXid - catalog_xmin), "max_slot_xid_age", max_slot_xid_age);
-				}
+				TransactionId slot_xid = TransactionIdIsValid(xmin) ? xmin : catalog_xmin;
+				int32		exceeded_by = (int32) (xidLimit - slot_xid);
+				int32		slot_age = (int32) max_slot_xid_age + exceeded_by;
+
+				Assert(TransactionIdIsValid(slot_xid));
+
+				/* translator: %s is a GUC variable name */
+				appendStringInfo(&err_detail,
+								 TransactionIdIsValid(xmin)
+								 ? _("The slot's xmin age of %d exceeds the configured \"%s\" of %d by %d transactions")
+								 : _("The slot's catalog xmin age of %d exceeds the configured \"%s\" of %d by %d transactions"),
+								 slot_age, "max_slot_xid_age", max_slot_xid_age, exceeded_by);
 
 				/* translator: %s is a GUC variable name */
 				appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
@@ -2033,19 +2031,13 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 							   ReplicationSlot *s,
 							   XLogRecPtr oldestLSN,
 							   Oid dboid, TransactionId snapshotConflictHorizon,
-							   TransactionId recentXid,
+							   TransactionId xidLimit,
 							   bool *released_lock_out)
 {
 	int			last_signaled_pid = 0;
 	bool		released_lock = false;
 	bool		invalidated = false;
 	TimestampTz inactive_since = 0;
-	TransactionId xidLimit = InvalidTransactionId;
-
-	/* Compute the XID limit once, to avoid redundant work per slot */
-	if ((possible_causes & RS_INVAL_XID_AGE) &&
-		TransactionIdIsValid(recentXid))
-		xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
 
 	for (;;)
 	{
@@ -2187,7 +2179,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 									   slotname, restart_lsn,
 									   oldestLSN, snapshotConflictHorizon,
 									   slot_idle_secs, s->data.xmin,
-									   s->data.catalog_xmin, recentXid);
+									   s->data.catalog_xmin, xidLimit);
 
 				if (MyBackendType == B_STARTUP)
 					(void) SignalRecoveryConflict(GetPGProcByNumber(active_proc),
@@ -2241,7 +2233,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
 								   slotname, restart_lsn,
 								   oldestLSN, snapshotConflictHorizon,
 								   slot_idle_secs, s->data.xmin,
-								   s->data.catalog_xmin, recentXid);
+								   s->data.catalog_xmin, xidLimit);
 
 			/* done with this slot for now */
 			break;
@@ -2284,7 +2276,7 @@ bool
 InvalidateObsoleteReplicationSlots(uint32 possible_causes,
 								   XLogSegNo oldestSegno, Oid dboid,
 								   TransactionId snapshotConflictHorizon,
-								   TransactionId recentXid)
+								   TransactionId xidLimit)
 {
 	XLogRecPtr	oldestLSN;
 	bool		invalidated = false;
@@ -2323,7 +2315,7 @@ restart:
 
 		if (InvalidatePossiblyObsoleteSlot(possible_causes, s, oldestLSN,
 										   dboid, snapshotConflictHorizon,
-										   recentXid, &released_lock))
+										   xidLimit, &released_lock))
 		{
 			Assert(released_lock);
 
@@ -3391,7 +3383,7 @@ MaybeInvalidateXIDAgedSlots(TransactionId slot_xmin,
 														 0,
 														 InvalidOid,
 														 InvalidTransactionId,
-														 recentXid);
+														 xidLimit);
 
 	return invalidated;
 }
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9e0acf7309d..898ef4a0833 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1938,10 +1938,11 @@ GlobalVisHorizonKindForRel(Relation rel)
 }
 
 /*
- * Helper to return the appropriate oldest non-removable TransactionId from
- * pre-computed horizons, based on the relation type.
+ * A helper function to return the appropriate oldest non-removable
+ * TransactionId from the pre-computed horizons, based on the relation
+ * type.
  */
-static inline TransactionId
+static pg_attribute_always_inline TransactionId
 GetOldestNonRemovableTransactionIdFromHorizons(ComputeXidHorizonsResult *horizons,
 											   Relation rel)
 {
@@ -1989,9 +1990,9 @@ GetOldestNonRemovableTransactionId(Relation rel)
  * needs both values.
  */
 TransactionId
-GetOldestNonRemovableTransactionIdExt(Relation rel,
-									  TransactionId *slot_xmin,
-									  TransactionId *slot_catalog_xmin)
+GetOldestNonRemovableTransactionIdWithSlotXids(Relation rel,
+											   TransactionId *slot_xmin,
+											   TransactionId *slot_catalog_xmin)
 {
 	ComputeXidHorizonsResult horizons;
 
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 0baa7112559..d143b19a7b3 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -371,7 +371,7 @@ extern bool InvalidateObsoleteReplicationSlots(uint32 possible_causes,
 											   XLogSegNo oldestSegno,
 											   Oid dboid,
 											   TransactionId snapshotConflictHorizon,
-											   TransactionId recentXid);
+											   TransactionId xidLimit);
 extern bool MaybeInvalidateXIDAgedSlots(TransactionId slot_xmin,
 										TransactionId slot_catalog_xmin);
 extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c198fd22515..a94091ce7fd 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -53,9 +53,9 @@ extern RunningTransactions GetRunningTransactionData(void);
 
 extern bool TransactionIdIsInProgress(TransactionId xid);
 extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
-extern TransactionId GetOldestNonRemovableTransactionIdExt(Relation rel,
-														   TransactionId *slot_xmin,
-														   TransactionId *slot_catalog_xmin);
+extern TransactionId GetOldestNonRemovableTransactionIdWithSlotXids(Relation rel,
+																	TransactionId *slot_xmin,
+																	TransactionId *slot_catalog_xmin);
 extern TransactionId GetOldestTransactionIdConsideredRunning(void);
 extern TransactionId GetOldestActiveTransactionId(bool inCommitOnly,
 												  bool allDbs);

Reply via email to