On Tue, Mar 31, 2026 at 10:21 AM Bharath Rupireddy
<[email protected]> wrote:
>
> Hi,
>
> On Mon, Mar 30, 2026 at 5:13 PM Masahiko Sawada <[email protected]> wrote:
> >
> > I've reviewed the v6 patch. Here are some comments.
>
> Thank you for reviewing the patch.
>
> > bool
> > vacuum_get_cutoffs(Relation rel, const VacuumParams params,
> > - struct VacuumCutoffs *cutoffs)
> > + struct VacuumCutoffs *cutoffs,
> > + TransactionId *slot_xmin,
> > + TransactionId *slot_catalog_xmin)
> >
> > How about storing both slot_xmin and catalog_xmin into VacuumCutoffs?
>
> Done.
>
> > ---
> > - if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED |
> > RS_INVAL_IDLE_TIMEOUT,
> > + possibleInvalidationCauses = RS_INVAL_WAL_REMOVED |
> > RS_INVAL_IDLE_TIMEOUT |
> > + RS_INVAL_XID_AGE;
> > +
> > + if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
> > _logSegNo, InvalidOid,
> > + InvalidTransactionId,
> > + max_slot_xid_age > 0 ?
> > + ReadNextTransactionId() :
> > InvalidTransactionId))
> >
> > It's odd to me that we specify RS_INVAL_XID_AGE while passing
> > InvalidTransactionId. I think we can specify RS_INVAL_XID_AGE along
> > with a valid recentXId only when we'd like to check the slots based on
> > their XIDs.
>
> Done.
>
> > ---
> > + /* Check if the slot needs to be invalidated due to max_slot_xid_age
> > GUC */
> > + if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s))
> > + {
> > + TransactionId xidLimit;
> > +
> > + Assert(TransactionIdIsValid(recentXid));
> > +
> > + xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
> > +
> >
> > I think we can avoid calculating xidLimit for every slot by
> > calculating it in InvalidatePossiblyObsoleteSlot() and passing it to
> > DetermineSlotInvalidationCause().
>
> Done.
>
> > ---
> > */
> > TransactionId
> > GetOldestNonRemovableTransactionId(Relation rel)
> > +{
> > + return GetOldestNonRemovableTransactionIdExt(rel, NULL, NULL);
> > +}
> > +
> > +/*
> > + * Same as GetOldestNonRemovableTransactionId(), but also returns the
> > + * replication slot xmin and catalog_xmin from the same
> > ComputeXidHorizons()
> > + * call. This avoids a separate ProcArrayLock acquisition when the caller
> > + * needs both values.
> > + */
> > +TransactionId
> > +GetOldestNonRemovableTransactionIdExt(Relation rel,
> > + TransactionId *slot_xmin,
> > + TransactionId *slot_catalog_xmin)
> > {
> >
> > I understand that the primary reason why the patch introduces another
> > variant of GetOldestNonRemovableTransactionId() is to avoid extra
> > ProcArrayLock acquision to get replication slot xmin and catalog_xmin.
> > While it's not very elegant, I find that it would not be bad because
> > otherwise autovacuum takes extra ProcArrayLock (in shared mode) for
> > every table to vacuum. The ProcArrayLock is already known
> > high-contented lock it would be better to avoid taking it once more.
> > If others think differently, we can just call
> > ProcArrayGetReplicationSlotXmin() separately and compare them to the
> > limit of XID-age based slot invalidation.
>
> I understand the concerns around the ProcArrayLock and I think a new
> function to return the computed slot's xmin and catalog_xmin is good.
>
> > Having said that, I personally don't want to add new instructions to
> > the existing GetOldestNonRemovableTransactionId(). I guess we might
> > want to make both the existing function and new function call a common
> > (inline) function that takes ComputeXidHorizonsResult and returns
> > appropriate transaction id based on the given relation .
>
> Done.
>
> > ---
> > + # Do some work to advance xids
> > + $node->safe_psql(
> > + 'postgres', qq[
> > + do \$\$
> > + begin
> > + for i in 1..$nxids loop
> > + -- use an exception block so that each iteration eats an XID
> > + begin
> > + insert into $table_name values (i);
> > + exception
> > + when division_by_zero then null;
> > + end;
> > + end loop;
> > + end\$\$;
> > + ]);
> >
> > I think it's fater to use pg_current_xact_id() instead.
>
> Done. I pulled this from an existing test case in 001_stream_rep.pl.
> Used the pg_current_xact_id approach. Testing times stay the same i.e.
> 9 wallclock secs.
>
> > ---
> > + else
> > + {
> > + $node->safe_psql('postgres', "VACUUM");
> > + }
> >
> > We don't need to vacuum all tables here.
>
> Fixed.
>
> > ---
> > +# Configure primary with XID age settings. Set autovacuum_naptime high so
> > +# that the checkpointer (not vacuum) triggers the invalidation.
> > +my $max_slot_xid_age = 500;
> > +$primary5->append_conf(
> > + 'postgresql.conf', qq{
> > +max_slot_xid_age = $max_slot_xid_age
> > +autovacuum_naptime = '1h'
> > +});
> >
> > I think that it's better to disable autovacuum than setting a large number.
>
> Done.
>
> > ---
> > +# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
> > +# GUC (via checkpoint).
> >
> > I think that we can say "physical slot" instead of standby's slot to
> > avoid confusion as I thought standby's slot is a slot created on the
> > standby at the first glance.
>
> Fixed.
>
> > ---
> > Do we have tests for invalidating slots on the standbys?
>
> Added a test case for this.
>
> Please find the attached v7 patches for further review. Thank you!
I've reviewed the v7 patch and have some review comments:
+# Advance the given number of XIDs
+sub advance_xids
+{
+ my ($node, $nxids) = @_;
+ my $sql = join(";\n", ("SELECT pg_current_xact_id()") x $nxids);
+ $node->safe_psql('postgres', $sql);
+}
I think we can create a procedure on primary5 instance to consume XIDs
as follow:
$standby5->safe_psql(
'postgres',
qq{CREATE PROCEDURE consume_xid(cnt int)
AS \$\$
DECLARE
i int;
BEGIN
FOR i in 1..cnt LOOP
EXECUTE 'SELECT pg_current_xact_id()';
COMMIT;
END LOOP;
END;
+\$\$
LANGUAGE plpgsql;
});
---
+# Create a subscriber node
+my $subscriber5 = PostgreSQL::Test::Cluster->new('subscriber5');
+$subscriber5->init(allows_streaming => 'logical');
+$subscriber5->start;
Do we really need to create a subscriber for this test? I think we can
simply create a logical slot on the primary5 and test the XID-age
based slot invalidation.
---
I've attached a fixup patch to propose some cleanup and refactoring, including:
- changes to invalidation errdetail message.
- passing xidLimit instead of recentXid to simplify the invalidation logic.
- documentation changes.
- comment changes.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 46aac59cb20..9ad662b8b6f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -4772,17 +4772,24 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"' # Windows
</term>
<listitem>
<para>
- Invalidate replication slots whose <literal>xmin</literal> (the oldest
- transaction that this slot needs the database to retain) or
- <literal>catalog_xmin</literal> (the oldest transaction affecting the
- system catalogs that this slot needs the database to retain) has reached
- the age specified by this setting. This invalidation check happens
- during vacuum (both <command>VACUUM</command> command and autovacuum)
- and during checkpoints.
- A value of zero (which is default) disables this feature. Users can set
- this value anywhere from zero to two billion. This parameter can only be
- set in the <filename>postgresql.conf</filename> file or on the server
- command line.
+ Invalidate replication slots whose <structfield>xmin</structfield> age
+ or <structfield>catalog_xmin</structfield> age in the
+ <link linkend="view-pg-replication-slots">pg_replication_slots</link>
+ view has exceeded the age specified by this setting. Slot invalidation
+ due to this limit occurs during vacuum (both <command>VACUUM</command>
+ command and autovacuum) and during checkpoint.
+ A value of zero (the default) disables this feature. Users can set
+ this value anywhere from zero to two billion transactions. This parameter
+ can only be set in the <filename>postgresql.conf</filename> file or on
+ the server command line.
+ </para>
+
+ <para>
+ The current age of a slot's <literal>xmin</literal> and
+ <literal>catalog_xmin</literal> can be monitored by applying the
+ <function>age</function> function to the corresponding columns in the
+ <link linkend="view-pg-replication-slots">pg_replication_slots</link>
+ view.
</para>
<para>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 70c1d5c5559..1aec1bcb79e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7019,8 +7019,8 @@ CreateCheckPoint(int flags)
VirtualTransactionId *vxids;
int nvxids;
int oldXLogAllowed = 0;
- uint32 possibleInvalidationCauses;
- TransactionId recentXid;
+ uint32 slotInvalidationCauses;
+ TransactionId slotXidLimit;
/*
* An end-of-recovery checkpoint is really a shutdown checkpoint, just
@@ -7450,19 +7450,19 @@ CreateCheckPoint(int flags)
XLByteToSeg(RedoRecPtr, _logSegNo, wal_segment_size);
KeepLogSeg(recptr, &_logSegNo);
- possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
- recentXid = InvalidTransactionId;
-
+ slotInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
+ slotXidLimit = InvalidTransactionId;
if (max_slot_xid_age > 0)
{
- possibleInvalidationCauses |= RS_INVAL_XID_AGE;
- recentXid = ReadNextTransactionId();
+ slotInvalidationCauses |= RS_INVAL_XID_AGE;
+ slotXidLimit = TransactionIdRetreatedBy(ReadNextTransactionId(),
+ max_slot_xid_age);
}
- if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
+ if (InvalidateObsoleteReplicationSlots(slotInvalidationCauses,
_logSegNo, InvalidOid,
InvalidTransactionId,
- recentXid))
+ slotXidLimit))
{
/*
* Some slots have been invalidated; recalculate the old-segment
@@ -7743,8 +7743,8 @@ CreateRestartPoint(int flags)
XLogRecPtr endptr;
XLogSegNo _logSegNo;
TimestampTz xtime;
- uint32 possibleInvalidationCauses;
- TransactionId recentXid;
+ uint32 slotInvalidationCauses;
+ TransactionId slotXidLimit;
/* Concurrent checkpoint/restartpoint cannot happen */
Assert(!IsUnderPostmaster || MyBackendType == B_CHECKPOINTER);
@@ -7919,19 +7919,19 @@ CreateRestartPoint(int flags)
INJECTION_POINT("restartpoint-before-slot-invalidation", NULL);
- possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
- recentXid = InvalidTransactionId;
-
+ slotInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT;
+ xidLimit = InvalidTransactionId;
if (max_slot_xid_age > 0)
{
- possibleInvalidationCauses |= RS_INVAL_XID_AGE;
- recentXid = ReadNextTransactionId();
+ slotInvalidationCauses |= RS_INVAL_XID_AGE;
+ slotXidLimit = TransactionIdRetreatedBy(ReadNextTransactionId(),
+ max_slot_xid_age);
}
- if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
+ if (InvalidateObsoleteReplicationSlots(slotInvalidationCauses,
_logSegNo, InvalidOid,
InvalidTransactionId,
- recentXid))
+ slotXidLimit))
{
/*
* Some slots have been invalidated; recalculate the old-segment
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 7f36f795b72..cda86b9d50f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1133,9 +1133,10 @@ vacuum_get_cutoffs(Relation rel, const VacuumParams *params,
* that only one vacuum process can be working on a particular table at
* any time, and that each vacuum is always an independent transaction.
*/
- cutoffs->OldestXmin = GetOldestNonRemovableTransactionIdExt(rel,
- &cutoffs->slot_xmin,
- &cutoffs->slot_catalog_xmin);
+ cutoffs->OldestXmin =
+ GetOldestNonRemovableTransactionIdWithSlotXids(rel,
+ &cutoffs->slot_xmin,
+ &cutoffs->slot_catalog_xmin);
Assert(TransactionIdIsNormal(cutoffs->OldestXmin));
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 20729d2fb42..e6271e2a519 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1790,7 +1790,7 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
long slot_idle_seconds,
TransactionId xmin,
TransactionId catalog_xmin,
- TransactionId recentXid)
+ TransactionId xidLimit)
{
StringInfoData err_detail;
StringInfoData err_hint;
@@ -1838,20 +1838,18 @@ ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
case RS_INVAL_XID_AGE:
{
- Assert(TransactionIdIsValid(xmin) || TransactionIdIsValid(catalog_xmin));
-
- if (TransactionIdIsValid(xmin))
- {
- /* translator: %s is a GUC variable name */
- appendStringInfo(&err_detail, _("The slot's xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
- xmin, (int32) (recentXid - xmin), "max_slot_xid_age", max_slot_xid_age);
- }
- else
- {
- /* translator: %s is a GUC variable name */
- appendStringInfo(&err_detail, _("The slot's catalog_xmin %u is %d transactions old, which exceeds the configured \"%s\" value of %d."),
- catalog_xmin, (int32) (recentXid - catalog_xmin), "max_slot_xid_age", max_slot_xid_age);
- }
+ TransactionId slot_xid = TransactionIdIsValid(xmin) ? xmin : catalog_xmin;
+ int32 exceeded_by = (int32) (xidLimit - slot_xid);
+ int32 slot_age = (int32) max_slot_xid_age + exceeded_by;
+
+ Assert(TransactionIdIsValid(slot_xid));
+
+ /* translator: %s is a GUC variable name */
+ appendStringInfo(&err_detail,
+ TransactionIdIsValid(xmin)
+ ? _("The slot's xmin age of %d exceeds the configured \"%s\" of %d by %d transactions")
+ : _("The slot's catalog xmin age of %d exceeds the configured \"%s\" of %d by %d transactions"),
+ slot_age, "max_slot_xid_age", max_slot_xid_age, exceeded_by);
/* translator: %s is a GUC variable name */
appendStringInfo(&err_hint, _("You might need to increase \"%s\"."),
@@ -2033,19 +2031,13 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
ReplicationSlot *s,
XLogRecPtr oldestLSN,
Oid dboid, TransactionId snapshotConflictHorizon,
- TransactionId recentXid,
+ TransactionId xidLimit,
bool *released_lock_out)
{
int last_signaled_pid = 0;
bool released_lock = false;
bool invalidated = false;
TimestampTz inactive_since = 0;
- TransactionId xidLimit = InvalidTransactionId;
-
- /* Compute the XID limit once, to avoid redundant work per slot */
- if ((possible_causes & RS_INVAL_XID_AGE) &&
- TransactionIdIsValid(recentXid))
- xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
for (;;)
{
@@ -2187,7 +2179,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
slotname, restart_lsn,
oldestLSN, snapshotConflictHorizon,
slot_idle_secs, s->data.xmin,
- s->data.catalog_xmin, recentXid);
+ s->data.catalog_xmin, xidLimit);
if (MyBackendType == B_STARTUP)
(void) SignalRecoveryConflict(GetPGProcByNumber(active_proc),
@@ -2241,7 +2233,7 @@ InvalidatePossiblyObsoleteSlot(uint32 possible_causes,
slotname, restart_lsn,
oldestLSN, snapshotConflictHorizon,
slot_idle_secs, s->data.xmin,
- s->data.catalog_xmin, recentXid);
+ s->data.catalog_xmin, xidLimit);
/* done with this slot for now */
break;
@@ -2284,7 +2276,7 @@ bool
InvalidateObsoleteReplicationSlots(uint32 possible_causes,
XLogSegNo oldestSegno, Oid dboid,
TransactionId snapshotConflictHorizon,
- TransactionId recentXid)
+ TransactionId xidLimit)
{
XLogRecPtr oldestLSN;
bool invalidated = false;
@@ -2323,7 +2315,7 @@ restart:
if (InvalidatePossiblyObsoleteSlot(possible_causes, s, oldestLSN,
dboid, snapshotConflictHorizon,
- recentXid, &released_lock))
+ xidLimit, &released_lock))
{
Assert(released_lock);
@@ -3391,7 +3383,7 @@ MaybeInvalidateXIDAgedSlots(TransactionId slot_xmin,
0,
InvalidOid,
InvalidTransactionId,
- recentXid);
+ xidLimit);
return invalidated;
}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 9e0acf7309d..898ef4a0833 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1938,10 +1938,11 @@ GlobalVisHorizonKindForRel(Relation rel)
}
/*
- * Helper to return the appropriate oldest non-removable TransactionId from
- * pre-computed horizons, based on the relation type.
+ * A helper function to return the appropriate oldest non-removable
+ * TransactionId from the pre-computed horizons, based on the relation
+ * type.
*/
-static inline TransactionId
+static pg_attribute_always_inline TransactionId
GetOldestNonRemovableTransactionIdFromHorizons(ComputeXidHorizonsResult *horizons,
Relation rel)
{
@@ -1989,9 +1990,9 @@ GetOldestNonRemovableTransactionId(Relation rel)
* needs both values.
*/
TransactionId
-GetOldestNonRemovableTransactionIdExt(Relation rel,
- TransactionId *slot_xmin,
- TransactionId *slot_catalog_xmin)
+GetOldestNonRemovableTransactionIdWithSlotXids(Relation rel,
+ TransactionId *slot_xmin,
+ TransactionId *slot_catalog_xmin)
{
ComputeXidHorizonsResult horizons;
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 0baa7112559..d143b19a7b3 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -371,7 +371,7 @@ extern bool InvalidateObsoleteReplicationSlots(uint32 possible_causes,
XLogSegNo oldestSegno,
Oid dboid,
TransactionId snapshotConflictHorizon,
- TransactionId recentXid);
+ TransactionId xidLimit);
extern bool MaybeInvalidateXIDAgedSlots(TransactionId slot_xmin,
TransactionId slot_catalog_xmin);
extern ReplicationSlot *SearchNamedReplicationSlot(const char *name, bool need_lock);
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c198fd22515..a94091ce7fd 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -53,9 +53,9 @@ extern RunningTransactions GetRunningTransactionData(void);
extern bool TransactionIdIsInProgress(TransactionId xid);
extern TransactionId GetOldestNonRemovableTransactionId(Relation rel);
-extern TransactionId GetOldestNonRemovableTransactionIdExt(Relation rel,
- TransactionId *slot_xmin,
- TransactionId *slot_catalog_xmin);
+extern TransactionId GetOldestNonRemovableTransactionIdWithSlotXids(Relation rel,
+ TransactionId *slot_xmin,
+ TransactionId *slot_catalog_xmin);
extern TransactionId GetOldestTransactionIdConsideredRunning(void);
extern TransactionId GetOldestActiveTransactionId(bool inCommitOnly,
bool allDbs);