On Sun, Mar 29, 2026 at 6:35 PM Bharath Rupireddy
<[email protected]> wrote:
>
> Hi,
>
> On Sun, Mar 29, 2026 at 1:16 PM Srinath Reddy Sadipiralla
> <[email protected]> wrote:
> >
> > Hello,
> >
> > Thanks for the v5 patch set, I have reviewed and did initial testing on
> > v5 patch set, and it LGTM, except these
>
> Thank you for reviewing and testing. I appreciate it.
>
> > diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
> > index 286f0f46341..c2ff7e464f0 100644
> > --- a/src/backend/replication/slot.c
> > +++ b/src/backend/replication/slot.c
> > @@ -1849,7 +1849,7 @@ 
> > ReportSlotInvalidation(ReplicationSlotInvalidationCause cause,
> >                                 else
> >                                 {
> >                                         /* translator: %s is a GUC variable 
> > name */
> > -                                       appendStringInfo(&err_detail, 
> > _("The slot's xmin %u is %d transactions old, which exceeds the configured 
> > \"%s\" value of %d."),
> > +                                       appendStringInfo(&err_detail, 
> > _("The slot's catalog_xmin %u is %d transactions old, which exceeds the 
> > configured \"%s\" value of %d."),
> >                                                                          
> > catalog_xmin, (int32) (recentXid - catalog_xmin), "max_slot_xid_age", 
> > max_slot_xid_age);
> >                                 }
>
> Fixed the typo.
>
> > while testing the active slot XID age invalidation (SIGTERM path) , i
> > observed that slot got invalidated , walsender was killed because of
> > SIGTERM , then starts the infinite-retry-cycle problem where
> > walreceiver starts walsender and walsender will try to use an invalidated
> > slot and dies, will think more on this.
>
> I would like to clarify that once a slot is invalidated due to any of
> the reasons (ReplicationSlotInvalidationCause), it becomes unusable;
> the sender will error out if the receiver tries to use it. This is
> consistent with all existing slot invalidation mechanisms.
>
> Please find the attached v6 patches fixing the typo for further review.
>

I've reviewed the v6 patch. Here are some comments.

 bool
 vacuum_get_cutoffs(Relation rel, const VacuumParams params,
-                  struct VacuumCutoffs *cutoffs)
+                  struct VacuumCutoffs *cutoffs,
+                  TransactionId *slot_xmin,
+                  TransactionId *slot_catalog_xmin)

How about storing both slot_xmin and catalog_xmin into VacuumCutoffs?

---
-   if (InvalidateObsoleteReplicationSlots(RS_INVAL_WAL_REMOVED |
RS_INVAL_IDLE_TIMEOUT,
+   possibleInvalidationCauses = RS_INVAL_WAL_REMOVED | RS_INVAL_IDLE_TIMEOUT |
+       RS_INVAL_XID_AGE;
+
+   if (InvalidateObsoleteReplicationSlots(possibleInvalidationCauses,
                                           _logSegNo, InvalidOid,
+                                          InvalidTransactionId,
+                                          max_slot_xid_age > 0 ?
+                                          ReadNextTransactionId() :
                                           InvalidTransactionId))

It's odd to me that we specify RS_INVAL_XID_AGE while passing
InvalidTransactionId. I think we can specify RS_INVAL_XID_AGE along
with a valid recentXId only when we'd like to check the slots based on
their XIDs.

---
+   /* Check if the slot needs to be invalidated due to max_slot_xid_age GUC */
+   if ((possible_causes & RS_INVAL_XID_AGE) && CanInvalidateXidAgedSlot(s))
+   {
+       TransactionId xidLimit;
+
+       Assert(TransactionIdIsValid(recentXid));
+
+       xidLimit = TransactionIdRetreatedBy(recentXid, max_slot_xid_age);
+

I think we can avoid calculating xidLimit for every slot by
calculating it in InvalidatePossiblyObsoleteSlot() and passing it to
DetermineSlotInvalidationCause().

---
  */
 TransactionId
 GetOldestNonRemovableTransactionId(Relation rel)
+{
+   return GetOldestNonRemovableTransactionIdExt(rel, NULL, NULL);
+}
+
+/*
+ * Same as GetOldestNonRemovableTransactionId(), but also returns the
+ * replication slot xmin and catalog_xmin from the same ComputeXidHorizons()
+ * call.  This avoids a separate ProcArrayLock acquisition when the caller
+ * needs both values.
+ */
+TransactionId
+GetOldestNonRemovableTransactionIdExt(Relation rel,
+                                     TransactionId *slot_xmin,
+                                     TransactionId *slot_catalog_xmin)
 {

I understand that the primary reason why the patch introduces another
variant of GetOldestNonRemovableTransactionId() is to avoid extra
ProcArrayLock acquision to get replication slot xmin and catalog_xmin.
While it's not very elegant, I find that it would not be bad because
otherwise autovacuum takes extra ProcArrayLock (in shared mode) for
every table to vacuum. The ProcArrayLock is already known
high-contented lock it would be better to avoid taking it once more.
If others think differently, we can just call
ProcArrayGetReplicationSlotXmin() separately and compare them to the
limit of XID-age based slot invalidation.

Having said that, I personally don't want to add new instructions to
the existing GetOldestNonRemovableTransactionId(). I guess we might
want to make both the existing function and new function call a common
(inline) function that takes ComputeXidHorizonsResult and returns
appropriate transaction id based on the given relation .

---
+   # Do some work to advance xids
+   $node->safe_psql(
+       'postgres', qq[
+       do \$\$
+       begin
+       for i in 1..$nxids loop
+           -- use an exception block so that each iteration eats an XID
+           begin
+           insert into $table_name values (i);
+           exception
+           when division_by_zero then null;
+           end;
+       end loop;
+       end\$\$;
+   ]);

I think it's fater to use pg_current_xact_id() instead.

---
+   else
+   {
+       $node->safe_psql('postgres', "VACUUM");
+   }

We don't need to vacuum all tables here.

---
+# Configure primary with XID age settings. Set autovacuum_naptime high so
+# that the checkpointer (not vacuum) triggers the invalidation.
+my $max_slot_xid_age = 500;
+$primary5->append_conf(
+   'postgresql.conf', qq{
+max_slot_xid_age = $max_slot_xid_age
+autovacuum_naptime = '1h'
+});

I think that it's better to disable autovacuum than setting a large number.

---
+# Testcase end: Invalidate streaming standby's slot due to max_slot_xid_age
+# GUC (via checkpoint).

I think that we can say "physical slot" instead of standby's slot to
avoid confusion as I thought standby's slot is a slot created on the
standby at the first glance.

---
Do we have tests for invalidating slots on the standbys?

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


Reply via email to