Hi, Thanks for reviewing.
On Mon, Sep 9, 2024 at 1:11 PM Peter Smith <smithpb2...@gmail.com> wrote: > > 1. > + Note that the inactive timeout invalidation mechanism is not > + applicable for slots on the standby server that are being synced > + from primary server (i.e., standby slots having > > nit - /from primary server/from the primary server/ +1 > 2. ReplicationSlotAcquire > > + errmsg("can no longer get changes from replication slot \"%s\"", > + NameStr(s->data.name)), > + errdetail("This slot has been invalidated because it was inactive > for longer than the amount of time specified by \"%s\".", > + "replication_slot_inactive_timeout."))); > > nit - "replication_slot_inactive_timeout." - should be no period > inside that GUC name literal Typo. Fixed. > 3. ReportSlotInvalidation > > I didn't understand why there was a hint for: > "You might need to increase \"%s\".", "max_slot_wal_keep_size" > > Why aren't these similar cases consistent? It looks misleading and not very useful. What happens if the removed WAL (that's needed for the slot) is put back into pg_wal somehow (by manually copying from archive or by some tool/script)? Can the slot invalidated due to wal_removed start sending WAL to its clients? > But you don't have an equivalent hint for timeout invalidation: > "You might need to increase \"%s\".", "replication_slot_inactive_timeout" I removed this per review comments upthread. > 4. RestoreSlotFromDisk > > + /* Use the same inactive_since time for all the slots. */ > + if (now == 0) > + now = GetCurrentTimestamp(); > + > > Is the deferred assignment really necessary? Why not just > unconditionally assign the 'now' just before the for-loop? Or even at > the declaration? e.g. The 'replication_slot_inactive_timeout' is > measured in seconds so I don't think 'inactive_since' being wrong by a > millisecond here will make any difference. Moved it before the for-loop. > 5. ReplicationSlotSetInactiveSince > > +/* > + * Set slot's inactive_since property unless it was previously invalidated > due > + * to inactive timeout. > + */ > +static inline void > +ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now, > + bool acquire_lock) > +{ > + if (acquire_lock) > + SpinLockAcquire(&s->mutex); > + > + if (s->data.invalidated != RS_INVAL_INACTIVE_TIMEOUT) > + s->inactive_since = *now; > + > + if (acquire_lock) > + SpinLockRelease(&s->mutex); > +} > > Is the logic correct? What if the slot was already invalid due to some > reason other than RS_INVAL_INACTIVE_TIMEOUT? Is an Assert needed? Hm. Since invalidated slots can't be acquired and made active, not modifying inactive_since irrespective of invalidation reason looks good to me. Please find the attached v46 patch having changes for the above review comments and your test review comments and Shveta's review comments. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
v46-0001-Introduce-inactive_timeout-based-replication-slo.patch
Description: Binary data