On Sun, Oct 12, 2025 at 11:27 PM Bertrand Drouvot <[email protected]> wrote: > > Hi, > > On Thu, Oct 09, 2025 at 10:49:39AM +0800, suyu.cmj wrote: > > Hi, > > Thank you for the reference to commit 818fefd8fd4 and the related > > discussion thread. I understand the intent of introducing > > initial_restart_lsn was to preserve a consistent invalidation cause > > throughout the invalidation loop. > > However, I still have a few concerns about this design change: > > 1. I understand the intention to keep the invalidation cause consistent, > > but If a slot's restart_lsn advances significantly during the invalidation > > check—indicating it is actively in use—shouldn't we reconsider invalidating > > it? > > 2. What potential issues arise if we refrain from invalidating slots whose > > restart_lsn advances during the invalidation process? Intuitively, an > > actively used slot that has moved it's restart_lsn beyond the problematic > > point should not be marked invalid. > > 3. If the current approach is indeed correct, should we consider making > > PG15 and earlier consistent with this behavior? The behavioral difference > > across versions may lead to different operational outcomes in otherwise > > similar situations. > > I would appreciate your insights on these points. > > I agree that before 818fefd8fd4 the invalidation cause could move from > RS_INVAL_WAL_REMOVED to RS_INVAL_NONE if the slot restart lsn has been able to > advance enough between the time we release the mutex and do the next check. > > With 818fefd8fd4 that's not the case anymore and we keep WAL_REMOVED as the > invalidation cause (even if the slot restart lsn has been able to advance > enough). > > That looks safe to use the pre 818fefd8fd4 behavior for the slot restart lsn > case because the WAL files have not yet been removed by the > checkpointer/startup > process when it's busy in InvalidatePossiblyObsoleteSlot(). > > I think that we could get rid of the initial_restart_lsn and just use > s->data.restart_lsn here (while keeping initial xmin ones to preserve the > intent of 818fefd8fd4 for those).
IIUC with the proposed patch, it's possible that we report the slot invalidation once but don't actually invalidate the slot if slot's restart_lsn gets advanced and becomes greater than the oldestLSN after the report, is that right? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
