On Wed, Nov 12, 2025 at 10:42 PM Masahiko Sawada <[email protected]> wrote:
>
> On Wed, Nov 12, 2025 at 3:42 AM shveta malik <[email protected]> wrote:
> >
> > On Wed, Nov 12, 2025 at 3:36 PM Masahiko Sawada <[email protected]>
> > wrote:
> > >
> > > On Tue, Nov 11, 2025 at 6:05 PM Masahiko Sawada <[email protected]>
> > > wrote:
> > > >
> > > > On Mon, Nov 10, 2025 at 8:05 PM shveta malik <[email protected]>
> > > > wrote:
> > > > >
> > > > > On Thu, Nov 6, 2025 at 4:32 AM Masahiko Sawada
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > >
> > > > > > I've updated and rebased the patch.
> > > > > >
> > > > >
> > > > > Thanks for the patch. Please find a few comments:
> > > > >
> > > > > 1)
> > > > > ReplicationSlotsDropDBSlots:
> > > > >
> > > > > +    SpinLockAcquire(&s->mutex);
> > > > > +    invalidated = s->data.invalidated == RS_INVAL_NONE;
> > > > > +    SpinLockRelease(&s->mutex);
> > > > > +
> > > > > +    /*
> > > > > +     * Count slots on other databases too so we can disable logical
> > > > > +     * decoding only if no slots in the cluster.
> > > > > +     */
> > > > > +    if (invalidated)
> > > > > +        n_valid_logicalslots++;
> > > > >
> > > > >
> > > > > This seems confusing to me. Can we instead do:
> > > > >
> > > > >     SpinLockAcquire(&s->mutex);
> > > > >     if (s->data.invalidated == RS_INVAL_NONE)
> > > > >         n_valid_logicalslots++;
> > > > >     SpinLockRelease(&s->mutex);
> > > > >
> > > > > 2)
> > > > > InvalidateObsoleteReplicationSlots:
> > > > >
> > > > > +    bool islogical = SlotIsLogical(s);
> > > > >
> > > > >      /* Prevent invalidation of logical slots during binary upgrade */
> > > > >      if (SlotIsLogical(s) && IsBinaryUpgrade)
> > > > > +    {
> > > > > +        SpinLockAcquire(&s->mutex);
> > > > > +        if (s->data.invalidated == RS_INVAL_NONE)
> > > > > +            n_valid_logicalslots++;
> > > > > +        SpinLockRelease(&s->mutex);
> > > > > +
> > > > >          continue;
> > > > > +    }
> > > > >
> > > > > We should use 'islogical' instead of SlotIsLogical here.
> > > > >
> > > > > 3)
> > > > > InvalidateObsoleteReplicationSlots() is more robust now as we are
> > > > > using both 'invalidated' and 'released_lock' flags but still nowhere
> > > > > we guarantee that invalidated=true implies released_lock=true. Since
> > > > > we jump to 'restart' label only if released_lock is true, it becomes
> > > > > important to have an ASSERT which says invalidated=true implicitly
> > > > > means released_lock=true or vice versa. Because at the end we go by
> > > > > 'invalidated_logical' rather than 'released_lock' to decide about
> > > > > logical-decoding disabling.
> > > > >
> > > > > In this logic:
> > > > >
> > > > > +    if (InvalidatePossiblyObsoleteSlot(possible_causes, s, oldestLSN,
> > > > > +                                       dboid, snapshotConflictHorizon,
> > > > > +                                       &released_lock))
> > > > >      {
> > > > > -        /* if the lock was released, start from scratch */
> > > > > -        goto restart;
> > > > > +        /* Remember we have invalidated a physical or logical slot */
> > > > > +        invalidated = true;
> > > > > +
> > > > > +        /*
> > > > > +         * Additionally, remember we have invalidated a logical slot too
> > > > > +         * as we can request disabling logical decoding later.
> > > > > +         */
> > > > > +        if (islogical)
> > > > > +            invalidated_logical = true;
> > > > >      }
> > > > >
> > > > > Shall we have an Assert(released_lock) if
> > > > > InvalidatePossiblyObsoleteSlot returns true. Or any better way?
> > > > >
> > > > > 4)
> > > > > +    SpinLockAcquire(&s->mutex);
> > > > > +    if (s->data.invalidated == RS_INVAL_NONE)
> > > > > +        n_valid_logicalslots++;
> > > > >
> > > > > In the same function, isn't the above code problematic: Don't we need
> > > > > 'islogical' check before incrementing 'n_valid_logicalslots',
> > > > > otherwise it may wrongly count valid physical slots as well.
> > > > >
> > > > Agreed with all the above points. Will fix and update the updated
> > > > version.
> > > >
> > >
> > > I've attached the updated version patch.
> > > I addressed all comments I
> > > got so far, and made some cosmetic changes.
> > >
> >
> > Thanks. A few comments:
> >
> > 1)
> > Shall we update comments atop InvalidateObsoleteReplicationSlots() as
> > well, similar to other functions. Something like:
> >
> > If it invalidates the last logical slot in the cluster, it requests to
> > disable logical decoding.
>
> Okay, added.
>
> >
> > 2)
> > With the new sanity check (Assert(released_lock)) in
> > InvalidateObsoleteReplicationSlots, we have made sure that whenever a
> > slot is invalidated, we do release-lock. But we have not made sure
> > that released_lock=true always implies a slot is invalidated. Looking
> > at InvalidatePossiblyObsoleteSlot(), that seems to be the case always,
> > but shall we have a sanity check in for this as well. Thoughts?
> >
>
> I think it's possible that InvalidatePossiblyObsoleteSlot() releases
> the slot but doesn't invalidate it. For example, after it terminates
> the process owning the slot, the slot gets dropped or its restart_lsn
> (or xmin) gets advanced enough not to be invalidated.

Oh, if released_lock can be true while the slot isn't actually
invalidated, could we end up resetting 'n_valid_logicalslots' to 0 when
we shouldn't?

Or I guess even if that happens, we're still fine because the loop
restarts and we'll hit that slot again, and on the next pass we'll
either invalidate it or not release the lock. Is that right?

I'm just trying to make sure we don't end up in a situation where we
miss counting a valid slot. And is there a sanity check we could add to
ensure that?

thanks
Shveta
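
To make the question above concrete, here is a minimal standalone sketch of
the restart-and-recount behaviour as I understand it from this thread. It is
not the patch code: ToySlot, toy_invalidate_possibly_obsolete() and
toy_count_valid_logical() are hypothetical stand-ins, the spinlock and LWLock
handling is left out entirely, and the placement of the counting relative to
the invalidation attempt is my assumption. The point it tries to show is that
if the counter is reset at the 'restart' label, a pass that releases the lock
without invalidating anything only costs an extra scan, because every
surviving slot is visited again on the next pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a replication slot. */
typedef struct ToySlot
{
    bool in_use;
    bool is_logical;
    bool invalidated;            /* stands in for data.invalidated != RS_INVAL_NONE */
    bool obsolete;               /* an invalidation cause currently applies */
    bool survives_after_release; /* e.g. restart_lsn advances once the owner is gone */
} ToySlot;

/*
 * Toy analogue of InvalidatePossiblyObsoleteSlot(): returns true only if the
 * slot was invalidated, and sets *released_lock whenever the (imaginary)
 * lock had to be dropped along the way.
 */
bool
toy_invalidate_possibly_obsolete(ToySlot *s, bool *released_lock)
{
    *released_lock = false;

    if (s->invalidated || !s->obsolete)
        return false;           /* nothing to do, lock never released */

    *released_lock = true;

    if (s->survives_after_release)
    {
        /* Lock was released, but the slot is no longer obsolete. */
        s->survives_after_release = false;
        s->obsolete = false;
        return false;
    }

    s->invalidated = true;
    return true;
}

/*
 * Toy analogue of the loop in InvalidateObsoleteReplicationSlots(): returns
 * the number of valid logical slots seen on the final, uninterrupted pass.
 */
size_t
toy_count_valid_logical(ToySlot *slots, size_t nslots, bool *invalidated_logical)
{
    size_t n_valid_logicalslots;

    *invalidated_logical = false;

restart:
    /* Reset on every pass so that a restart never double counts. */
    n_valid_logicalslots = 0;

    for (size_t i = 0; i < nslots; i++)
    {
        ToySlot *s = &slots[i];
        bool released_lock = false;

        if (!s->in_use)
            continue;

        if (toy_invalidate_possibly_obsolete(s, &released_lock))
        {
            /* invalidated implies released_lock, but not vice versa */
            assert(released_lock);
            if (s->is_logical)
                *invalidated_logical = true;
        }

        /* Restart on any release, whether or not the slot was invalidated. */
        if (released_lock)
            goto restart;

        if (s->is_logical && !s->invalidated)
            n_valid_logicalslots++;
    }

    return n_valid_logicalslots;
}
```

In this toy model the value returned always comes from a pass in which no
lock was released, so a slot that survives a release is counted on the
following pass. Whether the real function has the same property depends on
where the patch resets and increments the counter, which I have only guessed
at here.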
