On Thu, Dec 4, 2025 at 2:04 AM Masahiko Sawada <[email protected]> wrote:
>
> On Tue, Dec 2, 2025 at 10:15 PM Zhijie Hou (Fujitsu)
> <[email protected]> wrote:
> >
> > I think the invalidation cannot occur when copying because:
> >
> > Currently, there are no CHECK_FOR_INTERRUPTS() calls between the initial
> > restart_lsn copy (first phase) and the latest restart_lsn copy (second 
> > phase).
> > As a result, even if a checkpoint attempts to invalidate a slot and sends a
> > SIGTERM to the backend, the backend will first update the restart_lsn 
> > during the
> > second phase before responding to the signal. Consequently, during the next
> > cycle of InvalidatePossiblyObsoleteSlot(), the checkpoint will observe the
> > updated restart_lsn and skip the invalidation.
> >
> > For logical slots, although invoking the output plugin startup callback 
> > presents
> > a slight chance of processing the signal (when using third-party plugins), 
> > slot
> > invalidation in this scenario results in immediate slot dropping, because 
> > the
> > slot is in RS_EPHEMERAL state, thus preventing invalidation.
>
> Thank you for the analysis. I agree.
>
> > While theoretically, slot invalidation could occur if the code changes in 
> > the
> > future, addressing that possibility could be considered an independent
> > improvement task. What do you think ?
>
> Okay. I find that it also might make sense for HEAD to use
> RS_EPHEMERAL state for physical slots too to avoid being invalidated
> during creation, which probably can be discussed later. For back
> branches, the proposed idea of acquiring ReplicationSlotAllocationLock
> in an exclusive mode would be better. I think we might want to have a
> comment in CheckPointReplicationSlots() too that refers to
> ReplicationSlotReserveWal().
>
> Regarding whether we revert the original fix 2090edc6f32 and make it
> the same as we did in HEAD ca307d5cec90a4f, we need to change the size
> of ReplicationSlot struct. I'm concerned that it's really safe to
> change it because the data resides on the shared memory. For example,
> we typically iterate over all replication slots as follow:
>
> for (i = 0; i < max_replication_slots; i++)
> {
>     ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
>
> I'm concerned that the arithmetic for calculating the slot address is
> affected by the size of ReplicationSlot change.
>

Yes, this is a valid concern. I think we can go-ahead with fixing the
0001's-fix in HEAD and 18. We can discuss separately the fix for
back-branches prior to 18.

-- 
With Regards,
Amit Kapila.


Reply via email to