On Thu, Dec 4, 2025 at 2:04 AM Masahiko Sawada <[email protected]> wrote: > > On Tue, Dec 2, 2025 at 10:15 PM Zhijie Hou (Fujitsu) > <[email protected]> wrote: > > > > I think the invalidation cannot occur when copying because: > > > > Currently, there are no CHECK_FOR_INTERRUPTS() calls between the initial > > restart_lsn copy (first phase) and the latest restart_lsn copy (second > > phase). > > As a result, even if a checkpoint attempts to invalidate a slot and sends a > > SIGTERM to the backend, the backend will first update the restart_lsn > > during the > > second phase before responding to the signal. Consequently, during the next > > cycle of InvalidatePossiblyObsoleteSlot(), the checkpoint will observe the > > updated restart_lsn and skip the invalidation. > > > > For logical slots, although invoking the output plugin startup callback > > presents > > a slight chance of processing the signal (when using third-party plugins), > > slot > > invalidation in this scenario results in immediate slot dropping, because > > the > > slot is in RS_EPHEMERAL state, thus preventing invalidation. > > Thank you for the analysis. I agree. > > > While theoretically, slot invalidation could occur if the code changes in > > the > > future, addressing that possibility could be considered an independent > > improvement task. What do you think ? > > Okay. I find that it also might make sense for HEAD to use > RS_EPHEMERAL state for physical slots too to avoid being invalidated > during creation, which probably can be discussed later. For back > branches, the proposed idea of acquiring ReplicationSlotAllocationLock > in an exclusive mode would be better. I think we might want to have a > comment in CheckPointReplicationSlots() too that refers to > ReplicationSlotReserveWal(). > > Regarding whether we revert the original fix 2090edc6f32 and make it > the same as we did in HEAD ca307d5cec90a4f, we need to change the size > of ReplicationSlot struct. I'm concerned that it's really safe to > change it because the data resides on the shared memory. For example, > we typically iterate over all replication slots as follow: > > for (i = 0; i < max_replication_slots; i++) > { > ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i]; > > I'm concerned that the arithmetic for calculating the slot address is > affected by the size of ReplicationSlot change. >
Yes, this is a valid concern. I think we can go-ahead with fixing the 0001's-fix in HEAD and 18. We can discuss separately the fix for back-branches prior to 18. -- With Regards, Amit Kapila.
