On Mon, Feb 16, 2026 at 4:35 PM Amit Kapila <[email protected]> wrote:
>
> On Fri, Feb 13, 2026 at 7:54 AM Zhijie Hou (Fujitsu)
> <[email protected]> wrote:
> >
> > Thanks for pushing! Here are the remaining patches.
> >
>
> One thing that bothers me about the remaining patch is that it could
> lead to infinite retries in the worst case. For example, in first
> try, slot-1 is not synced say due to physical replication delays in
> flushing WALs up to the confirmed_flush_lsn of that slot, then in next
> (re-)try, the same thing happened for slot-2, then in next (re-)try,
> slot-3 appears to be invalidated on standby but it is valid on primary,
> and so on. What do you think?

Yes, that is a possibility we cannot rule out. It can also happen
during the first invocation of the API, even without the new changes:
when we attempt to create new slots, they may remain in a temporary
state indefinitely. However, that risk is limited to the initial sync,
until the slots are persisted, which is somewhat expected behavior.
With the current changes, though, the possibility of an indefinite wait
exists during every run. So the question becomes: what would be more
desirable for users -- for the API to finish with the risk that a few
slots are not synced, or for the API to wait longer to ensure that all
slots are properly synced?
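
To make the two options concrete, here is a rough sketch. It is a toy
model only, not the slotsync.c code: SketchSlot, sketch_sync_pass() and
sketch_sync_all() are hypothetical names, and each slot is simply assumed
to fail a fixed number of attempts before it syncs. A negative
max_passes models "wait until everything is synced"; a positive value
models "finish and report what is left".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a slot being synced; not a PostgreSQL struct. */
typedef struct SketchSlot
{
	int		failures_left;	/* attempts that will still fail for this slot */
	bool	synced;			/* true once the slot has been synced */
} SketchSlot;

/* One pass over all slots; returns true only if every slot is now synced. */
static bool
sketch_sync_pass(SketchSlot *slots, size_t nslots)
{
	bool	all_synced = true;

	for (size_t i = 0; i < nslots; i++)
	{
		if (slots[i].synced)
			continue;

		if (slots[i].failures_left > 0)
		{
			/* Models a transient failure, e.g. the standby lagging behind. */
			slots[i].failures_left--;
			all_synced = false;
		}
		else
			slots[i].synced = true;
	}

	return all_synced;
}

/*
 * Retry until all slots are synced or max_passes passes have been made.
 * max_passes < 0 means no limit, so the loop never ends if some slot
 * never syncs. Returns the number of slots left unsynced.
 */
static size_t
sketch_sync_all(SketchSlot *slots, size_t nslots, int max_passes)
{
	size_t	unsynced = 0;
	int		passes = 0;

	assert(slots != NULL || nslots == 0);

	while (max_passes < 0 || passes < max_passes)
	{
		passes++;
		if (sketch_sync_pass(slots, nslots))
			break;
	}

	for (size_t i = 0; i < nslots; i++)
	{
		if (!slots[i].synced)
			unsynced++;
	}

	return unsynced;
}
```

With three slots that need 0, 2 and 1 extra attempts, a single pass
returns with two slots unsynced, while the unlimited variant needs three
passes and returns with none.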

If the primary use case of this API is running it before a scheduled
failover, then I think it would be better for the API to wait and
ensure everything is properly synced. But I am not entirely sure about
the use case. What do you think?

> Independent of whether we consider the entire patch, the following bit
> in the patch is useful as we retry to sync the slots via API.
> @@ -218,7 +219,7 @@ update_local_synced_slot(RemoteSlot *remote_slot,
> Oid remote_dbid)
>   * Can get here only if GUC 'synchronized_standby_slots' on the
>   * primary server was not configured correctly.
>   */
> - ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
> + ereport(LOG,
>   errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>   errmsg("skipping slot synchronization because the received slot sync"
>      " LSN %X/%08X for slot \"%s\" is ahead of the standby position %X/%08X",
>

Yes, I agree.

Thanks,
Shveta

