On Mon, Feb 9, 2026 at 11:36 AM Zhijie Hou (Fujitsu)
<[email protected]> wrote:
>
> On Monday, February 2, 2026 5:10 PM shveta malik <[email protected]>
> wrote:
> >
> > On Mon, Feb 2, 2026 at 12:56 PM Zhijie Hou (Fujitsu)
> > <[email protected]> wrote:
> > >
> > >
> > > Here are the updated patches.
> > >
> >
> > Thanks for the patch. Few trivial comments:
> >
> > 1)
> > + * and if the slots are not ready to be synced because of any of the
> > + reasons
> > + * mentioned above, then the SQL function also waits and retries
> > until the slots
> > + * are synchronized to the latest information. Refer to the comments
> > + * in SyncReplicationSlots() for more details.
> >
> > We can make it slightly more clear by mentioning that it waits only for the
> > slots which existed at the start of function:
> >
> > "...then the SQL function also waits and retries until the failover slots
> > that
> > existed on primary at the start of the function call are synchronized."
>
> Improved.
>
> >
> > 2)
> >
> > We have below comment atop SyncReplicationSlots:
> >
> > * Repeatedly fetches and updates replication slot information from the
> > * primary until all slots are at least "sync ready".
> >
> > We shall change this too, as now we are not only waiting for them to be
> > sync-
> > ready. Even if they are sync-ready, this function can still wait in
> > subsequent
> > runs for different reasons.
>
> Right, fixed.
>
> >
> > 3)
> >
> > Existing test in test file:
> > ##################################################
> > # Test that pg_sync_replication_slots() on the standby skips and retries #
> > until
> > the slot becomes sync-ready (when the remote slot catches up with # the
> > locally reserved position).
> > # Also verify that slotsync skip statistics are correctly updated when the #
> > slotsync operation is skipped.
> > ##################################################
> >
> > New one added says:
> > +##################################################
> > +# Test that when physical replication lags behind logical replication,
> > +# pg_sync_replication_slots() on the standby skips and retries until
> > +physical # replication catches up. Also verify that slotsync skip
> > +statistics are # correctly updated when the slotsync operation is skipped.
> > +##################################################
> >
> > The "need" of this new test case isn't very clear provided we already have
> > testcase1. Perhaps we could revise the comment to something like:
> >
> > "Test that even for a sync-ready slot, when physical replication lags behind
> > logical replication, pg_sync_replication_slots() on the standby
> > skips........"
>
> Adjusted the comments as suggested.
>
> In addition to addressing the comments, I revisited the recently updated
> slotsync code and noticed opportunities to simplify some parameters, checks,
> and
> codes. This will also facilitate the improvement in v2-0001 coding.
>
> * Previously, certain function parameters(found_consistent_snapshot,
> remote_slot_precedes of update_local_synced_slot()) were used to store the
> reason for slot synchronization being skipped. However, now that a slot
> property
> serves this purpose, we can simplify the code by eliminating those redundant
> parameters and using the slot's property to perform the same check.
>
> * The slot synchronization is skipped if the required WAL has not been
> received
> and flushed. Previously, this check[1] was performed in two separate code
> paths.
> Such duplication can lead to coding errors if changes are made in one location
> without updating the other, as exemplified by the issue fixed in commit
> 3df4df5.
> This commit consolidates the check into a single location to eliminate
> redundancies and reduce the potential for future errors.
>
> To address these points, I have created two additional patches: V3-0001 for
> the
> first point and V3-0002 for the second. V3-0003 contains the current
> improvement
> being discussed, and it's also simplified thanks to the preceding patches.
>
>
> [1]
> /*
> * ...
> */
> if (remote_slot->confirmed_lsn > latestFlushPtr)
> {
> update_slotsync_skip_stats(SS_SKIP_WAL_NOT_FLUSHED);
>
> /*
> * Can get here only if GUC
> 'synchronized_standby_slots' on the
> * primary server was not configured correctly.
> */
> ereport(AmLogicalSlotSyncWorkerProcess() ? LOG :
> ERROR,
> ...));
> }
>
>
I like the idea of both the new patches. Please find a few trivial comments:
patch002:
1)
Earlier at both the places where we were updating
'SS_SKIP_WAL_NOT_FLUSHED', we were returning slot_updated as false,
now, we might end up returning it as true (specially at second
occurrence). Is this intentional?
2)
In update_and_persist_local_synced_slot(), we can reach this even when
wal_not_flushed, so we shall to update comment:
if (slot->slotsync_skip_reason != SS_SKIP_NONE)
{
/*
* We reach here when the remote slot didn't catch up to locally
* reserved position, or it cannot reach the
consistent point from the
* restart_lsn.
....
*/
Patch003:
3)
+ if (slotsync_pending && slot->slotsync_skip_reason != SS_SKIP_NONE)
+ *slotsync_pending = true;
Here shall we ensure by a sanity check that slotsync_skip_reason !=
SS_SKIP_INVALID? And please bring back the comment as well, which was
there in an earlier patch which stated the reason for not including
'SS_SKIP_INVALID' here.
thanks
Shveta