On Fri, Apr 10, 2026 at 8:28 AM Zhijie Hou (Fujitsu) <[email protected]> wrote: > > On Sunday, March 22, 2026 1:47 AM Amit Kapila <[email protected]> wrote: > > On Thu, Mar 19, 2026 at 7:08 PM Fujii Masao <[email protected]> > > wrote: > > > > > > On Sat, Feb 28, 2026 at 2:03 PM Amit Kapila <[email protected]> > > wrote: > > > > > > > > On Fri, Feb 27, 2026 at 8:34 PM Fujii Masao <[email protected]> > > wrote: > > > > > > > > > > Normally, the slotsync worker updates the standby slot using the > > > > > primary's slot state. However, when confirmed_flush_lsn matches > > > > > but restart_lsn does not, the worker does not actually update the > > > > > standby slot. Despite that, the current code of > > > > > update_local_synced_slot() appears to treat this situation as if > > > > > an update occurred. As a result, the worker sleeps only for the > > > > > minimum interval (200 ms) before retrying. In the next cycle, it > > > > > again assumes an update happened, and continues looping with the > > > > > short sleep interval, causing the repeated logical decoding log > > > > > messages. > > Based on a quick analysis, this seems to be the root cause. > > > > > > > > > > I think update_local_synced_slot() should return false (i.e., no > > > > > update > > > > > happened) when confirmed_flush_lsn is equal but restart_lsn > > > > > differs between primary and standby. > > > > > > > > > > > > > We expect that in such a case update_local_synced_slot() should > > > > advance local_slot's 'restart_lsn' via > > > > LogicalSlotAdvanceAndCheckSnapState(), otherwise, it won't go in the > > > > cheap code path next time. Normally, restart_lsn advancement should > > > > happen when we process XLOG_RUNNING_XACTS and call > > > > SnapBuildProcessRunningXacts(). In this particular case as both > > > > restart_lsn and confirmed_flush_lsn are the same (0/03000140), the > > > > machinery may not be processing XLOG_RUNNING_XACTS record. I have > > > > not debugged the exact case yet but you can try by emitting some > > > > more records on publisher, it should let the standby advance the > > > > slot. It is possible that we can do something like you are proposing > > > > to silence the LOG messages but we should know what is going on here. > > > > > > I agree it's important to investigate why restart_lsn doesn't advance > > > in that case and fix that issue. > > > > > > Separately, I think mistreating the slot as updated is another problem: > > > the local slot is treated as updated even when it isn't, which causes > > > the slotsync worker to use the minimum sleep interval. We should > > > address this independently. > > > > > > LogicalSlotAdvanceAndCheckSnapState() doesn't always update the local > > > slot, but update_local_synced_slot() currently assumes it does. That > > > seems to be the root cause. The attached 0001 patch makes > > > LogicalSlotAdvanceAndCheckSnapState() return whether the slot was > > > actually updated, and updates update_local_synced_slot() to act based on > > that result. > > > > > > I've also attached 0002, which reduces the log level of some logical > > > decoding messages (e.g., "logical decoding found consistent point") > > > from LOG to DEBUG1. These are low-level, developer-oriented messages > > > and can be noisy when they occur regularly. This is the same issue I > > reported earlier at [1]. > > > > > > How about applying these patches while we continue working on the > > > restart_lsn issue? > > > > > > > * > > @@ -2174,7 +2193,10 @@ > > LogicalSlotAdvanceAndCheckSnapState(XLogRecPtr moveto, > > > > if (XLogRecPtrIsValid(ctx->reader->EndRecPtr)) > > { > > - LogicalConfirmReceivedLocation(moveto); > > + bool slot_updated = LogicalConfirmReceivedLocation(moveto); > > + > > + if (updated_xmin_or_lsn) > > + *updated_xmin_or_lsn = slot_updated; > > > > BTW, LogicalSlotAdvanceAndCheckSnapState() could also advance slot while > > processing running_xact record, so not sure we can rely only on the explicit > > call LogicalConfirmReceivedLocation() above to ascertain the same. > > Perhaps we could simply compare the slot's old and new LSN/xmin to determine > whether updated_xmin_or_lsn needs to be set. This would avoid the need to hook > into each individual update path at the lower level. Attaching a patch for > reference. > > > > > Sorry, I couldn't get the chance to look at the patches proposed by Hou-san > > to > > fix this issue but I'll look at it after the feature freeze. > > The other patches attempt to advance restart_lsn in more cases by modifying > the > logical decoding logic. However, those scenarios are relatively rare and may > not > produce significant improvement outside the slot synchronization context. So, > I > think modifying the slot sync worker to correctly increase the polling > interval instead > is also a good starting point. >
After reviewing the entire thread and patches, the above solution appears to be the simplest and least intrusive, especially since the issue is limited to the sltosync worker. So +1 for this idea. thanks Shveta
