Hi Bertrand,

On Sat, Jun 6, 2026 at 7:07 PM Bertrand Drouvot <[email protected]> wrote:
>
> Hi Alexander,
>
> On Sat, Jun 06, 2026 at 12:00:00PM +0300, Alexander Lakhin wrote:
> > Hello hackers,
> >
> > That is, walsender requested WAL segment for timeline 1, while in a
> > successful run, it reads WAL for timeline 2.
> >
> > I've managed to reproduce this failure with:
>
> Thanks for the report and the repro!
>
> > As far as I can see, the timeline is chosen in logical_read_xlog_page()
> > depending on the recovery state:
> >     am_cascading_walsender = RecoveryInProgress();
> >
> >     if (am_cascading_walsender)
> >         GetXLogReplayRecPtr(&currTLI);
> >     else
> >         currTLI = GetWALInsertionTimeLine();
>
> Yeah, it looks like there is a race condition here. I think we should check if
> the insertion timeline has already been set (like the walsummarizer is doing).
>
> I'll work on a fix early next week.

This looks like the right direction for the fix. We may want to apply
similar logic to read_local_xlog_page_guts() as well. Although the failure
was reported in walsender, SQL logical decoding goes through the local WAL
reader, which has the same recovery/TLI pattern.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
