Hi,

On Sun, Sep 28, 2025 at 9:47 PM Xuneng Zhou <[email protected]> wrote:
>
> Hi,
>
> On Thu, Aug 28, 2025 at 4:22 PM Xuneng Zhou <[email protected]> wrote:
> >
> > Hi,
> >
> > Some changes in v3:
> > 1) Update the note of xlogwait.c to reflect the extending of its use
> > for flush waiting and internal use for both flush and replay waiting.
> > 2) Update the comment above logical_read_xlog_page which describes the
> > prior-change behavior of read_local_xlog_page.
>
> In an off-list discussion, Alexander pointed out potential issues with
> the current single-heap design for replay and flush when promotion
> occurs concurrently with WAIT FOR. The following is a simple example
> illustrating the problem:
>
> During promotion, there's a window where we can have mixed waiter
> types in the same heap:
>
> T1: Process A calls read_local_xlog_page_guts on standby
> T2: RecoveryInProgress() = TRUE, adds to heap as replay waiter
> T3: Promotion begins
> T4: EndRecovery() calls WaitLSNWakeup(InvalidXLogRecPtr)
> T5: SharedRecoveryState = RECOVERY_STATE_DONE
> T6: Process B calls read_local_xlog_page_guts
> T7: RecoveryInProgress() = FALSE, adds to SAME heap as flush waiter
>
> The problem is that replay LSNs and flush LSNs represent different
> positions in the WAL stream. Having both types in the same heap can
> lead to:
> - Incorrect wakeup logic (comparing incomparable LSNs)
> - Processes waiting forever
> - Wrong waiters being woken up
>
> To avoid this problem, patch v4 is updated to utilize two separate
> heaps for flush and replay like Alexander suggested earlier. It also
> introduces a new separate min LSN tracking field for flushing.
>
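
To make the T1..T7 window above concrete, here is a minimal standalone sketch of keeping replay and flush waiters apart. It is not the patch code: WaiterSet, add_waiter, wake_waiters, min_waited_lsn and waiter_count are hypothetical names, plain arrays stand in for the real heaps, and XLogRecPtr is just a local typedef so that the example compiles on its own. The only point is that a waiter's type is fixed when it registers, and a wakeup for one type never compares against targets of the other type.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr so the sketch is self-contained. */
typedef uint64_t XLogRecPtr;

/* Hypothetical waiter types: one waiter set per kind of LSN. */
typedef enum { WAIT_REPLAY = 0, WAIT_FLUSH = 1, WAIT_NTYPES } WaitType;

#define MAX_WAITERS 8

/* Unsorted array for brevity; this only models "which set am I in". */
typedef struct
{
    XLogRecPtr lsn[MAX_WAITERS];
    int        n;
} WaiterSet;

static WaiterSet waiters[WAIT_NTYPES];

/* Register a waiter. The type is decided once, from the recovery state
 * seen at registration time (T2 and T7 in the timeline). */
bool
add_waiter(bool recovery_in_progress, XLogRecPtr target)
{
    WaiterSet *set = &waiters[recovery_in_progress ? WAIT_REPLAY : WAIT_FLUSH];

    if (set->n >= MAX_WAITERS)
        return false;
    set->lsn[set->n++] = target;
    return true;
}

/* Smallest target still waited for in one set, or UINT64_MAX if empty.
 * Mirrors the idea of a separate min-LSN field per waiter type. */
XLogRecPtr
min_waited_lsn(WaitType type)
{
    XLogRecPtr min = UINT64_MAX;

    for (int i = 0; i < waiters[type].n; i++)
        if (waiters[type].lsn[i] < min)
            min = waiters[type].lsn[i];
    return min;
}

/* Remove every waiter of the given type whose target has been reached
 * and return how many were woken. The other set is never inspected. */
int
wake_waiters(WaitType type, XLogRecPtr reached)
{
    WaiterSet *set = &waiters[type];
    int        woken = 0;
    int        kept = 0;

    for (int i = 0; i < set->n; i++)
    {
        if (set->lsn[i] <= reached)
            woken++;
        else
            set->lsn[kept++] = set->lsn[i];
    }
    set->n = kept;
    return woken;
}

int
waiter_count(WaitType type)
{
    return waiters[type].n;
}
```

With this layout, process A (registered while in recovery) and process B (registered after promotion) end up in different sets, so a flush wakeup at some LSN cannot wake A and a replay wakeup cannot wake B, regardless of how their target values happen to compare.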
v5-0002 separates the waitlsn_cmp() comparator function into two
distinct functions (waitlsn_replay_cmp and waitlsn_flush_cmp) for the
replay and flush heaps, respectively.

Best,
Xuneng

v5-0000-cover-letter.patch
v11-0001-Implement-WAIT-FOR-command.patch
v5-0002-Improve-read_local_xlog_page_guts-by-replacing-po.patch
