Hi,

Thank you for the response.
On Tue, 17 Mar 2026 at 03:40, Heikki Linnakangas wrote:
>
> Replaying the record will perform the same sanity checks against
> wraparound as the primary does.
>
> Hmm, although why did I not apply commit 817f74600d to 'master', only
> backbranches? The bug that it fixed was related to minor version
> upgrade, and thus it was not needed on 'master', but the code change
> would nevertheless make a lot of sense on 'master' too.
>

Agreed. Once 817f74600d is on master, the standby would actually evaluate the SimpleLruTruncate() wraparound backstop during replay instead of bypassing it.

However, the backstop is documented as catching "wraparound bugs elsewhere in SLRU handling." If such a bug corrupts latest_page_number on the primary, the standby would not share the corruption, because it derives its own latest_page_number independently from ZERO_OFF_PAGE replay and StartupMultiXact(). The primary would skip the truncation, but the standby would see a healthy latest_page_number and proceed with it, so the two would end up with different sets of SLRU segments.

> Have you been able to reproduce that?
>

I have reproduced the primary-side condition on an unmodified tree using gdb in batch mode: attach to the VACUUM backend after WriteMTruncateXlogRec() returns, corrupt latest_page_number, and resume. The primary logs "apparent wraparound" and skips the physical deletion, while pg_waldump confirms that the TRUNCATE_ID record is present in the WAL. I have not yet set up a streaming replica to demonstrate the end-to-end divergence and promotion failure.

>
> I agree that would probably be better. I'm not sure how straightforward
> it will be to implement though, I wouldn't want to add much extra code
> just for this.
>

One approach that might keep the footprint small would be to inline the same PagePrecedes check that SimpleLruTruncate() uses directly in TruncateMultiXact(), before START_CRIT_SECTION().
Something like:

    /*
     * Same backstop as in SimpleLruTruncate(), but evaluated before we
     * enter the critical section and write the truncation WAL record.
     */
    if (MultiXactOffsetCtl->PagePrecedes(
            pg_atomic_read_u64(&MultiXactOffsetCtl->shared->latest_page_number),
            MultiXactIdToOffsetPage(PreviousMultiXactId(newOldestMulti))) ||
        MultiXactMemberCtl->PagePrecedes(
            pg_atomic_read_u64(&MultiXactMemberCtl->shared->latest_page_number),
            MXOffsetToMemberPage(newOldestOffset)))
    {
        ereport(LOG,
                (errmsg("skipping multixact truncation due to apparent wraparound")));
        LWLockRelease(MultiXactTruncationLock);
        return;
    }

No new functions and no changes to slru.c or the replay path: just the same condition, evaluated earlier, so that we never enter the critical section or write WAL for a truncation that won't be carried out.

Does this seem like a reasonable direction?

Regards,
Ayush
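In case it helps make the ordering argument concrete, here is a toy model in plain C. It is not PostgreSQL code: ToyNode, toy_page_precedes(), TOY_NUM_PAGES and all page numbers are hypothetical, the circular comparison is only a simplified stand-in for the real PagePrecedes callbacks, and locking, critical sections and the actual multixact page arithmetic are not modelled. The primary's endpoint page is "corrupted" to lie behind the cutoff while the standby's endpoint is healthy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size of a toy circular page space. */
#define TOY_NUM_PAGES 1024

/*
 * Simplified circular "page1 precedes page2" test, a stand-in for an SLRU
 * PagePrecedes callback (the real multixact callbacks differ in detail).
 */
bool
toy_page_precedes(int64_t page1, int64_t page2)
{
	int64_t		diff;

	assert(page1 >= 0 && page2 >= 0);
	diff = (page1 - page2) % TOY_NUM_PAGES;
	if (diff < 0)
		diff += TOY_NUM_PAGES;	/* normalise to [0, N) */
	if (diff >= TOY_NUM_PAGES / 2)
		diff -= TOY_NUM_PAGES;	/* map to [-N/2, N/2) */
	return diff < 0;
}

typedef struct ToyNode
{
	int64_t		latest_page;	/* stand-in for latest_page_number */
	int64_t		oldest_page;	/* oldest page still on disk */
} ToyNode;

/* Backstop in the spirit of SimpleLruTruncate(): never remove the endpoint. */
static bool
toy_truncate(ToyNode *node, int64_t cutoff_page)
{
	if (toy_page_precedes(node->latest_page, cutoff_page))
		return false;			/* "apparent wraparound": skip */
	node->oldest_page = cutoff_page;
	return true;
}

/* Current ordering: the record is emitted, then the backstop may skip. */
static bool
primary_wal_first(ToyNode *primary, int64_t cutoff_page)
{
	bool		record_written = true;	/* emitted unconditionally */

	(void) toy_truncate(primary, cutoff_page);	/* may be skipped */
	return record_written;
}

/* Proposed ordering: evaluate the same condition before any WAL is written. */
static bool
primary_check_first(ToyNode *primary, int64_t cutoff_page)
{
	if (toy_page_precedes(primary->latest_page, cutoff_page))
		return false;			/* no record, no truncation */
	(void) toy_truncate(primary, cutoff_page);
	return true;
}

/*
 * Primary has a corrupted endpoint, standby a healthy one.  Returns true if
 * they end up with different oldest pages after the standby replays.
 */
bool
toy_diverges(bool check_first)
{
	ToyNode		primary = {.latest_page = 50, .oldest_page = 10};
	ToyNode		standby = {.latest_page = 200, .oldest_page = 10};
	const int64_t cutoff = 100;
	bool		record_written;

	record_written = check_first ? primary_check_first(&primary, cutoff)
		: primary_wal_first(&primary, cutoff);

	/* Replay uses the standby's own endpoint, not the primary's. */
	if (record_written)
		(void) toy_truncate(&standby, cutoff);

	return primary.oldest_page != standby.oldest_page;
}
```

With the endpoint at page 50 and a cutoff of 100, the primary's backstop fires, but the standby (endpoint 200) truncates on replay, so toy_diverges(false) returns true. toy_diverges(true) returns false, because with the check evaluated first no record is written and both nodes keep their oldest page.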
