Hi Hackers,

While looking at the MultiXact truncation behavior and reading through the recent thread about the 17.8 standby crash during WAL replay (commit 8ba61bc), we noticed an architectural edge case that seems to cause a silent SLRU divergence between primary and standby. I'd like to ask whether this is a known, accepted risk or whether a patch to reorder this logic is worth exploring.

The Issue:

In TruncateMultiXact(), we write the truncation WAL record (WriteMTruncateXlogRec()) before we actually perform the truncation via PerformOffsetsTruncation() -> SimpleLruTruncate().

The problem arises from the "apparent wraparound" safety check inside SimpleLruTruncate(). If SlruScanDirectory() detects an apparent wraparound, SimpleLruTruncate() bails out and skips unlinking the SLRU segments on the primary, logging:

    could not truncate directory "%s": apparent wraparound

However, the WAL record for the truncation has already been flushed. Standbys replay this TRUNCATE_ID WAL record and blindly delete their SLRU segments. At this point, the primary and standby have diverged.

The Impact:

If the standby is subsequently promoted to primary, any attempt to access rows holding those older MultiXact IDs (which the original primary decided to keep) will throw a "FATAL: could not access status of transaction" error, effectively resulting in data loss / inaccessible rows for the user.

While the recent commits address the immediate standby crash involving latest_page_number during multixact_redo(), they don't seem to prevent the primary from emitting a "false" WAL truncation record when it abandons its own truncation.

Proposed Approach:

It seems safer to emit the WAL record only if we are guaranteed to follow through with the truncation. We could modify SimpleLruTruncate() to perform its safety checks first and return a boolean indicating whether the truncation is safe to proceed. TruncateMultiXact() would then call WriteMTruncateXlogRec() and proceed with physical deletion only if the check passes.

I have attached a rough draft patch illustrating this sequence change.

Is this a scenario the community has already considered, or is this reordering something that should be explored further to harden standby reliability?

PS. I also noticed the same pattern in clog.c.

Thanks for your time.

Regards,
Ayush
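For readers who do not open the attachment, the proposed ordering can be sketched roughly as below. This is a toy, self-contained model and not the attached patch or PostgreSQL code: DemoSlru and every demo_* function are hypothetical stand-ins, and the safety check is a simplified comparison that ignores page-number wraparound arithmetic. It only illustrates the sequence "check first, then log, then delete".

```c
#include <stdbool.h>

/* Hypothetical toy state; this is NOT a PostgreSQL structure. */
typedef struct DemoSlru
{
    int oldest_page;      /* oldest page still present on disk */
    int latest_page;      /* newest page currently in use */
    int wal_truncations;  /* truncation WAL records emitted so far */
} DemoSlru;

/*
 * Simplified stand-in for the "apparent wraparound" check: refuse to
 * truncate when the cutoff lies beyond the latest page in use.
 * (Assumption: the real check uses wraparound-aware page comparison.)
 */
static bool
demo_truncate_is_safe(const DemoSlru *slru, int cutoff_page)
{
    return cutoff_page <= slru->latest_page;
}

/* Stand-in for emitting the truncation WAL record. */
static void
demo_write_truncate_wal(DemoSlru *slru)
{
    slru->wal_truncations++;
}

/* Stand-in for physically removing segments older than the cutoff. */
static void
demo_unlink_segments(DemoSlru *slru, int cutoff_page)
{
    if (cutoff_page > slru->oldest_page)
        slru->oldest_page = cutoff_page;
}

/*
 * Proposed ordering: run the safety check first, and emit the WAL
 * record only when the deletion will actually be carried out.
 * Returns true if the truncation was logged and performed.
 */
static bool
demo_truncate(DemoSlru *slru, int cutoff_page)
{
    if (!demo_truncate_is_safe(slru, cutoff_page))
        return false;           /* nothing logged, nothing deleted */

    demo_write_truncate_wal(slru);
    demo_unlink_segments(slru, cutoff_page);
    return true;
}
```

In this model, a refused truncation leaves wal_truncations unchanged, so a standby replaying the log would never see a truncation record that the primary did not act on.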
Attachment: v1-prevent-multixact-slru-divergence 2.patch
