Hi,

On 2020-07-13 21:18:10 -0400, Robert Haas wrote:
> On Mon, Jul 13, 2020 at 9:10 PM Andres Freund <and...@anarazel.de> wrote:
> > > What if clog has been truncated so that the xmin can't be looked up?
> >
> > That's possible, but probably only in cases where xmin actually
> > committed.
>
> Isn't that the normal case? I'm imagining something like:
>
> - Tuple gets inserted. Transaction commits.
> - VACUUM processes table.
> - Mischievous fairies mark page all-visible in the visibility map.
> - VACUUM runs lots more times, relfrozenxid advances, but without ever
>   looking at the page in question, because it's all-visible.
> - clog is truncated, rendering xmin no longer accessible.
> - User runs VACUUM disabling page skipping, gets ERROR.
> - User deletes offending tuple.
> - At this point, I think the tuple is both invisible and unprunable?
> - Fairies happy, user sad.

I'm not saying it's impossible that that happens, but the cases I did
investigate didn't look like this. If something just roguely wrote to
the VM I'd expect a lot more "is not marked all-visible but visibility
map bit is set in relation" type WARNINGs, and I've not seen much of
those (they're WARNINGs though, so maybe we wouldn't). Presumably this
wouldn't always just happen with tuples that'd trigger an error first
during hot pruning.

I've definitely seen indications of both datfrozenxid and relfrozenxid
getting corrupted (in particular vac_update_datfrozenxid being racy as
hell), xid wraparound, indications of multixact problems (although it's
possible we've now fixed those) and some signs of corrupted relcache
entries for shared relations leading to vacuums being skipped.

Greetings,

Andres Freund