Peter Geoghegan <p...@bowt.ie> writes: > On Wed, Apr 13, 2022 at 4:13 PM Andres Freund <and...@anarazel.de> wrote: >> IIRC the problem in matter isn't skipped pages, but that the horizon simply >> isn't new enough to mark pages as all visible.
> Sometimes OldestXmin can go backwards in VACUUM operations that are > run in close succession against the same table, due to activity from > other databases in the same cluster (perhaps other factors are > involved at times). I've been doing some testing locally by inserting commands to manually set tenk1's relallvisible to zero. I first did that in test_setup.sql ... and it had no effect whatsoever. Further experimentation showed that the "CREATE INDEX ON tenk1" steps in create_index.sql itself generally suffice to fix relallvisible; although if you force it back to zero after the last such command, you get the same plan diffs wrasse is showing. And you don't get any others, which I thought curious until I realized that sanity_check.sql's database-wide VACUUM offers yet another opportunity to heal the incorrect value. If you force it back to zero again after that, a bunch of later tests start to show plan differences, which is what I'd been expecting. So what seems to be happening on wrasse is that a background autovacuum (or really autoanalyze?) is preventing pages from being marked all-visible not only during test_setup.sql but also create_index.sql; but it's gone by the time sanity_check.sql runs. Which is odd in itself because not that much time elapses between create_index and sanity_check, certainly less than the time from test_setup to create_index. It seems like a reliable fix might require test_setup to wait for any background autovac to exit before it does its own vacuums. Ick. And we still lack an explanation of why this only now broke. I remain suspicious that pgstats is behaving unexpectedly. regards, tom lane