On Thu, Jun 10, 2021 at 8:49 AM Matthias van de Meent <boekewurm+postg...@gmail.com> wrote: > Could you elaborate on what this "matches what we expect" entails? > > Apart from this, I'm also quite certain that the goto-branch that > created this infinite loop should have been dead code: In a correctly > working system, the GlobalVis*Rels should always be at least as strict > as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels > can be updated (i.e. move their horizon forward) during the vacuum. As > such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple > that also satisifies the condition of HeapTupleSatisfiesVacuum.
It's true that these two similar functions should be in perfect agreement in general (given the same OldestXmin). That in itself doesn't mean that they must always agree about a tuple in practice, when they're called in turn inside lazy_scan_prune(). In particular, nothing stops a transaction that was in progress to heap_prune_satisfies_vacuum (when it saw some tuples it inserted) concurrently aborting. That will render the same tuples fully DEAD inside HeapTupleSatisfiesVacuum(). So we need to restart using the goto purely to cover that case. See the commit message of commit 8523492d4e3. By "matches what we expect", I meant "involves a just-aborted transaction". We could defensively verify that the inserting transaction concurrently aborted at the point of retrying/calling heap_page_prune() a second time. If there is no aborted transaction involved (as was the case with this bug), then we can be confident that something is seriously broken. -- Peter Geoghegan