On Tue, Nov 30, 2021 at 5:09 PM Peter Geoghegan <p...@bowt.ie> wrote:
> I believe that there have been several historic reasons why we need a
> cleanup lock during nbtree VACUUM, and that there is only one
> remaining reason for it today. So the history is unusually complicated.

Minor correction: we actually also have to worry about plain index
scans that don't use an MVCC snapshot, which is possible within
nbtree. It's quite likely when using logical replication, actually.
See the patch for more.

Like with the index-only scan case, a non-MVCC snapshot + plain nbtree
index scan cannot rely on heap access within the index scan node -- it
won't reliably notice that any newer heap tuples (that are really the
result of concurrent TID recycling) are not actually visible to its
MVCC snapshot -- because there isn't an MVCC snapshot. The only
difference in the index-only scan scenario is that we use the
visibility map (not the heap) -- which is racy in a way that makes our
MVCC snapshot (IOSs always have an MVCC snapshot) an ineffective
protection.

In summary, to be safe against confusion from concurrent TID recycling
during index/index-only scans, we can do either of the following
things:

1. Hold a pin on our leaf page while accessing the heap -- that'll
definitely conflict with the cleanup lock that nbtree VACUUM will
inevitably try to acquire on our leaf page.

OR:

2. Hold an MVCC snapshot, AND do an actual heap page access during the
plain index scan -- do both together.

With approach 2, our plain index scan must determine visibility using
real XIDs (against something like a dirty snapshot), rather than using
a visibility map bit. That is also necessary because the VM might
become invalid or ambiguous, in a way that's clearly not possible when
looking at full heap tuple headers with XIDs -- concurrent recycling
becomes safe if we know that we'll reliably notice it and not give
wrong answers.

Does that make sense?

--
Peter Geoghegan
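
The two rules in the summary above can be restated as a toy predicate. This is only an illustrative sketch in Python, not PostgreSQL code: the `ScanState` class, its field names, and `safe_against_tid_recycling` are all hypothetical names invented here to model the reasoning, and they do not correspond to any real nbtree structure or function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanState:
    """Hypothetical summary of how a scan behaves (not a PostgreSQL struct)."""
    holds_leaf_pin: bool      # leaf page stays pinned while the heap/VM is consulted
    has_mvcc_snapshot: bool   # scan runs with an MVCC snapshot
    visits_heap: bool         # visibility comes from heap tuple headers (XIDs),
                              # not from a visibility map bit


def safe_against_tid_recycling(scan: ScanState) -> bool:
    """Return True if the scan cannot be confused by concurrent TID recycling,
    according to the two approaches described in the email."""
    # Approach 1: the pin conflicts with the cleanup lock that nbtree VACUUM
    # must acquire on the leaf page, so recycling cannot happen underneath us.
    if scan.holds_leaf_pin:
        return True
    # Approach 2: an MVCC snapshot AND a real heap access, both together.
    return scan.has_mvcc_snapshot and scan.visits_heap


# Index-only scan that dropped its pin: MVCC snapshot, but VM instead of heap.
ios_no_pin = ScanState(holds_leaf_pin=False, has_mvcc_snapshot=True, visits_heap=False)
# Plain index scan with a non-MVCC snapshot that dropped its pin.
non_mvcc_no_pin = ScanState(holds_leaf_pin=False, has_mvcc_snapshot=False, visits_heap=True)
# Plain index scan with an MVCC snapshot that dropped its pin.
mvcc_plain_no_pin = ScanState(holds_leaf_pin=False, has_mvcc_snapshot=True, visits_heap=True)

print(safe_against_tid_recycling(ios_no_pin))         # False
print(safe_against_tid_recycling(non_mvcc_no_pin))    # False
print(safe_against_tid_recycling(mvcc_plain_no_pin))  # True
```

The model only encodes the either/or rule; it says nothing about how a pin is actually held or released, which is the subject of the patch.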