Attached are two patches, both of which are fixes for bugs in nbtree VACUUM page deletion.
The first fix for a bug in commit 857f9c36cda. The immediate issue is that the code that maintains the oldest btpo.xact in the index accesses the special area of pages without holding a buffer pin. More fundamentally, the logic for examining pages that are deleted by the current VACUUM operation for the purposes of maintaining the oldest btpo.xact needs to be pushed down into the guts of the page deletion code. This has been described on other threads [1][2], but I'm starting a new one to focus attention on the bugs themselves (any discussion of Valgrind instrumentation can remain on the other thread). The second fix is a spin-off of the first. It fixes a much less serious issue that is present in all supported versions (it has probably been around forever). The issue is with undercounting in the "%u index pages have been deleted" figure reported by VACUUM VERBOSE. For example, suppose I delete most of the tuples from a table with a B-Tree index, leading to lots of empty pages that are candidates for deletion (the specifics really aren't very important). I then run VACUUM VERBOSE, and can see something like this: 4900 index pages have been deleted, 0 are currently reusable. If I immediately run another VACUUM VERBOSE, I can see this: 4924 index pages have been deleted, 4924 are currently reusable. It makes sense that the pages weren't reusable in the first VACUUM, but were in the second VACUUM. But where did the extra 24 pages come from? That doesn't make any sense at all. With the second patch applied, the same test case shows the correct number ("4924 index pages have been deleted") consistently. See the patch for details of how the accounting is wrong. (The short version is that we need to push the accounting down into the guts of page deletion, which is why the second patch relies on the first patch.) I would like to backpatch both patches to all branches that have commit 857f9c36cda -- v11, v12, and master. The second issue isn't serious, but it seems worth keeping v11+ in sync in this area. Note that any backpatch theoretically creates an ABI break for callers of the _bt_pagedel() function. Perhaps I could get around this by using global state in the back branches or something, but that's ugly as sin. Besides, I have a very hard time imagining an extension that feels entitled to call _bt_pagedel(). There are all kinds of ways in which that's a terrible idea. And it's hard to imagine how that could ever seem useful. I'd like to hear what other people think about this ABI issue in particular, though. [1] https://postgr.es/m/CAH2-Wz=mWPBLZ2cr95cBV=kzpav8oobtkhtfg+-+auiozan...@mail.gmail.com [2] https://postgr.es/m/CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpwrub_45eexlh1+k...@mail.gmail.com -- Peter Geoghegan
v1-0002-Fix-undercounting-in-VACUUM-VERBOSE-output.patch
Description: Binary data
v1-0001-Fix-bug-in-nbtree-VACUUM-skip-full-scan-feature.patch
Description: Binary data