On Sat, Mar 30, 2024 at 8:00 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Sat, Mar 30, 2024 at 1:57 AM Melanie Plageman
> <melanieplage...@gmail.com> wrote:
> > I think that we are actually successfully removing more RECENTLY_DEAD
> > HOT tuples than in master with heap_page_prune()'s new approach, and I
> > think it is correct; but let me know if I am missing something.
>
> /me blinks.
>
> Isn't zero the only correct number of RECENTLY_DEAD tuples to remove?

At the top of the comment for heap_prune_chain() in master, it says

 * If the item is an index-referenced tuple (i.e. not a heap-only tuple),
 * the HOT chain is pruned by removing all DEAD tuples at the start of the HOT
 * chain.  We also prune any RECENTLY_DEAD tuples preceding a DEAD tuple.
 * This is OK because a RECENTLY_DEAD tuple preceding a DEAD tuple is really
 * DEAD, our visibility test is just too coarse to detect it.

Heikki had added a comment in one of his patches to the fast path for
HOT tuples at the top of heap_prune_chain():

 * Note that we might first arrive at a dead heap-only tuple
 * either while following a chain or here (in the fast path).  Whichever path
 * gets there first will mark the tuple unused.
 *
 * Whether we arrive at the dead HOT tuple first here or while
 * following a chain above affects whether preceding RECENTLY_DEAD
 * tuples in the chain can be removed or not.  Imagine that you
 * have a chain with two tuples: RECENTLY_DEAD -> DEAD.  If we
 * reach the RECENTLY_DEAD tuple first, the chain-following logic
 * will find the DEAD tuple and conclude that both tuples are in
 * fact dead and can be removed.  But if we reach the DEAD tuple
 * at the end of the chain first, when we reach the RECENTLY_DEAD
 * tuple later, we will not follow the chain because the DEAD
 * TUPLE is already 'marked', and will not remove the
 * RECENTLY_DEAD tuple.  This is not a correctness issue, and the
 * RECENTLY_DEAD tuple will be removed by a later VACUUM.

My patch splits the tuples into HOT and non-HOT while gathering their
visibility information and first calls heap_prune_chain() on the
non-HOT tuples and then processes the yet unmarked HOT tuples in a
separate loop afterward. This will follow all of the chains and
process them completely as well as processing all HOT tuples which may
not be reachable from a valid chain.
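
To make the ordering effect concrete, here is a toy model of the
RECENTLY_DEAD -> DEAD example from the comment above. It is only a
sketch and not PostgreSQL code: ToyItem, toy_prune_chain(),
toy_fast_path() and the other toy_* names are made up for illustration,
and "marked" stands in for whatever the real code records for an item.
It assumes the DEAD heap-only tuple sits at a lower offset than the
chain's root.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CHAIN 8

/* Hypothetical types for illustration; not PostgreSQL's structures. */
typedef enum { TOY_LIVE, TOY_RECENTLY_DEAD, TOY_DEAD } ToyStatus;

typedef struct
{
    ToyStatus status;
    bool      heap_only;   /* true for a HOT (heap-only) tuple */
    int       next;        /* index of the next chain member, or -1 */
    bool      marked;      /* already scheduled for removal */
} ToyItem;

/* Follow a chain from its root; mark everything up to the last DEAD tuple. */
static void
toy_prune_chain(ToyItem *items, int root)
{
    int chain[TOY_MAX_CHAIN];
    int nchain = 0;
    int ndead = 0;

    for (int i = root; i != -1; i = items[i].next)
    {
        /* Stop at a live tuple, or at one that was already marked. */
        if (items[i].marked || items[i].status == TOY_LIVE)
            break;
        assert(nchain < TOY_MAX_CHAIN);
        chain[nchain++] = i;
        if (items[i].status == TOY_DEAD)
            ndead = nchain;    /* all members up to here are really dead */
    }
    for (int k = 0; k < ndead; k++)
        items[chain[k]].marked = true;
}

/* Fast path: a DEAD heap-only tuple with no successor is marked directly. */
static void
toy_fast_path(ToyItem *items, int i)
{
    if (items[i].heap_only && !items[i].marked &&
        items[i].status == TOY_DEAD && items[i].next == -1)
        items[i].marked = true;
}

static int
toy_count_marked(const ToyItem *items, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (items[i].marked)
            count++;
    return count;
}

/* Visit items in offset order, whichever kind comes first. */
static int
toy_offset_order(ToyItem *items, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (items[i].heap_only)
            toy_fast_path(items, i);
        else
            toy_prune_chain(items, i);
    }
    return toy_count_marked(items, n);
}

/* Follow all chains from their roots first, then visit heap-only items. */
static int
toy_two_pass(ToyItem *items, int n)
{
    for (int i = 0; i < n; i++)
        if (!items[i].heap_only)
            toy_prune_chain(items, i);
    for (int i = 0; i < n; i++)
        if (items[i].heap_only)
            toy_fast_path(items, i);
    return toy_count_marked(items, n);
}

/* Item 1 is the RECENTLY_DEAD root; it points at item 0, the DEAD HOT tuple. */
static void
toy_make_page(ToyItem *items)
{
    items[0] = (ToyItem) {TOY_DEAD, true, -1, false};
    items[1] = (ToyItem) {TOY_RECENTLY_DEAD, false, 0, false};
}
```

In this model, toy_offset_order() marks only the DEAD tuple, because
the fast path reaches it before the chain is followed, while
toy_two_pass() marks both tuples.
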
The fast path contains a special check to ensure that line pointers
for DEAD HOT tuples that were not themselves HOT-updated (dead orphaned
tuples from aborted HOT updates) are still marked LP_UNUSED even though
they are not reachable from a valid HOT chain. By running this check
after the chains have been followed, we don't preclude ourselves from
following all chains.

- Melanie