Hi, I noticed while writing another patch that I think there might be an issue/oversight with how the snapshot conflict horizon is calculated for the prune/freeze WAL record in master/17 and the freeze WAL record in 16.
In code to determine whether or not to freeze tuples on the page in phase I of vacuum, we ignore LP_DEAD items temporarily, assuming they will be set LP_UNUSED and removed by phase III of vacuum. This means that the local values of all_visible and all_frozen (whether or not the page is all-visible/all-frozen) in the PruneState may temporarily be true when there are LP_DEAD items. Before returning to code that actually updates the visibility map, all_visible and all_frozen are unset if there are lpdead items. However, we calculate the snapshot conflict horizon for the prune/freeze record in master/17 and the freeze record in 16 before updating all-frozen and all-visible. That means the horizon may be too aggressive if the page cannot actually be set all-frozen. This isn't a correctness issue, but it may lead to transactions on the standby being unnecessarily canceled for pages which have some tuples frozen but not all due to remaining LP_DEAD items. I'm not 100% certain of my analysis given that this is a complicated area of the code. I've attached suggested fixes for master/17 and 16 and am interested in what others think. - Melanie
From fee539ddc97f1d58a883d37fcecea94af1261e8d Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Thu, 29 May 2025 10:20:18 -0400 Subject: [PATCH] Calculate prune/freeze conflict horizon with final all_frozen value In phase I of heap vacuuming, while determining whether or not to freeze tuples on a page, heap_page_prune_and_freeze() ignores LP_DEAD items which it assumes will be set LP_UNUSED and removed by the phase III of vacuum. That means the local value of prstate.all_frozen it uses may be true even when dead items are present. In this case, however, before returning control to lazy_scan_prune() heap_page_prune_and_freeze() must unset prstate.all_frozen to avoid lazy_scan_prune() incorrectly updating the visibility map for that heap page. The problem was that heap_page_prune_and_freeze() calculated the snapshot conflict horizon for the prune and freeze record using the contingent value of prstate.all_frozen. That could result in an overly aggressive snapshot conflict horizon in the case that some tuples were frozen but the whole page was not able to be set all-frozen. This would not affect correctness -- only potentially cancel queries unnecessarily on the standby. Unset this sooner to correctly set the snapshot conflict horizon in the prune/freeze record. This issue is present in at least Postgres 17 and 16. Further investigation is needed for earlier versions. --- src/backend/access/heap/pruneheap.c | 45 ++++++++++++++--------------- 1 file changed, 21 insertions(+), 24 deletions(-) diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c index a8025889be0..14744dcd2d5 100644 --- a/src/backend/access/heap/pruneheap.c +++ b/src/backend/access/heap/pruneheap.c @@ -750,6 +750,25 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer, */ } + /* + * It was convenient to ignore LP_DEAD items in all_visible earlier on to + * make the choice of whether or not to freeze the page unaffected by the + * short-term presence of LP_DEAD items. These LP_DEAD items were + * effectively assumed to be LP_UNUSED items in the making. It doesn't + * matter which vacuum heap pass (initial pass or final pass) ends up + * setting the page all-frozen, as long as the ongoing VACUUM does it. + * + * Now that freezing has been finalized, unset all_visible if there are + * any LP_DEAD items on the page. It needs to reflect the present state + * of the page, both for calculating the snapshot conflict horizon for + * this record and to communicate back to the caller. + */ + if (prstate.lpdead_items > 0) + { + prstate.all_visible = false; + prstate.all_frozen = false; + } + /* Any error while applying the changes is critical */ START_CRIT_SECTION(); @@ -852,30 +871,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer, presult->nfrozen = prstate.nfrozen; presult->live_tuples = prstate.live_tuples; presult->recently_dead_tuples = prstate.recently_dead_tuples; - - /* - * It was convenient to ignore LP_DEAD items in all_visible earlier on to - * make the choice of whether or not to freeze the page unaffected by the - * short-term presence of LP_DEAD items. These LP_DEAD items were - * effectively assumed to be LP_UNUSED items in the making. It doesn't - * matter which vacuum heap pass (initial pass or final pass) ends up - * setting the page all-frozen, as long as the ongoing VACUUM does it. - * - * Now that freezing has been finalized, unset all_visible if there are - * any LP_DEAD items on the page. It needs to reflect the present state - * of the page, as expected by our caller. - */ - if (prstate.all_visible && prstate.lpdead_items == 0) - { - presult->all_visible = prstate.all_visible; - presult->all_frozen = prstate.all_frozen; - } - else - { - presult->all_visible = false; - presult->all_frozen = false; - } - + presult->all_visible = prstate.all_visible; + presult->all_frozen = prstate.all_frozen; presult->hastup = prstate.hastup; /* -- 2.34.1
From 8b527ff196b8752809b92f7073bb7624d2329170 Mon Sep 17 00:00:00 2001 From: Melanie Plageman <melanieplage...@gmail.com> Date: Thu, 29 May 2025 10:39:35 -0400 Subject: [PATCH] Calculate freeze conflict horizon with final all_frozen value In phase I of heap vacuuming, while determining whether or not to freeze tuples on a page, lazy_scan_prune() ignores LP_DEAD items which it assumes will be set LP_UNUSED and removed by the phase III of vacuum. That means the local value of prunestate->all_frozen it uses may be true even when dead items are present. In this case, however, before returning control to lazy_scan_heap() lazy_scan_prune() must unset prunsteate->all_visible to avoid lazy_scan_haep() incorrectly updating the visibility map for that heap page. The problem was that lazy_scan_prune() calculated the snapshot conflict horizon for the freeze record using the contingent value of prunestate->all_frozen. That could result in an overly aggressive snapshot conflict horizon in the case that some tuples were frozen but the whole page was not able to be set all-frozen. This would not affect correctness -- only potentially cancel queries unnecessarily on the standby. Unset this sooner to correctly set the snapshot conflict horizon in the freeze record. This is the minimal change to achieve this affect in backbranches, however, the value of all_frozen is not corrected before returning to lazy_scan_prune(). That doesn't affect correctness since it is not used without all_visible. It seemed wiser to do the smallest possible fix. --- src/backend/access/heap/vacuumlazy.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index 9fa88960ada..8c3605eff1f 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -1833,7 +1833,7 @@ retry: * once we're done with it. Otherwise we generate a conservative * cutoff by stepping back from OldestXmin. */ - if (prunestate->all_visible && prunestate->all_frozen) + if (prunestate->all_visible && prunestate->all_frozen && lpdead_items == 0) { /* Using same cutoff when setting VM is now unnecessary */ snapshotConflictHorizon = prunestate->visibility_cutoff_xid; -- 2.34.1