On 15/03/2024 02:56, Melanie Plageman wrote:
Okay, so I was going to start using xl_heap_prune for vacuum here too,
but I realized it would be bigger because of the
snapshotConflictHorizon. Do you think there is a non-terrible way to
make the snapshotConflictHorizon optional? Like with a flag?

Yeah, another flag would do the trick.
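
To spell it out, here is a minimal sketch of what I mean. The
XLHP_HAS_CONFLICT_HORIZON name is hypothetical, and the bit value would
need to be chosen so that it doesn't collide with the other XLHP_*
flags:

	#define XLHP_HAS_CONFLICT_HORIZON	0x01	/* illustrative value */

	typedef struct xl_heap_prune
	{
		uint8		flags;
		/* If XLHP_HAS_CONFLICT_HORIZON is set, a TransactionId follows */
	} xl_heap_prune;

	/* when constructing the record: */
	xlrec.flags = 0;
	if (TransactionIdIsValid(conflict_horizon))
		xlrec.flags |= XLHP_HAS_CONFLICT_HORIZON;

	XLogBeginInsert();
	XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
	if (xlrec.flags & XLHP_HAS_CONFLICT_HORIZON)
		XLogRegisterData((char *) &conflict_horizon, sizeof(TransactionId));

On replay, heap_xlog_prune() would read the TransactionId only when the
flag is set, so records that don't need a conflict horizon don't have to
carry one.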

I introduced a few sub-record types similar to what you suggested --
they help a bit with alignment, so I think they are worth keeping. There
are comments around them, but perhaps a larger diagram of the layout of
the new XLOG_HEAP2_PRUNE record would be helpful.

I started doing this, but I can't find a way of laying out the diagram
that pgindent doesn't destroy. I thought I remembered other diagrams in
the source code showing the layout of something (something with pages
somewhere?) that don't get messed up by pgindent, but I couldn't find
them.

See src/tools/pgindent/README, section "Cleaning up in case of failure or ugly output":

    /*----------
     * Text here will not be touched by pgindent.
     */
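
For example, the layout of the new record could be documented like this
(only a sketch of the pieces in your current patches, in the order they
get registered):

	/*----------
	 * An XLOG_HEAP2_PRUNE record has the following layout:
	 *
	 * xl_heap_prune		fixed header, with flags
	 * xlhp_freeze			if XLHP_HAS_FREEZE_PLANS, followed by
	 *						xl_heap_freeze_plan[nplans]
	 * xlhp_prune_items		if XLHP_HAS_REDIRECTIONS
	 * xlhp_prune_items		if XLHP_HAS_DEAD_ITEMS
	 * xlhp_prune_items		if XLHP_HAS_NOW_UNUSED_ITEMS
	 * OffsetNumber[]		freeze plan offsets, if XLHP_HAS_FREEZE_PLANS
	 *----------
	 */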

There is a bit of duplicated code between heap_xlog_prune() and
heap2_desc(), since they both need to deserialize the record. Before,
the deserialization code was small enough that it didn't matter, but it
might be worth refactoring to share it now.

I have added a helper function to do the deserialization,
heap_xlog_deserialize_prune_and_freeze(). But I didn't start using it in
heap2_desc() because of the way the pg_waldump build file works. Do you
think the helper belongs in any of waldump's existing sources?

        pg_waldump_sources = files(
                'compat.c',
                'pg_waldump.c',
                'rmgrdesc.c',
        )

        pg_waldump_sources += rmgr_desc_sources
        pg_waldump_sources += xlogreader_sources
        pg_waldump_sources += files('../../backend/access/transam/xlogstats.c')

Otherwise, I assume I am supposed to avoid adding some big new include
to waldump.

Didn't look closely at that, but there's some precedent with commit/prepare/abort records. See ParseCommitRecord, xl_xact_commit, xl_parsed_commit et al.
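
The pattern there, roughly, is that a "parsed" representation and the
routine that fills it live next to the desc code, so that recovery and
pg_waldump can both use them. A sketch of what that might look like
here, with hypothetical struct and function names:

	/* analogous to xl_xact_parsed_commit */
	typedef struct xl_heap_parsed_prune
	{
		TransactionId snapshotConflictHorizon;
		int			nredirected;
		OffsetNumber *redirected;
		int			ndead;
		OffsetNumber *nowdead;
		int			nunused;
		OffsetNumber *nowunused;
		int			nplans;
		xl_heap_freeze_plan *plans;
		OffsetNumber *frz_offsets;
	} xl_heap_parsed_prune;

	extern void ParsePruneRecord(xl_heap_prune *xlrec, char *data,
								 xl_heap_parsed_prune *parsed);

If the parse routine lived in heapdesc.c, pg_waldump would pick it up
through rmgr_desc_sources, with no new includes needed.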

Note that we don't provide WAL compatibility across major versions. You can fully remove the old xl_heap_freeze_page format. (We should bump XLOG_PAGE_MAGIC when this is committed though)
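
That's the usual one-liner in src/include/access/xlog_internal.h. The
value below is illustrative only; pick the next unused one at commit
time:

	#define XLOG_PAGE_MAGIC 0xD115	/* can be used as WAL version indicator */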

On the point of removing the freeze-only code path from
heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
I realized that heap_pre_freeze_checks() was not being called in the
case that we decide to freeze because we emitted an FPI while setting
the hint bit. I've fixed that; however, I've done so by moving
heap_pre_freeze_checks() into the critical section. I think that is not
okay? I could move it earlier and not call it when the hint-bit FPI
leads us to freeze tuples. But I think that would lead to us doing a
lot less validation of tuples being frozen when checksums are enabled.
Or I could make two critical sections?

I found another approach: just do the pre-freeze checks if we are
considering freezing by any criterion other than the FPI one.

Hmm, I think you can make this simpler if you use XLogCheckBufferNeedsBackup() to check if the hint-bit update will generate a full-page-image before entering the critical section. Like you did to check if pruning will generate a full-page-image. I included that change in the attached patches.
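
With that, the decision boils down to this shape (this is how it looks
in the last of the attached patches):

	do_freeze = false;
	if (pagefrz)
	{
		if (pagefrz->freeze_required)
			do_freeze = true;
		else if (whole_page_freezable && presult->nfrozen > 0)
		{
			/*
			 * Freezing would make the page all-frozen. Have we already
			 * emitted an FPI, or would pruning or the hint-bit write
			 * emit one anyway?
			 */
			if (hint_bit_fpi ||
				((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
				do_freeze = true;
		}
	}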

I don't fully understand this:

        /*
         * If we will freeze tuples on the page or they were all already frozen
         * on the page, if we will set the page all-frozen in the visibility map,
         * we can advance relfrozenxid and relminmxid to the values in
         * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
         */
        if (presult->all_frozen || presult->nfrozen > 0)
        {
                presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
                presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
        }
        else
        {
                presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
                presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
        }

Firstly, the comment is outdated, because we have already done the freezing at this point. But more importantly, I don't understand the logic here. Could we check just presult->nfrozen > 0 here and get the same result?

I think it is probably worse to add both of them as additional optional
arguments, so I've just left lazy_scan_prune() with the job of
initializing them.

Ok.


Here are some patches on top of your patches for some further refactorings. Some notable changes in heap_page_prune_and_freeze():

- I moved the heap_prepare_freeze_tuple() work from the 2nd loop to the 1st one. It seems more clear and efficient that way.

- I extracted the code to generate the WAL record to a separate function.

- Refactored the "will setting the hint bit generate an FPI" check as discussed above.

These patches are in a very rough WIP state, but I wanted to share early. I haven't done much testing, and I'm not wedded to these changes, but I think they make it more readable. Please include / squash in the patch set if you agree with them.

Please also take a look at the comments I marked with HEIKKI or FIXME, in the patches and commit messages.

I'll wait for a new version from you before reviewing more.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 622620a7875ae8c1626e9cd118156e0c734d44ed Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Sun, 17 Mar 2024 22:52:28 +0200
Subject: [PATCH v3heikki 1/4] Inline heap_frz_conflict_horizon() to the
 caller.

FIXME: This frz_conflict_horizon business looks fishy to me. We have:
- local frz_conflict_horizon variable,
- presult->frz_conflict_horizon, and
- prstate.snapshotConflictHorizon

should we really have all three, and what are the differences?
---
 src/backend/access/heap/pruneheap.c  | 17 ++++++++++++++---
 src/backend/access/heap/vacuumlazy.c | 27 ---------------------------
 src/include/access/heapam.h          |  2 --
 3 files changed, 14 insertions(+), 32 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index d3643b1ecc6..f4f5468e144 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -21,6 +21,7 @@
 #include "access/transam.h"
 #include "access/xlog.h"
 #include "access/xloginsert.h"
+#include "commands/vacuum.h"
 #include "executor/instrument.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -275,7 +276,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		hint_bit_fpi;
 	bool		prune_fpi = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
-	TransactionId frz_conflict_horizon = InvalidTransactionId;
 
 	/*
 	 * One entry for every tuple that we may freeze.
@@ -691,7 +691,18 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 	if (do_freeze)
 	{
-		frz_conflict_horizon = heap_frz_conflict_horizon(presult, pagefrz);
+		/*
+		 * We can use frz_conflict_horizon as our cutoff for conflicts when
+		 * the whole page is eligible to become all-frozen in the VM once
+		 * we're done with it.  Otherwise we generate a conservative cutoff by
+		 * stepping back from OldestXmin.
+		 */
+		if (!(presult->all_visible_except_removable && presult->all_frozen))
+		{
+			/* Avoids false conflicts when hot_standby_feedback in use */
+			presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+			TransactionIdRetreat(presult->frz_conflict_horizon);
+		}
 		heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
 	}
 	else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
@@ -740,7 +751,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		 */
 		if (do_freeze)
 			xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
-												frz_conflict_horizon);
+												presult->frz_conflict_horizon);
 		else
 			xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
 
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 4b45e8be1ad..8d3723faf3a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1373,33 +1373,6 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
 	return false;
 }
 
-/*
- * Determine the snapshotConflictHorizon for freezing. Must only be called
- * after pruning and determining if the page is freezable.
- */
-TransactionId
-heap_frz_conflict_horizon(PruneFreezeResult *presult, HeapPageFreeze *pagefrz)
-{
-	TransactionId result;
-
-	/*
-	 * We can use frz_conflict_horizon as our cutoff for conflicts when the
-	 * whole page is eligible to become all-frozen in the VM once we're done
-	 * with it.  Otherwise we generate a conservative cutoff by stepping back
-	 * from OldestXmin.
-	 */
-	if (presult->all_visible_except_removable && presult->all_frozen)
-		result = presult->frz_conflict_horizon;
-	else
-	{
-		/* Avoids false conflicts when hot_standby_feedback in use */
-		result = pagefrz->cutoffs->OldestXmin;
-		TransactionIdRetreat(result);
-	}
-
-	return result;
-}
-
 /*
  *	lazy_scan_prune() -- lazy_scan_heap() pruning and freezing.
  *
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index c36623f53bd..4e17347e625 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -290,8 +290,6 @@ extern TM_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 
-extern TransactionId heap_frz_conflict_horizon(PruneFreezeResult *presult,
-											   HeapPageFreeze *pagefrz);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  HeapPageFreeze *pagefrz,
 									  HeapTupleFreeze *frz, bool *totally_frozen);
-- 
2.39.2

From 0219842487931f899abcf183c863c43270c098f0 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Sun, 17 Mar 2024 23:05:31 +0200
Subject: [PATCH v3heikki 2/4] Misc cleanup

FIXME and some comments I added with HEIKKI: prefix with questions
---
 src/backend/access/heap/pruneheap.c  | 16 +++++++---------
 src/backend/access/heap/vacuumlazy.c |  7 ++++++-
 2 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index f4f5468e144..b3573bb628b 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -32,10 +32,9 @@
 /* Working data for heap_page_prune_and_freeze() and subroutines */
 typedef struct
 {
-	Relation	rel;
-
 	/* tuple visibility test, initialized for the relation */
 	GlobalVisState *vistest;
+
 	/* whether or not dead items can be set LP_UNUSED during pruning */
 	bool		mark_unused_now;
 
@@ -248,12 +247,12 @@ prune_freeze_xmin_is_removable(GlobalVisState *visstate, TransactionId xmin)
  * the current relfrozenxid and relminmxids used if the caller is interested in
  * freezing tuples on the page.
  *
- * off_loc is the offset location required by the caller to use in error
- * callback.
- *
  * presult contains output parameters needed by callers such as the number of
  * tuples removed and the number of line pointers newly marked LP_DEAD.
  * heap_page_prune_and_freeze() is responsible for initializing it.
+ *
+ * off_loc is the offset location required by the caller to use in error
+ * callback.
  */
 void
 heap_page_prune_and_freeze(Relation relation, Buffer buffer,
@@ -294,7 +293,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * initialize the rest of our working state.
 	 */
 	prstate.new_prune_xid = InvalidTransactionId;
-	prstate.rel = relation;
 	prstate.vistest = vistest;
 	prstate.mark_unused_now = mark_unused_now;
 	prstate.snapshotConflictHorizon = InvalidTransactionId;
@@ -302,9 +300,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	memset(prstate.marked, 0, sizeof(prstate.marked));
 
 	/*
-	 * presult->htsv is not initialized here because all ntuple spots in the
+	 * prstate.htsv is not initialized here because all ntuple spots in the
 	 * array will be set either to a valid HTSV_Result value or -1.
 	 */
+
 	presult->ndeleted = 0;
 	presult->nnewlpdead = 0;
 	presult->nfrozen = 0;
@@ -328,7 +327,7 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	presult->new_relminmxid = InvalidMultiXactId;
 
 	maxoff = PageGetMaxOffsetNumber(page);
-	tup.t_tableOid = RelationGetRelid(prstate.rel);
+	tup.t_tableOid = RelationGetRelid(relation);
 
 	/*
 	 * Determine HTSV for all tuples.
@@ -378,7 +377,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 
 		prstate.htsv[offnum] = heap_prune_satisfies_vacuum(&prstate, &tup,
 														   buffer);
-		Assert(ItemIdIsNormal(itemid));
 
 		/*
 		 * The criteria for counting a tuple as live in this block need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8d3723faf3a..d3df7029667 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -433,7 +433,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * heap_page_prune_and_freeze(). We expect vistest will always make
 	 * heap_page_prune_and_freeze() remove any deleted tuple whose xmax is <
 	 * OldestXmin.  lazy_scan_prune must never become confused about whether a
-	 * tuple should be frozen or removed.  (In the future we might want to
+	 * tuple should be frozen or removed.
+	 * HEIKKI: Is such confusion possible anymore?
+	 * (In the future we might want to
 	 * teach lazy_scan_prune to recompute vistest from time to time, to
 	 * increase the number of dead tuples it can prune away.)
 	 */
@@ -1387,6 +1389,9 @@ lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf, BlockNumber blkno,
  * so we could deal with tuples that were DEAD to VACUUM, but nevertheless were
  * left with storage after pruning.
  *
+ * HEIKKI: does the above paragraph still make sense? We don't call
+ * HeapTupleSatisfiesVacuum() here anymore
+ *
  * As of Postgres 17, we circumvent this problem altogether by reusing the
  * result of heap_page_prune_and_freeze()'s visibility check. Without the
  * second call to HeapTupleSatisfiesVacuum(), there is no new HTSV_Result and
-- 
2.39.2

From d72cebf13f9866112309883f72a382fc2cb57e17 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Sun, 17 Mar 2024 23:08:42 +0200
Subject: [PATCH v3heikki 3/4] Move work to the first loop

It seems more efficient and more straightforward to do freezing in the
first loop. When it was part of the 2nd loop, the 2nd loop needed to
do more work (PageGetItemId() and related checks) for tuples that were
already processed as part of an earlier chain, while in the 1st loop
that work is already done.
---
 src/backend/access/heap/pruneheap.c | 141 ++++++++++------------------
 1 file changed, 52 insertions(+), 89 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index b3573bb628b..3541628799a 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -78,9 +78,6 @@ static int	heap_prune_chain(Buffer buffer,
 							 PruneState *prstate, PruneFreezeResult *presult);
 
 static inline HTSV_Result htsv_get_valid_status(int status);
-static void prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
-									   HeapPageFreeze *pagefrz, HeapTupleFreeze *frozen,
-									   PruneFreezeResult *presult);
 static void heap_prune_record_prunable(PruneState *prstate, TransactionId xid);
 static void heap_prune_record_redirect(PruneState *prstate,
 									   OffsetNumber offnum, OffsetNumber rdoffnum,
@@ -322,6 +319,16 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* for recovery conflicts */
 	presult->frz_conflict_horizon = InvalidTransactionId;
 
+	/*
+	 * We will update the VM after pruning, collecting LP_DEAD items, and
+	 * freezing tuples. Keep track of whether or not the page is all_visible
+	 * and all_frozen and use this information to update the VM. all_visible
+	 * implies lpdead_items == 0, but don't trust all_frozen result unless
+	 * all_visible is also set to true.
+	 */
+	/* HEIKKI: the caller updates the VM? not us */
+	presult->all_frozen = true;
+
 	/* For advancing relfrozenxid and relminmxid */
 	presult->new_relfrozenxid = InvalidTransactionId;
 	presult->new_relminmxid = InvalidMultiXactId;
@@ -493,6 +500,42 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				elog(ERROR, "unexpected HeapTupleSatisfiesVacuum result");
 				break;
 		}
+
+		if (prstate.htsv[offnum] != HEAPTUPLE_DEAD)
+		{
+			/*
+			 * Deliberately don't set hastup for LP_DEAD items.  We make the
+			 * soft assumption that any LP_DEAD items encountered here will
+			 * become LP_UNUSED later on, before count_nondeletable_pages is
+			 * reached.  If we don't make this assumption then rel truncation
+			 * will only happen every other VACUUM, at most.  Besides, VACUUM
+			 * must treat hastup/nonempty_pages as provisional no matter how
+			 * LP_DEAD items are handled (handled here, or handled later on).
+			 */
+			presult->hastup = true;
+
+			/* Since we're not removing this tuple, consider freezing it */
+			if (pagefrz)
+			{
+				bool		totally_frozen;
+
+				if ((heap_prepare_freeze_tuple(htup, pagefrz,
+											   &frozen[presult->nfrozen],
+											   &totally_frozen)))
+				{
+					/* Save prepared freeze plan for later */
+					frozen[presult->nfrozen++].offset = offnum;
+				}
+
+				/*
+				 * If any tuple isn't either totally frozen already or eligible to become
+				 * totally frozen (according to its freeze plan), then the page definitely
+				 * cannot be set all-frozen in the visibility map later on.
+				 */
+				if (!totally_frozen)
+					presult->all_frozen = false;
+			}
+		}
 	}
 
 	/*
@@ -517,61 +560,29 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 */
 	presult->all_visible_except_removable = presult->all_visible;
 
-	/*
-	 * We will update the VM after pruning, collecting LP_DEAD items, and
-	 * freezing tuples. Keep track of whether or not the page is all_visible
-	 * and all_frozen and use this information to update the VM. all_visible
-	 * implies lpdead_items == 0, but don't trust all_frozen result unless
-	 * all_visible is also set to true.
-	 */
-	presult->all_frozen = true;
-
-	/* Scan the page */
+	/* Scan the page for hot chains */
 	for (offnum = FirstOffsetNumber;
 		 offnum <= maxoff;
 		 offnum = OffsetNumberNext(offnum))
 	{
 		ItemId		itemid;
 
-		/* see preceding loop */
-		if (off_loc)
-			*off_loc = offnum;
-
-		if (pagefrz)
-			prune_prepare_freeze_tuple(page, offnum, &prstate,
-									   pagefrz, frozen, presult);
-
-		itemid = PageGetItemId(page, offnum);
-
-		if (ItemIdIsNormal(itemid) &&
-			prstate.htsv[offnum] != HEAPTUPLE_DEAD)
-		{
-			Assert(prstate.htsv[offnum] != -1);
-
-			/*
-			 * Deliberately don't set hastup for LP_DEAD items.  We make the
-			 * soft assumption that any LP_DEAD items encountered here will
-			 * become LP_UNUSED later on, before count_nondeletable_pages is
-			 * reached.  If we don't make this assumption then rel truncation
-			 * will only happen every other VACUUM, at most.  Besides, VACUUM
-			 * must treat hastup/nonempty_pages as provisional no matter how
-			 * LP_DEAD items are handled (handled here, or handled later on).
-			 */
-			presult->hastup = true;
-		}
-
 		/* Ignore items already processed as part of an earlier chain */
 		if (prstate.marked[offnum])
 			continue;
 
+		/* see preceding loop */
+		if (off_loc)
+			*off_loc = offnum;
+
 		/* Nothing to do if slot is empty */
+		itemid = PageGetItemId(page, offnum);
 		if (!ItemIdIsUsed(itemid))
 			continue;
 
 		/* Process this item or chain of items */
 		presult->ndeleted += heap_prune_chain(buffer, offnum,
 											  &prstate, presult);
-
 	}
 
 	/* Clear the offset information once we have processed the given page. */
@@ -1217,54 +1228,6 @@ heap_prune_chain(Buffer buffer, OffsetNumber rootoffnum,
 	return ndeleted;
 }
 
-/*
- * While pruning, before actually executing pruning and updating the line
- * pointers, we may consider freezing tuples referred to by LP_NORMAL line
- * pointers whose visibility status is not HEAPTUPLE_DEAD. That is to say, we
- * want to consider freezing normal tuples which will not be removed.
-*/
-static void
-prune_prepare_freeze_tuple(Page page, OffsetNumber offnum, PruneState *prstate,
-						   HeapPageFreeze *pagefrz,
-						   HeapTupleFreeze *frozen,
-						   PruneFreezeResult *presult)
-{
-	bool		totally_frozen;
-	HeapTupleHeader htup;
-	ItemId		itemid;
-
-	Assert(pagefrz);
-
-	itemid = PageGetItemId(page, offnum);
-
-	if (!ItemIdIsNormal(itemid))
-		return;
-
-	/* We do not consider freezing tuples which will be removed. */
-	if (prstate->htsv[offnum] == HEAPTUPLE_DEAD ||
-		prstate->htsv[offnum] == -1)
-		return;
-
-	htup = (HeapTupleHeader) PageGetItem(page, itemid);
-
-	/* Tuple with storage -- consider need to freeze */
-	if ((heap_prepare_freeze_tuple(htup, pagefrz,
-								   &frozen[presult->nfrozen],
-								   &totally_frozen)))
-	{
-		/* Save prepared freeze plan for later */
-		frozen[presult->nfrozen++].offset = offnum;
-	}
-
-	/*
-	 * If any tuple isn't either totally frozen already or eligible to become
-	 * totally frozen (according to its freeze plan), then the page definitely
-	 * cannot be set all-frozen in the visibility map later on
-	 */
-	if (!totally_frozen)
-		presult->all_frozen = false;
-}
-
 /* Record lowest soon-prunable XID */
 static void
 heap_prune_record_prunable(PruneState *prstate, TransactionId xid)
-- 
2.39.2

From 978fc7c9c6a3c30f34c8d54a98aad8b5163fd0ab Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakan...@iki.fi>
Date: Mon, 18 Mar 2024 00:47:44 +0200
Subject: [PATCH v3heikki 4/4] Refactor heap_page_prune_and_freeze() some more

- Move WAL-logging to separate function

- Refactor the code that executes the prune, freeze and
  hint-bit setting actions on the page. There's a little bit of
  repetition in how pd_prune_xid is set and in the PageClearFull() call,
  but I find this easier to follow.

- Instead of checking after-the-fact if MarkBufferDirtyHint()
  generated an FPI, check before entering the critical section if it
  would. There's a small chance that a checkpoint started in between
  and this gives a different answer, but that's very rare and this is a
  crude heuristic anyway.
---
 src/backend/access/heap/pruneheap.c | 433 ++++++++++++++--------------
 1 file changed, 214 insertions(+), 219 deletions(-)

diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index 3541628799a..4d7f7f7ea94 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -48,6 +48,9 @@ typedef struct
 	OffsetNumber nowdead[MaxHeapTuplesPerPage];
 	OffsetNumber nowunused[MaxHeapTuplesPerPage];
 
+	/* one entry for every tuple that we may freeze */
+	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
+
 	/*
 	 * marked[i] is true if item i is entered in one of the above arrays.
 	 *
@@ -89,6 +92,8 @@ static void heap_prune_record_dead_or_unused(PruneState *prstate, OffsetNumber o
 static void heap_prune_record_unused(PruneState *prstate, OffsetNumber offnum);
 static void page_verify_redirects(Page page);
 
+static void log_heap_prune_and_freeze(Relation relation, Buffer buffer, PruneState *prstate, PruneFreezeResult *presult);
+
 
 /*
  * Optionally prune and repair fragmentation in the specified page.
@@ -270,14 +275,8 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	bool		do_hint;
 	bool		whole_page_freezable;
 	bool		hint_bit_fpi;
-	bool		prune_fpi = false;
 	int64		fpi_before = pgWalUsage.wal_fpi;
 
-	/*
-	 * One entry for every tuple that we may freeze.
-	 */
-	HeapTupleFreeze frozen[MaxHeapTuplesPerPage];
-
 	/*
 	 * Our strategy is to scan the page and make lists of items to change,
 	 * then apply the changes within a critical section.  This keeps as much
@@ -520,11 +519,11 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 				bool		totally_frozen;
 
 				if ((heap_prepare_freeze_tuple(htup, pagefrz,
-											   &frozen[presult->nfrozen],
+											   &prstate.frozen[presult->nfrozen],
 											   &totally_frozen)))
 				{
 					/* Save prepared freeze plan for later */
-					frozen[presult->nfrozen++].offset = offnum;
+					prstate.frozen[presult->nfrozen++].offset = offnum;
 				}
 
 				/*
@@ -596,13 +595,6 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	/* Record number of newly-set-LP_DEAD items for caller */
 	presult->nnewlpdead = prstate.ndead;
 
-	/*
-	 * Only incur overhead of checking if we will do an FPI if we might use
-	 * the information.
-	 */
-	if (do_prune && pagefrz)
-		prune_fpi = XLogCheckBufferNeedsBackup(buffer);
-
 	/*
 	 * Even if we don't prune anything, if we found a new value for the
 	 * pd_prune_xid field or the page was marked full, we will update the hint
@@ -626,9 +618,24 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * opportunistic freeze heuristic must be improved; however, for now, try
 	 * to approximate it.
 	 */
-	do_freeze = pagefrz &&
-		(pagefrz->freeze_required ||
-		 (whole_page_freezable && presult->nfrozen > 0 && (prune_fpi || hint_bit_fpi)));
+	do_freeze = false;
+	if (pagefrz)
+	{
+		if (pagefrz->freeze_required)
+			do_freeze = true;
+		else if (whole_page_freezable && presult->nfrozen > 0)
+		{
+			/*
+			 * Freezing would make the page all-frozen. Have we already
+			 * emitted an FPI, or will do so anyway?
+			 */
+			if (hint_bit_fpi ||
+				((do_prune || do_hint) && XLogCheckBufferNeedsBackup(buffer)))
+			{
+				do_freeze = true;
+			}
+		}
+	}
 
 	/*
 	 * Validate the tuples we are considering freezing. We do this even if
@@ -636,85 +643,10 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 	 * still may emit an FPI while setting the page hint bit later. But we
 	 * want to avoid doing the pre-freeze checks in a critical section.
 	 */
-	if (pagefrz &&
-		(pagefrz->freeze_required ||
-		 (whole_page_freezable && presult->nfrozen > 0)))
-		heap_pre_freeze_checks(buffer, frozen, presult->nfrozen);
-
-	/*
-	 * If we are going to modify the page contents anyway, we will have to
-	 * update more than hint bits.
-	 */
-	if (do_freeze || do_prune)
-		do_hint = false;
-
-	START_CRIT_SECTION();
-
-	/*
-	 * Update the page's pd_prune_xid field to either zero, or the lowest XID
-	 * of any soon-prunable tuple.
-	 */
-	((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
-
-	/*
-	 * If pruning, freezing, or updating the hint bit, clear the "page is
-	 * full" flag if it is set since there's no point in repeating the
-	 * prune/defrag process until something else happens to the page.
-	 */
-	if (do_prune || do_freeze || do_hint)
-		PageClearFull(page);
-
-	/*
-	 * Apply the planned item changes, then repair page fragmentation, and
-	 * update the page's hint bit about whether it has free line pointers.
-	 */
-	if (do_prune)
-	{
-		heap_page_prune_execute(buffer,
-								prstate.redirected, prstate.nredirected,
-								prstate.nowdead, prstate.ndead,
-								prstate.nowunused, prstate.nunused);
-	}
-
-	/*
-	 * If we aren't pruning or freezing anything, but we updated pd_prune_xid,
-	 * this is a non-WAL-logged hint.
-	 */
-	if (do_hint)
-	{
-		MarkBufferDirtyHint(buffer, true);
-
-		/*
-		 * We may have decided not to opportunistically freeze above because
-		 * pruning would not emit an FPI. Now, however, if checksums are
-		 * enabled, setting the hint bit may have emitted an FPI. Check again
-		 * if we should freeze.
-		 */
-		hint_bit_fpi = fpi_before != pgWalUsage.wal_fpi;
-
-		if (hint_bit_fpi)
-			do_freeze = pagefrz &&
-				(pagefrz->freeze_required ||
-				 (whole_page_freezable && presult->nfrozen > 0));
-	}
-
 	if (do_freeze)
-	{
-		/*
-		 * We can use frz_conflict_horizon as our cutoff for conflicts when
-		 * the whole page is eligible to become all-frozen in the VM once
-		 * we're done with it.  Otherwise we generate a conservative cutoff by
-		 * stepping back from OldestXmin.
-		 */
-		if (!(presult->all_visible_except_removable && presult->all_frozen))
-		{
-			/* Avoids false conflicts when hot_standby_feedback in use */
-			presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
-			TransactionIdRetreat(presult->frz_conflict_horizon);
-		}
-		heap_freeze_prepared_tuples(buffer, frozen, presult->nfrozen);
-	}
-	else if ((!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
+		heap_pre_freeze_checks(buffer, prstate.frozen, presult->nfrozen);
+
+	if (!do_freeze && (!pagefrz || !presult->all_frozen || presult->nfrozen > 0))
 	{
 		/*
 		 * If we will neither freeze tuples on the page nor set the page all
@@ -725,162 +657,97 @@ heap_page_prune_and_freeze(Relation relation, Buffer buffer,
 		presult->nfrozen = 0;	/* avoid miscounts in instrumentation */
 	}
 
-	if (do_prune || do_freeze)
-		MarkBufferDirty(buffer);
+	START_CRIT_SECTION();
 
-	/*
-	 * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
-	 */
-	if ((do_prune || do_freeze) && RelationNeedsWAL(relation))
+	if (do_hint)
 	{
-		xl_heap_prune xlrec;
-		XLogRecPtr	recptr;
-		xlhp_freeze freeze;
-		xlhp_prune_items redirect,
-					dead,
-					unused;
-
-		int			nplans = 0;
-		xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
-		OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
-
-		xlrec.flags = 0;
-
-		if (RelationIsAccessibleInLogicalDecoding(relation))
-			xlrec.flags |= XLHP_IS_CATALOG_REL;
-
 		/*
-		 * The snapshotConflictHorizon for the whole record should be the most
-		 * conservative of all the horizons calculated for any of the possible
-		 * modifications. If this record will prune tuples, any transactions
-		 * on the standby older than the youngest xmax of the most recently
-		 * removed tuple this record will prune will conflict. If this record
-		 * will freeze tuples, any transactions on the standby with xids older
-		 * than the youngest tuple this record will freeze will conflict.
+		 * Update the page's pd_prune_xid field to either zero, or the lowest XID
+		 * of any soon-prunable tuple.
 		 */
-		if (do_freeze)
-			xlrec.snapshotConflictHorizon = Max(prstate.snapshotConflictHorizon,
-												presult->frz_conflict_horizon);
-		else
-			xlrec.snapshotConflictHorizon = prstate.snapshotConflictHorizon;
+		((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
 
 		/*
-		 * Prepare deduplicated representation for use in WAL record
-		 * Destructively sorts tuples array in-place.
+		 * Clear the "page is full" flag if it is set since there's no point
+		 * in repeating the prune/defrag process until something else happens
+		 * to the page.
 		 */
-		if (do_freeze)
-			nplans = heap_log_freeze_plan(frozen,
-										  presult->nfrozen, plans,
-										  frz_offsets);
-		if (nplans > 0)
-			xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
-
-		XLogBeginInsert();
-		XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
-
-		XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+		PageClearFull(page);
 
 		/*
-		 * The OffsetNumber arrays are not actually in the buffer, but we
-		 * pretend that they are.  When XLogInsert stores the whole buffer,
-		 * the offset arrays need not be stored too.
+		 * We only needed to update pd_prune_xid and clear the page-is-full
+		 * hint bit; this is a non-WAL-logged hint. If we will also freeze or
+		 * prune the page, we will mark the buffer dirty below.
 		 */
-		if (nplans > 0)
-		{
-			freeze = (xlhp_freeze)
-			{
-				.nplans = nplans
-			};
-
-			XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
-
-			XLogRegisterBufData(0, (char *) plans,
-								sizeof(xl_heap_freeze_plan) * freeze.nplans);
-		}
-
-
-		if (prstate.nredirected > 0)
-		{
-			xlrec.flags |= XLHP_HAS_REDIRECTIONS;
-
-			redirect = (xlhp_prune_items)
-			{
-				.ntargets = prstate.nredirected
-			};
-
-			XLogRegisterBufData(0, (char *) &redirect,
-								offsetof(xlhp_prune_items, data));
-
-			XLogRegisterBufData(0, (char *) prstate.redirected,
-								sizeof(OffsetNumber[2]) * prstate.nredirected);
-		}
+		if (!do_freeze && !do_prune)
+			MarkBufferDirtyHint(buffer, true);
+	}
 
-		if (prstate.ndead > 0)
+	if (do_prune || do_freeze)
+	{
+		/*
+		 * Apply the planned item changes, then repair page fragmentation, and
+		 * update the page's hint bit about whether it has free line pointers.
+		 */
+		if (do_prune)
 		{
-			xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
-
-			dead = (xlhp_prune_items)
-			{
-				.ntargets = prstate.ndead
-			};
-
-			XLogRegisterBufData(0, (char *) &dead,
-								offsetof(xlhp_prune_items, data));
-
-			XLogRegisterBufData(0, (char *) prstate.nowdead,
-								sizeof(OffsetNumber) * dead.ntargets);
+			heap_page_prune_execute(buffer,
+									prstate.redirected, prstate.nredirected,
+									prstate.nowdead, prstate.ndead,
+									prstate.nowunused, prstate.nunused);
 		}
 
-		if (prstate.nunused > 0)
+		if (do_freeze)
 		{
-			xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
-
-			unused = (xlhp_prune_items)
+			/*
+			 * We can use frz_conflict_horizon as our cutoff for conflicts when
+			 * the whole page is eligible to become all-frozen in the VM once
+			 * we're done with it.  Otherwise we generate a conservative cutoff by
+			 * stepping back from OldestXmin.
+			 */
+			if (!(presult->all_visible_except_removable && presult->all_frozen))
 			{
-				.ntargets = prstate.nunused
-			};
-
-			XLogRegisterBufData(0, (char *) &unused,
-								offsetof(xlhp_prune_items, data));
-
-			XLogRegisterBufData(0, (char *) prstate.nowunused,
-								sizeof(OffsetNumber) * unused.ntargets);
+				/* Avoids false conflicts when hot_standby_feedback in use */
+				presult->frz_conflict_horizon = pagefrz->cutoffs->OldestXmin;
+				TransactionIdRetreat(presult->frz_conflict_horizon);
+			}
+			heap_freeze_prepared_tuples(buffer, prstate.frozen, presult->nfrozen);
 		}
 
-		if (nplans > 0)
-			XLogRegisterBufData(0, (char *) frz_offsets,
-								sizeof(OffsetNumber) * presult->nfrozen);
-
-		recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+		MarkBufferDirty(buffer);
 
-		PageSetLSN(BufferGetPage(buffer), recptr);
+		/*
+		 * Emit a WAL XLOG_HEAP2_PRUNE record showing what we did
+		 */
+		if (RelationNeedsWAL(relation))
+			log_heap_prune_and_freeze(relation, buffer, &prstate, presult);
 	}
 
 	END_CRIT_SECTION();
 
-	/* Caller won't update new_relfrozenxid and new_relminmxid */
-	if (!pagefrz)
-		return;
-
 	/*
+	 * Let caller know how it can update relfrozenxid and relminmxid
+	 *
 	 * If we will freeze tuples on the page or, even if we don't freeze tuples
 	 * on the page, if we will set the page all-frozen in the visibility map,
 	 * we can advance relfrozenxid and relminmxid to the values in
 	 * pagefrz->FreezePageRelfrozenXid and pagefrz->FreezePageRelminMxid.
 	 */
-	if (presult->all_frozen || presult->nfrozen > 0)
+	if (pagefrz)
 	{
-		presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
-		presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
-	}
-	else
-	{
-		presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
-		presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+		if (presult->all_frozen || presult->nfrozen > 0)
+		{
+			presult->new_relfrozenxid = pagefrz->FreezePageRelfrozenXid;
+			presult->new_relminmxid = pagefrz->FreezePageRelminMxid;
+		}
+		else
+		{
+			presult->new_relfrozenxid = pagefrz->NoFreezePageRelfrozenXid;
+			presult->new_relminmxid = pagefrz->NoFreezePageRelminMxid;
+		}
 	}
 }
 
-
 /*
  * Perform visibility checks for heap pruning.
  */
@@ -1634,3 +1501,131 @@ heap_get_root_tuples(Page page, OffsetNumber *root_offsets)
 		}
 	}
 }
+
+
+static void
+log_heap_prune_and_freeze(Relation relation, Buffer buffer, PruneState *prstate, PruneFreezeResult *presult)
+{
+	xl_heap_prune xlrec;
+	XLogRecPtr	recptr;
+	xlhp_freeze freeze;
+	xlhp_prune_items redirect,
+		dead,
+		unused;
+
+	int			nplans = 0;
+	xl_heap_freeze_plan plans[MaxHeapTuplesPerPage];
+	OffsetNumber frz_offsets[MaxHeapTuplesPerPage];
+	bool		do_freeze = (presult->nfrozen > 0);
+
+	xlrec.flags = 0;
+
+	if (RelationIsAccessibleInLogicalDecoding(relation))
+		xlrec.flags |= XLHP_IS_CATALOG_REL;
+
+	/*
+	 * The snapshotConflictHorizon for the whole record should be the most
+	 * conservative of all the horizons calculated for any of the possible
+	 * modifications. If this record will prune tuples, any transactions
+	 * on the standby older than the youngest xmax of the most recently
+	 * removed tuple this record will prune will conflict. If this record
+	 * will freeze tuples, any transactions on the standby with xids older
+	 * than the youngest tuple this record will freeze will conflict.
+	 */
+	if (do_freeze)
+		xlrec.snapshotConflictHorizon = Max(prstate->snapshotConflictHorizon,
+											presult->frz_conflict_horizon);
+	else
+		xlrec.snapshotConflictHorizon = prstate->snapshotConflictHorizon;
+
+	/*
+	 * Prepare deduplicated representation for use in WAL record
+	 * Destructively sorts tuples array in-place.
+	 */
+	if (do_freeze)
+		nplans = heap_log_freeze_plan(prstate->frozen,
+									  presult->nfrozen, plans,
+									  frz_offsets);
+	if (nplans > 0)
+		xlrec.flags |= XLHP_HAS_FREEZE_PLANS;
+
+	XLogBeginInsert();
+	XLogRegisterData((char *) &xlrec, SizeOfHeapPrune);
+
+	XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
+
+	/*
+	 * The OffsetNumber arrays are not actually in the buffer, but we
+	 * pretend that they are.  When XLogInsert stores the whole buffer,
+	 * the offset arrays need not be stored too.
+	 */
+	if (nplans > 0)
+	{
+		freeze = (xlhp_freeze)
+			{
+				.nplans = nplans
+			};
+
+		XLogRegisterBufData(0, (char *) &freeze, offsetof(xlhp_freeze, plans));
+
+		XLogRegisterBufData(0, (char *) plans,
+							sizeof(xl_heap_freeze_plan) * freeze.nplans);
+	}
+
+
+	if (prstate->nredirected > 0)
+	{
+		xlrec.flags |= XLHP_HAS_REDIRECTIONS;
+
+		redirect = (xlhp_prune_items)
+			{
+				.ntargets = prstate->nredirected
+			};
+
+		XLogRegisterBufData(0, (char *) &redirect,
+							offsetof(xlhp_prune_items, data));
+
+		XLogRegisterBufData(0, (char *) prstate->redirected,
+							sizeof(OffsetNumber[2]) * prstate->nredirected);
+	}
+
+	if (prstate->ndead > 0)
+	{
+		xlrec.flags |= XLHP_HAS_DEAD_ITEMS;
+
+		dead = (xlhp_prune_items)
+			{
+				.ntargets = prstate->ndead
+			};
+
+		XLogRegisterBufData(0, (char *) &dead,
+							offsetof(xlhp_prune_items, data));
+
+		XLogRegisterBufData(0, (char *) prstate->nowdead,
+							sizeof(OffsetNumber) * dead.ntargets);
+	}
+
+	if (prstate->nunused > 0)
+	{
+		xlrec.flags |= XLHP_HAS_NOW_UNUSED_ITEMS;
+
+		unused = (xlhp_prune_items)
+			{
+				.ntargets = prstate->nunused
+			};
+
+		XLogRegisterBufData(0, (char *) &unused,
+							offsetof(xlhp_prune_items, data));
+
+		XLogRegisterBufData(0, (char *) prstate->nowunused,
+							sizeof(OffsetNumber) * unused.ntargets);
+	}
+
+	if (nplans > 0)
+		XLogRegisterBufData(0, (char *) frz_offsets,
+							sizeof(OffsetNumber) * presult->nfrozen);
+
+	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_PRUNE);
+
+	PageSetLSN(BufferGetPage(buffer), recptr);
+}
-- 
2.39.2
