On Sun, Feb 20, 2022 at 12:27 PM Peter Geoghegan <p...@bowt.ie> wrote:
> You've given me a lot of high quality feedback on all of this, which
> I'll work through soon. It's hard to get the balance right here, but
> it's made much easier by this kind of feedback.

Attached is v9. Lots of changes. Highlights:

* Much improved 0001 ("loosen coupling" dynamic relfrozenxid tracking
patch). Some of the improvements are due to recent feedback from
Robert.

* Much improved 0002 ("Make page-level characteristics drive freezing"
patch). Whole new approach to the implementation, though the same
algorithm as before.

* No more FSM patch -- that was totally separate work that I
shouldn't have attached to this project.

* There are 2 new patches (these are now 0003 and 0004), both of which
are concerned with allowing non-aggressive VACUUM to consistently
advance relfrozenxid. I think that 0003 makes sense on general
principle, but I'm much less sure about 0004. These aren't too
important.

While working on the new approach to freezing taken by v9-0002, I had
some insight about the issues that Robert raised around 0001, too. I
wasn't expecting that to happen.

0002 makes page-level freezing a first class thing.
heap_prepare_freeze_tuple now has some (limited) knowledge of how this
works. heap_prepare_freeze_tuple's cutoff_xid argument is now always
the VACUUM caller's OldestXmin (not its FreezeLimit, as before). We
still have to pass FreezeLimit to heap_prepare_freeze_tuple, since it
still serves as a backstop, but now it's passed via the new
backstop_cutoff_xid argument instead. Whenever we opt to
"freeze a page", the new page-level algorithm *always* uses the most
recent possible XID and MXID values (OldestXmin and oldestMxact) to
decide what XIDs/XMIDs need to be replaced. That might sound like it'd
be too much, but it only applies to those pages that we actually
decide to freeze (since page-level characteristics drive everything
now). FreezeLimit is now just one of several ways of triggering that
(and the rarest and least interesting one).
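
To make the new argument order concrete, here is roughly how
lazy_scan_prune ends up calling heap_prepare_freeze_tuple in v9-0002
(see the attached patch for the real thing; the final two "NoFreeze"
arguments are explained below):

    if (heap_prepare_freeze_tuple(tuple.t_data,
                                  vacrel->relfrozenxid, vacrel->relminmxid,
                                  vacrel->OldestXmin, vacrel->OldestMxact,      /* cutoffs */
                                  vacrel->FreezeLimit, vacrel->MultiXactCutoff, /* backstops */
                                  &frozen[nfrozen], &tuple_totally_frozen,
                                  &force_freeze,
                                  &NewRelfrozenXid, &NewRelminMxid,
                                  &NoFreezeNewRelfrozenXid, &NoFreezeNewRelminMxid))
    {
        /* Will execute freeze below (but only if the page gets frozen) */
        frozen[nfrozen++].offset = offnum;
    }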

0002 also adds an alternative set of relfrozenxid/relminmxid tracker
variables, to make the "don't freeze the page" path within
lazy_scan_prune simpler (if you don't want to freeze the page, then
use the set of tracker variables that go with that choice, which
heap_prepare_freeze_tuple knows about and helps with). With page-level
freezing, lazy_scan_prune wants to make a decision about the page as a
whole, at the last minute, after all heap_prepare_freeze_tuple calls
have already been made. So I think that heap_prepare_freeze_tuple
needs to know about that aspect of lazy_scan_prune's behavior.
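
In code form, the last-minute page-level decision at the end of
lazy_scan_prune looks approximately like this:

    if (prunestate->all_visible || force_freeze)
    {
        /* Freezing the page -- XIDs/XMIDs about to be frozen needn't hold back relfrozenxid */
        vacrel->NewRelfrozenXid = NewRelfrozenXid;
        vacrel->NewRelminMxid = NewRelminMxid;
    }
    else
    {
        /* Not freezing -- use the "no freeze" trackers, discard the prepared freeze plans */
        vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
        vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
        nfrozen = 0;
    }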

When we *don't* want to freeze the page, we more or less need
everything related to freezing inside lazy_scan_prune to behave like
lazy_scan_noprune, which never freezes the page (that's mostly the
point of lazy_scan_noprune). And that's almost what we actually do --
heap_prepare_freeze_tuple now outsources maintenance of this
alternative set of "don't freeze the page" relfrozenxid/relminmxid
tracker variables to its sibling function, heap_tuple_needs_freeze.
That is the same function that lazy_scan_noprune itself actually
calls.
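
Concretely, the "outsourcing" amounts to one extra step at the very end
of heap_prepare_freeze_tuple (approximately):

    /*
     * Maintain the "don't freeze the page" trackers, unless caller has
     * already lost the option of not freezing the page
     */
    if (!*force_freeze && (!xmin_already_frozen || !xmax_already_frozen))
        *force_freeze = heap_tuple_needs_freeze(tuple,
                                                backstop_cutoff_xid,
                                                backstop_cutoff_multi,
                                                relfrozenxid_nofreeze_out,
                                                relminmxid_nofreeze_out);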

Now back to Robert's feedback on 0001 (whose comments were very
complicated in the last version). This approach seems to make the "being
versus becoming" or "going to freeze versus not going to freeze"
distinctions much clearer. This is less true if you assume that 0002
won't be committed but 0001 will be. Even if that happens with
Postgres 15, I have to imagine that adding something like 0002 must be
the real goal, long term. Without 0002, the value from 0001 is far
more limited. You need both together to get the virtuous cycle I've
described.

The approach of always using OldestXmin as cutoff_xid and oldestMxact
as our cutoff_multi makes a lot of sense to me, in part
because I think that it might well cut down on the tendency of VACUUM
to allocate new MultiXacts in order to be able to freeze old ones.
AFAICT the only reason that heap_prepare_freeze_tuple does that is
because it has no flexibility on FreezeLimit and MultiXactCutoff.
These are derived from vacuum_freeze_min_age and
vacuum_multixact_freeze_min_age, respectively, and so they're two
independent though fairly meaningless cutoffs. On the other hand,
OldestXmin and OldestMxact are not independent in the same way. We get
both of them at the same time and the same place, in
vacuum_set_xid_limits. OldestMxact really is very close to OldestXmin
-- only the units differ.
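
For reference, heap_vacuum_rel now gets all four cutoffs from a single
call (0001 adds an oldestMxact output argument to vacuum_set_xid_limits):

    aggressive = vacuum_set_xid_limits(rel,
                                       params->freeze_min_age,
                                       params->freeze_table_age,
                                       params->multixact_freeze_min_age,
                                       params->multixact_freeze_table_age,
                                       &OldestXmin, &OldestMxact,
                                       &FreezeLimit, &MultiXactCutoff);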

It seems that heap_prepare_freeze_tuple allocates new MXIDs (when
freezing old ones) in large part so it can NOT freeze XIDs that it
would have been useful (and much cheaper) to remove anyway. On HEAD,
FreezeMultiXactId() doesn't get passed down the VACUUM operation's
OldestXmin at all (it actually just gets FreezeLimit passed as its
cutoff_xid argument). It cannot possibly recognize any of this for
itself.

Does that theory about MultiXacts sound plausible? I'm not claiming
that the patch makes it impossible for FreezeMultiXactId() to have to
allocate a new MultiXact in order to freeze an existing one during
VACUUM -- the freeze-the-dead isolation tests already show that that's
not true. I
just think that page-level freezing based on page characteristics with
oldestXmin and oldestMxact (not FreezeLimit and MultiXactCutoff)
cutoffs might make it a lot less likely in practice. oldestXmin and
oldestMxact map to the same wall clock time, more or less -- that
seems like it might be an important distinction, independent of
everything else.

Thanks
--
Peter Geoghegan
From d10f42a1c091b4dc52670fca80a63fee4e73e20c Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 13 Dec 2021 15:00:49 -0800
Subject: [PATCH v9 2/4] Make page-level characteristics drive freezing.

Teach VACUUM to freeze all of the tuples on a page whenever it notices
that it would otherwise mark the page all-visible, without also marking
it all-frozen.  VACUUM typically won't freeze _any_ tuples on the page
unless _all_ tuples (that remain after pruning) are all-visible.  This
makes the overhead of vacuuming much more predictable over time.  We
avoid the need for large balloon payments during aggressive VACUUMs
(typically anti-wraparound autovacuums).  Freezing is proactive, so
we're much less likely to get into "freezing debt".

The new approach to freezing also enables relfrozenxid advancement in
non-aggressive VACUUMs, which might be enough to avoid aggressive
VACUUMs altogether (with many individual tables/workloads).  While the
non-aggressive case still skips any all-visible (but not all-frozen)
pages it encounters (which by itself makes relfrozenxid advancement
impossible), in practice there should seldom be any such pages to skip,
outside of pg_upgrade scenarios -- we now consistently avoid leaving behind
all-visible (not all-frozen) pages.  This (as well as work from commit
44fa84881f) makes relfrozenxid advancement in non-aggressive VACUUMs
commonplace.

There is also a clear disadvantage to the new approach to freezing: more
eager freezing will impose overhead on cases that don't receive any
benefit.  This is considered an acceptable trade-off.  The new algorithm
tends to avoid freezing early on pages where it makes the least sense,
since frequently modified pages are unlikely to be all-visible.

The system accumulates freezing debt in proportion to the number of
physical heap pages with unfrozen tuples, more or less.  Anything based
on XID age is likely to be a poor proxy for the eventual cost of
freezing (during the inevitable anti-wraparound autovacuum).  At a high
level, freezing is now treated as one of the costs of storing tuples in
physical heap pages -- not a cost of transactions that allocate XIDs.
Although vacuum_freeze_min_age and vacuum_multixact_freeze_min_age still
influence what we freeze, and when, they effectively become backstops.
It may still be necessary to "freeze a page" due to the presence of a
particularly old XID from before VACUUM's FreezeLimit cutoff, though
that will be rare in practice -- FreezeLimit is just a backstop now, and
can do no more than _trigger_ page-level freezing.  All XIDs < OldestXmin and
all MXIDs < OldestMxact will now be frozen on any page that VACUUM
decides to freeze, regardless of the details behind its decision.

The autovacuum logging instrumentation (and VACUUM VERBOSE) now shows
the number of pages that were "newly frozen".  This new metric will give
users a general sense of how much freezing VACUUM performed.  It tends
to be fairly predictable (as a percentage of rel_pages) for a given
table and workload.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam_xlog.h     |  7 ++-
 src/backend/access/heap/heapam.c     | 89 ++++++++++++++++++++++++----
 src/backend/access/heap/vacuumlazy.c | 88 ++++++++++++++++++++-------
 src/backend/commands/vacuum.c        |  8 +++
 4 files changed, 158 insertions(+), 34 deletions(-)

diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 2d8a7f627..a58226e54 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -409,10 +409,15 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  TransactionId relminmxid,
 									  TransactionId cutoff_xid,
 									  TransactionId cutoff_multi,
+									  TransactionId backstop_cutoff_xid,
+									  MultiXactId backstop_cutoff_multi,
 									  xl_heap_freeze_tuple *frz,
 									  bool *totally_frozen,
+									  bool *force_freeze,
 									  TransactionId *relfrozenxid_out,
-									  MultiXactId *relminmxid_out);
+									  MultiXactId *relminmxid_out,
+									  TransactionId *relfrozenxid_nofreeze_out,
+									  MultiXactId *relminmxid_nofreeze_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
 									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 134bc408a..05253e8dd 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6439,14 +6439,38 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * are older than the specified cutoff XID and cutoff MultiXactId.  If so,
  * setup enough state (in the *frz output argument) to later execute and
  * WAL-log what we would need to do, and return true.  Return false if nothing
- * is to be changed.  In addition, set *totally_frozen_p to true if the tuple
+ * can be changed.  In addition, set *totally_frozen_p to true if the tuple
  * will be totally frozen after these operations are performed and false if
  * more freezing will eventually be required.
  *
+ * Although this interface is primarily tuple-based, our vacuumlazy.c caller
+ * cooperates with us to decide whether or not to freeze whole pages, all
+ * together as a single group.  We prepare for freezing at the level of each
+ * tuple, but the final decision is made for the page as a whole.  All pages
+ * that are frozen within a given VACUUM operation are frozen according to
+ * cutoff_xid and cutoff_multi.  Caller _must_ freeze the whole page when
+ * we've set *force_freeze to true!
+ *
+ * cutoff_xid must be caller's oldest xmin to ensure that any XID older than
+ * it could neither be running nor seen as running by any open transaction.
+ * This ensures that the replacement will not change anyone's idea of the
+ * tuple state.  Similarly, cutoff_multi must be the smallest MultiXactId used
+ * by any open transaction (at the time that the oldest xmin was acquired).
+ *
+ * backstop_cutoff_xid must be <= cutoff_xid, and backstop_cutoff_multi must
+ * be <= cutoff_multi.  When any XID/XMID from before these backstop cutoffs
+ * is encountered, we set *force_freeze to true, making caller freeze the page
+ * (freezing-eligible XIDs/XMIDs will be frozen, at least).  "Backstop
+ * freezing" ensures that VACUUM won't allow XIDs/XMIDs to ever get too old.
+ * This shouldn't be necessary very often.  VACUUM should prefer to freeze
+ * when it's cheap (not when it's urgent).
+ *
  * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
- * target relfrozenxid and relminmxid for the relation.  Caller should make
- * temp copies of global tracking variables before starting to process a page,
- * so that we can only scribble on copies.
+ * target relfrozenxid and relminmxid for the relation.  There are also "no
+ * freeze" variants (*relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out)
+ * that are used by caller when it decides to not freeze the page.  Caller
+ * should make temp copies of global tracking variables before starting to
+ * process a page, so that we can only scribble on copies.
  *
  * Caller is responsible for setting the offset field, if appropriate.
  *
@@ -6454,13 +6478,6 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
  * (else we should be removing the tuple, not freezing it).
  *
- * NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
- * XID older than it could neither be running nor seen as running by any
- * open transaction.  This ensures that the replacement will not change
- * anyone's idea of the tuple state.
- * Similarly, cutoff_multi must be less than or equal to the smallest
- * MultiXactId used by any transaction currently open.
- *
  * If the tuple is in a shared buffer, caller must hold an exclusive lock on
  * that buffer.
  *
@@ -6472,12 +6489,18 @@ bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId relfrozenxid, TransactionId relminmxid,
 						  TransactionId cutoff_xid, TransactionId cutoff_multi,
+						  TransactionId backstop_cutoff_xid,
+						  MultiXactId backstop_cutoff_multi,
 						  xl_heap_freeze_tuple *frz,
 						  bool *totally_frozen_p,
+						  bool *force_freeze,
 						  TransactionId *relfrozenxid_out,
-						  MultiXactId *relminmxid_out)
+						  MultiXactId *relminmxid_out,
+						  TransactionId *relfrozenxid_nofreeze_out,
+						  MultiXactId *relminmxid_nofreeze_out)
 {
 	bool		changed = false;
+	bool		xmin_already_frozen = false;
 	bool		xmax_already_frozen = false;
 	bool		xmin_frozen;
 	bool		freeze_xmax;
@@ -6498,7 +6521,10 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	 */
 	xid = HeapTupleHeaderGetXmin(tuple);
 	if (!TransactionIdIsNormal(xid))
+	{
+		xmin_already_frozen = true;
 		xmin_frozen = true;
+	}
 	else
 	{
 		if (TransactionIdPrecedes(xid, relfrozenxid))
@@ -6564,6 +6590,13 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 				frz->t_infomask |= HEAP_XMAX_COMMITTED;
 			changed = true;
 
+			/*
+			 * Have caller freeze the page, since setting this MultiXactId to
+			 * a simple XID has some value.  Long-lived MultiXacts should be
+			 * avoided.
+			 */
+			*force_freeze = true;
+
 			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
 			{
 				/* New xmax is an XID older than new relfrozenxid_out */
@@ -6609,6 +6642,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			 */
 			if (TransactionIdPrecedes(temp, *relfrozenxid_out))
 				*relfrozenxid_out = temp;
+
+			/*
+			 * We allocated a MultiXact for this, so force freezing to avoid
+			 * wasting it
+			 */
+			*force_freeze = true;
 		}
 	}
 	else if (TransactionIdIsNormal(xid))
@@ -6713,11 +6752,28 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			Assert(!(tuple->t_infomask & HEAP_XMIN_INVALID));
 			frz->t_infomask |= HEAP_XMIN_COMMITTED;
 			changed = true;
+
+			/* Seems like a good idea to freeze early when this case is hit */
+			*force_freeze = true;
 		}
 	}
 
 	*totally_frozen_p = (xmin_frozen &&
 						 (freeze_xmax || xmax_already_frozen));
+
+	/*
+	 * Maintain alternative versions of relfrozenxid_out/relminmxid_out that
+	 * leave caller with the option of *not* freezing the page.  If caller has
+	 * already lost that option (e.g. when the page has an old XID that
+	 * requires backstop freezing), then we don't waste time on this.
+	 */
+	if (!*force_freeze && (!xmin_already_frozen || !xmax_already_frozen))
+		*force_freeze = heap_tuple_needs_freeze(tuple,
+												backstop_cutoff_xid,
+												backstop_cutoff_multi,
+												relfrozenxid_nofreeze_out,
+												relminmxid_nofreeze_out);
+
 	return changed;
 }
 
@@ -6769,15 +6825,22 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 {
 	xl_heap_freeze_tuple frz;
 	bool		do_freeze;
+	bool		force_freeze = true;
 	bool		tuple_totally_frozen;
 	TransactionId relfrozenxid_out = cutoff_xid;
 	MultiXactId relminmxid_out = cutoff_multi;
+	TransactionId relfrozenxid_nofreeze_out = cutoff_xid;
+	MultiXactId relminmxid_nofreeze_out = cutoff_multi;
 
 	do_freeze = heap_prepare_freeze_tuple(tuple,
 										  relfrozenxid, relminmxid,
 										  cutoff_xid, cutoff_multi,
+										  cutoff_xid, cutoff_multi,
 										  &frz, &tuple_totally_frozen,
-										  &relfrozenxid_out, &relminmxid_out);
+										  &force_freeze,
+										  &relfrozenxid_out, &relminmxid_out,
+										  &relfrozenxid_nofreeze_out,
+										  &relminmxid_nofreeze_out);
 
 	/*
 	 * Note that because this is not a WAL-logged operation, we don't need to
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6ebb9c520..f14b64dfc 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -167,9 +167,10 @@ typedef struct LVRelState
 	MultiXactId relminmxid;
 	double		old_live_tuples;	/* previous value of pg_class.reltuples */
 
-	/* VACUUM operation's cutoff for pruning */
+	/* Cutoffs for freezing eligibility */
 	TransactionId OldestXmin;
-	/* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */
+	MultiXactId OldestMxact;
+	/* Backstop cutoffs that force freezing of older XIDs/MXIDs */
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
 	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
@@ -199,6 +200,7 @@ typedef struct LVRelState
 	BlockNumber scanned_pages;	/* # pages examined (not skipped via VM) */
 	BlockNumber frozenskipped_pages;	/* # frozen pages skipped via VM */
 	BlockNumber removed_pages;	/* # pages removed by relation truncation */
+	BlockNumber newly_frozen_pages; /* # pages with tuples frozen by us */
 	BlockNumber lpdead_item_pages;	/* # pages with LP_DEAD items */
 	BlockNumber missed_dead_pages;	/* # pages with missed dead tuples */
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
@@ -470,8 +472,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->relminmxid = rel->rd_rel->relminmxid;
 	vacrel->old_live_tuples = rel->rd_rel->reltuples;
 
-	/* Set cutoffs for entire VACUUM */
+	/* Initialize freezing cutoffs */
 	vacrel->OldestXmin = OldestXmin;
+	vacrel->OldestMxact = OldestMxact;
 	vacrel->FreezeLimit = FreezeLimit;
 	vacrel->MultiXactCutoff = MultiXactCutoff;
 	/* Initialize state used to track oldest extant XID/XMID */
@@ -643,12 +646,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 vacrel->relnamespace,
 							 vacrel->relname,
 							 vacrel->num_index_scans);
-			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total)\n"),
+			appendStringInfo(&buf, _("pages: %u removed, %u remain, %u scanned (%.2f%% of total), %u newly frozen (%.2f%% of total)\n"),
 							 vacrel->removed_pages,
 							 vacrel->rel_pages,
 							 vacrel->scanned_pages,
 							 orig_rel_pages == 0 ? 0 :
-							 100.0 * vacrel->scanned_pages / orig_rel_pages);
+							 100.0 * vacrel->scanned_pages / orig_rel_pages,
+							 vacrel->newly_frozen_pages,
+							 orig_rel_pages == 0 ? 0 :
+							 100.0 * vacrel->newly_frozen_pages / orig_rel_pages);
 			appendStringInfo(&buf,
 							 _("tuples: %lld removed, %lld remain, %lld are dead but not yet removable\n"),
 							 (long long) vacrel->tuples_deleted,
@@ -818,6 +824,7 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	vacrel->scanned_pages = 0;
 	vacrel->frozenskipped_pages = 0;
 	vacrel->removed_pages = 0;
+	vacrel->newly_frozen_pages = 0;
 	vacrel->lpdead_item_pages = 0;
 	vacrel->missed_dead_pages = 0;
 	vacrel->nonempty_pages = 0;
@@ -873,7 +880,10 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	 * When vacrel->aggressive is set, we can't skip pages just because they
 	 * are all-visible, but we can still skip pages that are all-frozen, since
 	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
+	 * safely set for relfrozenxid or relminmxid.  Pages that are set to
+	 * all-visible but not also set to all-frozen are generally only expected
+	 * in pg_upgrade scenarios (these days lazy_scan_prune freezes all of the
+	 * tuples on a page when the page as a whole will be marked all-visible).
 	 *
 	 * Before entering the main loop, establish the invariant that
 	 * next_unskippable_block is the next block number >= blkno that we can't
@@ -1017,7 +1027,7 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 			/*
 			 * SKIP_PAGES_THRESHOLD (threshold for skipping) was not
 			 * crossed, or this is the last page.  Scan the page, even
-			 * though it's all-visible (and possibly even all-frozen).
+			 * though it's all-visible (and likely all-frozen, too).
 			 */
 			all_visible_according_to_vm = true;
 		}
@@ -1585,10 +1595,13 @@ lazy_scan_prune(LVRelState *vacrel,
 				recently_dead_tuples;
 	int			nnewlpdead;
 	int			nfrozen;
+	bool		force_freeze = false;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
-	TransactionId NewRelfrozenXid;
-	MultiXactId NewRelminMxid;
+	TransactionId NewRelfrozenXid,
+				NoFreezeNewRelfrozenXid;
+	MultiXactId NewRelminMxid,
+				NoFreezeNewRelminMxid;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -1597,8 +1610,8 @@ lazy_scan_prune(LVRelState *vacrel,
 retry:
 
 	/* Initialize (or reset) page-level state */
-	NewRelfrozenXid = vacrel->NewRelfrozenXid;
-	NewRelminMxid = vacrel->NewRelminMxid;
+	NewRelfrozenXid = NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
 	tuples_deleted = 0;
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -1669,8 +1682,15 @@ retry:
 		 */
 		if (ItemIdIsDead(itemid))
 		{
+			/*
+			 * We delay setting all_visible to false in the event of seeing an
+			 * LP_DEAD item.  We need to test "is the page all_visible if we
+			 * just consider remaining tuples with tuple storage?" below, when
+			 * considering if we want to freeze the page.  We set all_visible
+			 * to false for our caller last, when doing final processing of
+			 * any LP_DEAD items collected here.
+			 */
 			deadoffsets[lpdead_items++] = offnum;
-			prunestate->all_visible = false;
 			prunestate->has_lpdead_items = true;
 			continue;
 		}
@@ -1803,12 +1823,17 @@ retry:
 		if (heap_prepare_freeze_tuple(tuple.t_data,
 									  vacrel->relfrozenxid,
 									  vacrel->relminmxid,
+									  vacrel->OldestXmin,
+									  vacrel->OldestMxact,
 									  vacrel->FreezeLimit,
 									  vacrel->MultiXactCutoff,
 									  &frozen[nfrozen],
 									  &tuple_totally_frozen,
+									  &force_freeze,
 									  &NewRelfrozenXid,
-									  &NewRelminMxid))
+									  &NewRelminMxid,
+									  &NoFreezeNewRelfrozenXid,
+									  &NoFreezeNewRelminMxid))
 		{
 			/* Will execute freeze below */
 			frozen[nfrozen++].offset = offnum;
@@ -1829,9 +1854,31 @@ retry:
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
 	 * LP_REDIRECT items also remain, but are of no further interest to us).
+	 *
+	 * Freeze the page (based on heap_prepare_freeze_tuple's instructions)
+	 * when it is about to become all-visible.  Also freeze in cases where
+	 * heap_prepare_freeze_tuple requires it.  This usually happens due to the
+	 * presence of an old XID from before FreezeLimit.
 	 */
-	vacrel->NewRelfrozenXid = NewRelfrozenXid;
-	vacrel->NewRelminMxid = NewRelminMxid;
+	if (prunestate->all_visible || force_freeze)
+	{
+		/*
+		 * We're freezing the page.  Our final NewRelfrozenXid doesn't need to
+		 * be affected by the XIDs/XMIDs that are just about to be frozen
+		 * anyway.
+		 */
+		vacrel->NewRelfrozenXid = NewRelfrozenXid;
+		vacrel->NewRelminMxid = NewRelminMxid;
+	}
+	else
+	{
+		/* This is comparable to lazy_scan_noprune's handling */
+		vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+		vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
+		/* Forget heap_prepare_freeze_tuple's guidance on freezing */
+		nfrozen = 0;
+	}
 
 	/*
 	 * Consider the need to freeze any items with tuple storage from the page
@@ -1839,7 +1886,7 @@ retry:
 	 */
 	if (nfrozen > 0)
 	{
-		Assert(prunestate->hastup);
+		vacrel->newly_frozen_pages++;
 
 		/*
 		 * At least one tuple with storage needs to be frozen -- execute that
@@ -1869,7 +1916,7 @@ retry:
 		{
 			XLogRecPtr	recptr;
 
-			recptr = log_heap_freeze(vacrel->rel, buf, vacrel->FreezeLimit,
+			recptr = log_heap_freeze(vacrel->rel, buf, NewRelfrozenXid,
 									 frozen, nfrozen);
 			PageSetLSN(page, recptr);
 		}
@@ -1892,7 +1939,7 @@ retry:
 	 */
 #ifdef USE_ASSERT_CHECKING
 	/* Note that all_frozen value does not matter when !all_visible */
-	if (prunestate->all_visible)
+	if (prunestate->all_visible && lpdead_items == 0)
 	{
 		TransactionId cutoff;
 		bool		all_frozen;
@@ -1900,7 +1947,6 @@ retry:
 		if (!heap_page_is_all_visible(vacrel, buf, &cutoff, &all_frozen))
 			Assert(false);
 
-		Assert(lpdead_items == 0);
 		Assert(prunestate->all_frozen == all_frozen);
 
 		/*
@@ -1922,9 +1968,11 @@ retry:
 		VacDeadItems *dead_items = vacrel->dead_items;
 		ItemPointerData tmp;
 
-		Assert(!prunestate->all_visible);
 		Assert(prunestate->has_lpdead_items);
 
+		/* Caller expects LP_DEAD items to unset all_visible */
+		prunestate->all_visible = false;
+
 		vacrel->lpdead_item_pages++;
 
 		ItemPointerSetBlockNumber(&tmp, blkno);
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0ae3b4506..514658ba0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -957,6 +957,14 @@ get_all_vacuum_rels(int options)
  * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
  * minimum).
  *
+ * While non-aggressive VACUUMs are never required to advance relfrozenxid and
+ * relminmxid, they often do so in practice.  They freeze wherever possible,
+ * based on the same criteria that aggressive VACUUMs use.  FreezeLimit and
+ * multiXactCutoff are still applied as backstop cutoffs, that force freezing
+ * of older XIDs/XMIDs that did not get frozen based on the standard criteria.
+ * (Actually, the backstop cutoffs won't force freezing in rare cases where a
+ * cleanup lock cannot be acquired on a page during a non-aggressive VACUUM.)
+ *
  * oldestXmin and oldestMxact are the most recent values that can ever be
  * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
  * vacuumlazy.c caller later on.  These values should be passed when it turns
-- 
2.30.2

From 15dec1e572ac4da0540251253c3c219eadf46a83 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Thu, 24 Feb 2022 17:21:45 -0800
Subject: [PATCH v9 4/4] Avoid setting a page all-visible but not all-frozen.

This is pretty much an addendum to the work in the "Make page-level
characteristics drive freezing" commit.  It has been broken out like
this because I'm not even sure if it's necessary.  It seems like we
might want to be paranoid about losing out on the chance to advance
relfrozenxid in non-aggressive VACUUMs, though.

The only test that will trigger this case is the "freeze-the-dead"
isolation test.  It's incredibly narrow.  On the other hand, why take a
chance?  All it takes is one heap page that's all-visible (and not also
all-frozen) nestled between some all-frozen heap pages to lose out on
relfrozenxid advancement.  The SKIP_PAGES_THRESHOLD stuff won't save us
then [1].

[1] For context see commit bf136cf6e3 -- SKIP_PAGES_THRESHOLD is
specifically concerned with relfrozenxid advancement in non-aggressive
VACUUMs, and always has been.  This isn't directly documented right now.
---
 src/backend/access/heap/vacuumlazy.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index b2d3b039d..5eede8c55 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1981,6 +1981,26 @@ retry:
 	}
 #endif
 
+	/*
+	 * Since OldestXmin and OldestMxact are not absolutely precise, there is a
+	 * tiny chance that we will consider the page all-visible while not also
+	 * considering it all-frozen (having frozen the page with the expectation
+	 * that that would render it all-frozen).  This can happen when there is a
+	 * MultiXact containing XIDs from before and after OldestXmin, for
+	 * example.  This risks making relfrozenxid advancement by future
+	 * non-aggressive VACUUMs impossible, which is a heavy price to pay just
+	 * to be able to avoid accessing one single isolated heap page.
+	 *
+	 * We could just live with this, but it seems prudent to avoid the problem
+	 * instead.  And so we deliberately throw away the opportunity to set such
+	 * a page all-visible instead of allowing this case.
+	 *
+	 * XXX What about the lazy_vacuum_heap_page/heap_page_is_all_visible path,
+	 * which could still set the page just all-visible when that happens?
+	 */
+	if (prunestate->all_visible && !prunestate->all_frozen)
+		prunestate->all_visible = false;
+
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel
 	 */
-- 
2.30.2

From d2190abf366f148bae5307442e8a6245c6922e78 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 21 Feb 2022 12:46:44 -0800
Subject: [PATCH v9 3/4] Remove aggressive VACUUM skipping special case.

Since it's simply never okay to miss out on advancing relfrozenxid
during an aggressive VACUUM (that's the whole point), the aggressive
case treated any page from a next_unskippable_block-wise skippable block
range as an all-frozen page (not merely an all-visible page) during
skipping.  Such a page might not be all-visible/all-frozen at the point
that it actually gets skipped, but it could nevertheless be safely
skipped, and then counted in frozenskipped_pages (the page must have
been all-frozen back when we determined the extent of the range of
blocks to skip, since aggressive VACUUMs _must_ scan all-visible pages).
This is necessary to ensure that aggressive VACUUMs are always capable
of advancing relfrozenxid.

The non-aggressive case behaved slightly differently: it rechecked the
visibility map for each page at the point of skipping, and only counted
pages in frozenskipped_pages when they were still all-frozen at that
time.  But it skipped the page either way (since we already committed to
skipping the page at the point of the recheck).  This was correct, but
sometimes resulted in non-aggressive VACUUMs needlessly wasting an
opportunity to advance relfrozenxid (when a page was modified in just
the wrong way, at just the wrong time).  It also resulted in a needless
recheck of the visibility map for each and every page skipped during
non-aggressive VACUUMs.

Avoid these problems by conditioning the "skippable page was definitely
all-frozen when range of skippable pages was first determined" behavior
on what the visibility map _actually said_ about the range as a whole
back when we first determined the extent of the range (don't deduce what
must have happened at that time on the basis of aggressive-ness).  This
allows us to reliably count skipped pages in frozenskipped_pages when
they were initially all-frozen.  In particular, when a page's visibility
map bit is unset after the point where a skippable range of pages is
initially determined, but before the point where the page is actually
skipped, non-aggressive VACUUMs now count it in frozenskipped_pages,
just like aggressive VACUUMs always have [1].  It's not critical for the
non-aggressive case to get this right, but there is no reason not to.

[1] Actually, it might not work that way when there happens to be a mix
of all-visible and all-frozen pages in a range of skippable pages.
There is no chance of VACUUM advancing relfrozenxid in this scenario
either way, though, so it doesn't matter.
---
 src/backend/access/heap/vacuumlazy.c | 59 +++++++++++++++++++---------
 1 file changed, 40 insertions(+), 19 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index f14b64dfc..b2d3b039d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -542,7 +542,14 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 */
 	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages)
 	{
-		/* Cannot advance relfrozenxid/relminmxid */
+		/*
+		 * Skipped some all-visible pages, so definitely cannot advance
+		 * relfrozenxid.  This is generally only expected in pg_upgrade
+		 * scenarios, since VACUUM now avoids setting a page to all-visible
+		 * but not all-frozen.  However, it's also possible (though quite
+		 * unlikely) that we ended up here because somebody else cleared some
+		 * page's all-frozen flag (without clearing its all-visible flag).
+		 */
 		Assert(!aggressive);
 		frozenxid_updated = minmulti_updated = false;
 		vac_update_relstats(rel, new_rel_pages, new_live_tuples,
@@ -810,7 +817,8 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 				next_failsafe_block,
 				next_fsm_block_to_vacuum;
 	Buffer		vmbuffer = InvalidBuffer;
-	bool		skipping_blocks;
+	bool		skipping_blocks,
+				skipping_allfrozen_blocks;
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -905,27 +913,31 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 	 * computed, so they'll have no effect on the value to which we can safely
 	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
 	 */
+	skipping_allfrozen_blocks = true;	/* iff skipping_blocks */
 	next_unskippable_block = 0;
 	if (vacrel->skipwithvm)
 	{
 		while (next_unskippable_block < nblocks)
 		{
-			uint8		vmstatus;
+			uint8		vmskipflags;
 
-			vmstatus = visibilitymap_get_status(vacrel->rel,
-												next_unskippable_block,
-												&vmbuffer);
+			vmskipflags = visibilitymap_get_status(vacrel->rel,
+												   next_unskippable_block,
+												   &vmbuffer);
 			if (vacrel->aggressive)
 			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+				if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
 					break;
 			}
 			else
 			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+				if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
 					break;
 			}
 			vacuum_delay_point();
+
+			if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
+				skipping_allfrozen_blocks = false;
 			next_unskippable_block++;
 		}
 	}
@@ -949,6 +961,8 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 
 		if (blkno == next_unskippable_block)
 		{
+			skipping_allfrozen_blocks = true;	/* iff skipping_blocks */
+
 			/* Time to advance next_unskippable_block */
 			next_unskippable_block++;
 			if (vacrel->skipwithvm)
@@ -971,6 +985,9 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 							break;
 					}
 					vacuum_delay_point();
+
+					if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
+						skipping_allfrozen_blocks = false;
 					next_unskippable_block++;
 				}
 			}
@@ -997,8 +1014,11 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 		{
 			/*
 			 * The current page can be skipped if we've seen a long enough run
-			 * of skippable blocks to justify skipping it -- provided it's not
-			 * the last page in the relation (according to rel_pages/nblocks).
+			 * of skippable blocks to justify skipping it.  An aggressive
+			 * VACUUM can only skip a range of blocks that were determined to
+			 * be all-frozen (not just all-visible) as a group back when the
+			 * next_unskippable_block-wise extent of the range was determined.
+			 * Assert that we got this right in passing.
 			 *
 			 * We always scan the table's last page to determine whether it
 			 * has tuples or not, even if it would otherwise be skipped. This
@@ -1006,19 +1026,20 @@ lazy_scan_heap(LVRelState *vacrel, int nworkers)
 			 * on the table to attempt a truncation that just fails
 			 * immediately because there are tuples on the last page.
 			 */
+			Assert(!vacrel->aggressive || !skipping_blocks ||
+				   skipping_allfrozen_blocks);
 			if (skipping_blocks && blkno < nblocks - 1)
 			{
 				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was initially all-frozen, so we have to
-				 * recheck.
+				 * When skipping a range of blocks with one or more blocks
+				 * that are not all-frozen (expected during a non-aggressive
+				 * VACUUM following pg_upgrade), we need to recheck if this
+				 * block is all-frozen to maintain frozenskipped_pages.  The
+				 * block might not even be all-visible by now, but it's always
+				 * okay to skip (see note above about visibilitymap_get_status
+				 * return value being out-of-date).
 				 */
-				if (vacrel->aggressive ||
+				if (skipping_allfrozen_blocks ||
 					VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
 					vacrel->frozenskipped_pages++;
 				continue;
-- 
2.30.2

From 483bc8df203f9df058fcb53e7972e3912e223b30 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan <pg@bowt.ie>
Date: Mon, 22 Nov 2021 10:02:30 -0800
Subject: [PATCH v9 1/4] Loosen coupling between relfrozenxid and freezing.

When VACUUM set relfrozenxid before now, it set it to whatever value was
used to determine which tuples to freeze -- the FreezeLimit cutoff.
This approach was very naive: the relfrozenxid invariant only requires
that new relfrozenxid values be <= the oldest extant XID remaining in
the table (at the point that the VACUUM operation ends), which in
general might be much more recent than FreezeLimit.  There is no fixed
relationship between the amount of physical work performed by VACUUM to
make it safe to advance relfrozenxid (freezing and pruning), and the
actual number of XIDs that relfrozenxid can be advanced by (at least in
principle) as a result.  VACUUM might have to freeze all of the tuples
from a hundred million heap pages just to enable relfrozenxid to be
advanced by no more than one or two XIDs.  On the other hand, VACUUM
might end up doing little or no work, and yet still be capable of
advancing relfrozenxid by hundreds of millions of XIDs as a result.

VACUUM now sets relfrozenxid (and relminmxid) using the exact oldest
extant XID (and oldest extant MultiXactId) from the table, including
XIDs from the table's remaining/unfrozen MultiXacts.  This requires that
VACUUM carefully track the oldest unfrozen XID/MultiXactId as it goes.
This optimization doesn't require any changes to the definition of
relfrozenxid, nor does it require changes to the core design of
freezing.

Final relfrozenxid values must still be >= FreezeLimit in an aggressive
VACUUM (FreezeLimit is still used as an XID-age based backstop there).
In non-aggressive VACUUMs (where there is still no strict guarantee that
relfrozenxid will be advanced at all), we now advance relfrozenxid by as
much as we possibly can.  This exploits workload conditions that make it
easy to advance relfrozenxid by many more XIDs (for the same amount of
freezing/pruning work).

The non-aggressive case can now set relfrozenxid to any legal XID value,
which could in principle be any XID that is > the existing relfrozenxid,
and <= the VACUUM operation's OldestXmin/"removal cutoff" XID value.
FreezeLimit is still used by VACUUM to determine which tuples to freeze,
at least for now.  Practical experience from the field may show that
non-aggressive VACUUMs seldom need to set relfrozenxid to an XID from
before FreezeLimit, but having the option still seems very valuable.

A later commit will teach VACUUM to determine which tuples to freeze
based on page-level characteristics.  Without this improved approach to
freezing in place, most individual tables still have very little chance
of relfrozenxid advancement during non-aggressive VACUUMs (an aggressive
anti-wraparound autovacuum will still eventually be required with most
tables).  All it takes is an earlier VACUUM that sets just a few pages
all-visible (but not all-frozen); later non-aggressive VACUUMs will end
up skipping those pages, as a matter of policy, making relfrozenxid
advancement impossible.  This can be avoided by not setting pages
all-visible (but not all-frozen) in the first place.

Once VACUUM becomes capable of consistently advancing relfrozenxid, even
during non-aggressive VACUUMs, relfrozenxid values (and especially
relminmxid values) will tend to track what's really happening in each
table much more accurately.  This is expected to make anti-wraparound
autovacuums far rarer in practice.  The problem of "anti-wraparound
stampedes" (where multiple anti-wraparound autovacuums are launched at
exactly the same time) is also naturally avoided by advancing
relfrozenxid early and often (since it results in "natural diversity"
among relfrozenxid values, due to table-level workload characteristics).

Credit for the general idea of using the oldest extant XID to set
pg_class.relfrozenxid at the end of VACUUM goes to Andres Freund.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkymFbz6D_vL+jmqSn_5q1wsFvFrE+37yLgL_Rkfd6Gzg@mail.gmail.com
---
 src/include/access/heapam.h          |   7 +-
 src/include/access/heapam_xlog.h     |   4 +-
 src/include/commands/vacuum.h        |   1 +
 src/backend/access/heap/heapam.c     | 194 ++++++++++++++++++++-------
 src/backend/access/heap/vacuumlazy.c | 128 +++++++++++++-----
 src/backend/commands/cluster.c       |   5 +-
 src/backend/commands/vacuum.c        |  42 +++---
 7 files changed, 280 insertions(+), 101 deletions(-)

diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b46ab7d73..10584a4ce 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -167,8 +167,11 @@ extern void heap_inplace_update(Relation relation, HeapTuple tuple);
 extern bool heap_freeze_tuple(HeapTupleHeader tuple,
 							  TransactionId relfrozenxid, TransactionId relminmxid,
 							  TransactionId cutoff_xid, TransactionId cutoff_multi);
-extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
-									MultiXactId cutoff_multi);
+extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple,
+									TransactionId backstop_cutoff_xid,
+									MultiXactId backstop_cutoff_multi,
+									TransactionId *relfrozenxid_nofreeze_out,
+									MultiXactId *relminmxid_nofreeze_out);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
 
 extern void simple_heap_insert(Relation relation, HeapTuple tup);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 5c47fdcec..2d8a7f627 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -410,7 +410,9 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 									  TransactionId cutoff_xid,
 									  TransactionId cutoff_multi,
 									  xl_heap_freeze_tuple *frz,
-									  bool *totally_frozen);
+									  bool *totally_frozen,
+									  TransactionId *relfrozenxid_out,
+									  MultiXactId *relminmxid_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
 									  xl_heap_freeze_tuple *xlrec_tp);
 extern XLogRecPtr log_heap_visible(RelFileNode rnode, Buffer heap_buffer,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index d64f6268f..ead88edda 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -291,6 +291,7 @@ extern bool vacuum_set_xid_limits(Relation rel,
 								  int multixact_freeze_min_age,
 								  int multixact_freeze_table_age,
 								  TransactionId *oldestXmin,
+								  MultiXactId *oldestMxact,
 								  TransactionId *freezeLimit,
 								  MultiXactId *multiXactCutoff);
 extern bool vacuum_xid_failsafe_check(TransactionId relfrozenxid,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 59d43e2ba..134bc408a 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6140,12 +6140,24 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  * FRM_RETURN_IS_MULTI
  *		The return value is a new MultiXactId to set as new Xmax.
  *		(caller must obtain proper infomask bits using GetMultiXactIdHintBits)
+ *
+ * "relfrozenxid_out" is an output value; it's used to maintain target new
+ * relfrozenxid for the relation.  It can be ignored unless "flags" contains
+ * either FRM_NOOP or FRM_RETURN_IS_MULTI, because we only handle multiXacts
+ * here.  This follows the general convention: only track XIDs that will still
+ * be in the table after the ongoing VACUUM finishes.  Note that it's up to
+ * caller to maintain this when the return value is itself a plain XID.
+ *
+ * Note that we cannot depend on xmin to maintain relfrozenxid_out.  We need
+ * to push maintenance of relfrozenxid_out down this far, since in general
+ * xmin might have been frozen by an earlier VACUUM operation, in which case
+ * our caller will not have factored-in xmin into relfrozenxid_out's value.
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 				  TransactionId relfrozenxid, TransactionId relminmxid,
 				  TransactionId cutoff_xid, MultiXactId cutoff_multi,
-				  uint16 *flags)
+				  uint16 *flags, TransactionId *relfrozenxid_out)
 {
 	TransactionId xid = InvalidTransactionId;
 	int			i;
@@ -6157,6 +6169,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	bool		has_lockers;
 	TransactionId update_xid;
 	bool		update_committed;
+	TransactionId temprelfrozenxid_out;
 
 	*flags = 0;
 
@@ -6251,13 +6264,13 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 
 	/* is there anything older than the cutoff? */
 	need_replace = false;
+	temprelfrozenxid_out = *relfrozenxid_out;
 	for (i = 0; i < nmembers; i++)
 	{
 		if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
-		{
 			need_replace = true;
-			break;
-		}
+		if (TransactionIdPrecedes(members[i].xid, temprelfrozenxid_out))
+			temprelfrozenxid_out = members[i].xid;
 	}
 
 	/*
@@ -6266,6 +6279,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 */
 	if (!need_replace)
 	{
+		*relfrozenxid_out = temprelfrozenxid_out;
 		*flags |= FRM_NOOP;
 		pfree(members);
 		return InvalidTransactionId;
@@ -6275,6 +6289,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	 * If the multi needs to be updated, figure out which members do we need
 	 * to keep.
 	 */
+	temprelfrozenxid_out = *relfrozenxid_out;
 	nnewmembers = 0;
 	newmembers = palloc(sizeof(MultiXactMember) * nmembers);
 	has_lockers = false;
@@ -6356,7 +6371,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			 * list.)
 			 */
 			if (TransactionIdIsValid(update_xid))
+			{
 				newmembers[nnewmembers++] = members[i];
+				if (TransactionIdPrecedes(members[i].xid, temprelfrozenxid_out))
+					temprelfrozenxid_out = members[i].xid;
+			}
 		}
 		else
 		{
@@ -6366,6 +6385,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			{
 				/* running locker cannot possibly be older than the cutoff */
 				Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));
+				Assert(!TransactionIdPrecedes(members[i].xid, *relfrozenxid_out));
 				newmembers[nnewmembers++] = members[i];
 				has_lockers = true;
 			}
@@ -6394,6 +6414,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		if (update_committed)
 			*flags |= FRM_MARK_COMMITTED;
 		xid = update_xid;
+		/* Caller manages relfrozenxid_out directly when we return an XID */
 	}
 	else
 	{
@@ -6403,6 +6424,7 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		 */
 		xid = MultiXactIdCreateFromMembers(nnewmembers, newmembers);
 		*flags |= FRM_RETURN_IS_MULTI;
+		*relfrozenxid_out = temprelfrozenxid_out;
 	}
 
 	pfree(newmembers);
@@ -6421,6 +6443,11 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * will be totally frozen after these operations are performed and false if
  * more freezing will eventually be required.
  *
+ * Maintains *relfrozenxid_out and *relminmxid_out, which are the current
+ * target relfrozenxid and relminmxid for the relation.  Caller should make
+ * temp copies of global tracking variables before starting to process a page,
+ * so that we can only scribble on copies.
+ *
  * Caller is responsible for setting the offset field, if appropriate.
  *
  * It is assumed that the caller has checked the tuple with
@@ -6445,7 +6472,10 @@ bool
 heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 						  TransactionId relfrozenxid, TransactionId relminmxid,
 						  TransactionId cutoff_xid, TransactionId cutoff_multi,
-						  xl_heap_freeze_tuple *frz, bool *totally_frozen_p)
+						  xl_heap_freeze_tuple *frz,
+						  bool *totally_frozen_p,
+						  TransactionId *relfrozenxid_out,
+						  MultiXactId *relminmxid_out)
 {
 	bool		changed = false;
 	bool		xmax_already_frozen = false;
@@ -6489,6 +6519,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			frz->t_infomask |= HEAP_XMIN_FROZEN;
 			changed = true;
 		}
+		else if (TransactionIdPrecedes(xid, *relfrozenxid_out))
+		{
+			/* won't be frozen, but older than current relfrozenxid_out */
+			*relfrozenxid_out = xid;
+		}
 	}
 
 	/*
@@ -6506,10 +6541,11 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 	{
 		TransactionId newxmax;
 		uint16		flags;
+		TransactionId temp = *relfrozenxid_out;
 
 		newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
 									relfrozenxid, relminmxid,
-									cutoff_xid, cutoff_multi, &flags);
+									cutoff_xid, cutoff_multi, &flags, &temp);
 
 		freeze_xmax = (flags & FRM_INVALIDATE_XMAX);
 
@@ -6527,6 +6563,24 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			if (flags & FRM_MARK_COMMITTED)
 				frz->t_infomask |= HEAP_XMAX_COMMITTED;
 			changed = true;
+
+			if (TransactionIdPrecedes(newxmax, *relfrozenxid_out))
+			{
+				/* New xmax is an XID older than new relfrozenxid_out */
+				*relfrozenxid_out = newxmax;
+			}
+		}
+		else if (flags & FRM_NOOP)
+		{
+			/*
+			 * Changing nothing, so might have to ratchet back relminmxid_out,
+			 * relfrozenxid_out, or both together
+			 */
+			if (MultiXactIdIsValid(xid) &&
+				MultiXactIdPrecedes(xid, *relminmxid_out))
+				*relminmxid_out = xid;
+			if (TransactionIdPrecedes(temp, *relfrozenxid_out))
+				*relfrozenxid_out = temp;
 		}
 		else if (flags & FRM_RETURN_IS_MULTI)
 		{
@@ -6548,6 +6602,13 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			frz->xmax = newxmax;
 
 			changed = true;
+
+			/*
+			 * New multixact might have remaining XID older than
+			 * relfrozenxid_out
+			 */
+			if (TransactionIdPrecedes(temp, *relfrozenxid_out))
+				*relfrozenxid_out = temp;
 		}
 	}
 	else if (TransactionIdIsNormal(xid))
@@ -6575,7 +6636,14 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 			freeze_xmax = true;
 		}
 		else
+		{
 			freeze_xmax = false;
+			if (TransactionIdPrecedes(xid, *relfrozenxid_out))
+			{
+				/* won't be frozen, but older than current relfrozenxid_out */
+				*relfrozenxid_out = xid;
+			}
+		}
 	}
 	else if ((tuple->t_infomask & HEAP_XMAX_INVALID) ||
 			 !TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple)))
@@ -6622,6 +6690,9 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple,
 		 * was removed in PostgreSQL 9.0.  Note that if we were to respect
 		 * cutoff_xid here, we'd need to make surely to clear totally_frozen
 		 * when we skipped freezing on that basis.
+		 *
+		 * Since we always freeze here, relfrozenxid_out doesn't need to be
+		 * maintained.
 		 */
 		if (TransactionIdIsNormal(xid))
 		{
@@ -6699,11 +6770,14 @@ heap_freeze_tuple(HeapTupleHeader tuple,
 	xl_heap_freeze_tuple frz;
 	bool		do_freeze;
 	bool		tuple_totally_frozen;
+	TransactionId relfrozenxid_out = cutoff_xid;
+	MultiXactId relminmxid_out = cutoff_multi;
 
 	do_freeze = heap_prepare_freeze_tuple(tuple,
 										  relfrozenxid, relminmxid,
 										  cutoff_xid, cutoff_multi,
-										  &frz, &tuple_totally_frozen);
+										  &frz, &tuple_totally_frozen,
+										  &relfrozenxid_out, &relminmxid_out);
 
 	/*
 	 * Note that because this is not a WAL-logged operation, we don't need to
@@ -7133,6 +7207,22 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
  * Check to see whether any of the XID fields of a tuple (xmin, xmax, xvac)
  * are older than the specified cutoff XID or MultiXactId.  If so, return true.
  *
+ * See heap_prepare_freeze_tuple for information about the basic rules for the
+ * cutoffs used here.
+ *
+ * Maintains *relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out, which
+ * are the current target relfrozenxid and relminmxid for the relation.  We
+ * assume that caller will never want to freeze its tuple, even when the tuple
+ * "needs freezing" according to our return value.  Caller should make temp
+ * copies of global tracking variables before starting to process a page, so
+ * that we can only scribble on copies.  That way caller can just discard the
+ * temp copies if it isn't okay with that assumption.
+ *
+ * Only aggressive VACUUM callers are expected to really care when a tuple
+ * "needs freezing" according to us.  It follows that non-aggressive VACUUMs
+ * can use *relfrozenxid_nofreeze_out and *relminmxid_nofreeze_out in all
+ * cases.
+ *
  * It doesn't matter whether the tuple is alive or dead, we are checking
  * to see if a tuple needs to be removed or frozen to avoid wraparound.
  *
@@ -7140,15 +7230,23 @@ heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple)
  * on a standby.
  */
 bool
-heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
-						MultiXactId cutoff_multi)
+heap_tuple_needs_freeze(HeapTupleHeader tuple,
+						TransactionId backstop_cutoff_xid,
+						MultiXactId backstop_cutoff_multi,
+						TransactionId *relfrozenxid_nofreeze_out,
+						MultiXactId *relminmxid_nofreeze_out)
 {
 	TransactionId xid;
+	bool		needs_freeze = false;
 
 	xid = HeapTupleHeaderGetXmin(tuple);
-	if (TransactionIdIsNormal(xid) &&
-		TransactionIdPrecedes(xid, cutoff_xid))
-		return true;
+	if (TransactionIdIsNormal(xid))
+	{
+		if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+			*relfrozenxid_nofreeze_out = xid;
+		if (TransactionIdPrecedes(xid, backstop_cutoff_xid))
+			needs_freeze = true;
+	}
 
 	/*
 	 * The considerations for multixacts are complicated; look at
@@ -7158,57 +7256,59 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 	{
 		MultiXactId multi;
+		MultiXactMember *members;
+		int			nmembers;
 
 		multi = HeapTupleHeaderGetRawXmax(tuple);
-		if (!MultiXactIdIsValid(multi))
-		{
-			/* no xmax set, ignore */
-			;
-		}
-		else if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
+		if (MultiXactIdIsValid(multi) &&
+			MultiXactIdPrecedes(multi, *relminmxid_nofreeze_out))
+			*relminmxid_nofreeze_out = multi;
+
+		if (HEAP_LOCKED_UPGRADED(tuple->t_infomask))
 			return true;
-		else if (MultiXactIdPrecedes(multi, cutoff_multi))
-			return true;
-		else
+		else if (MultiXactIdPrecedes(multi, backstop_cutoff_multi))
+			needs_freeze = true;
+
+		/* need to check whether any member of the mxact is too old */
+		nmembers = GetMultiXactIdMembers(multi, &members, false,
+										 HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
+
+		for (int i = 0; i < nmembers; i++)
 		{
-			MultiXactMember *members;
-			int			nmembers;
-			int			i;
-
-			/* need to check whether any member of the mxact is too old */
-
-			nmembers = GetMultiXactIdMembers(multi, &members, false,
-											 HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
-
-			for (i = 0; i < nmembers; i++)
-			{
-				if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
-				{
-					pfree(members);
-					return true;
-				}
-			}
-			if (nmembers > 0)
-				pfree(members);
+			if (TransactionIdPrecedes(members[i].xid, backstop_cutoff_xid))
+				needs_freeze = true;
+			if (TransactionIdPrecedes(members[i].xid,
+									  *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = members[i].xid;
 		}
+		if (nmembers > 0)
+			pfree(members);
 	}
 	else
 	{
 		xid = HeapTupleHeaderGetRawXmax(tuple);
-		if (TransactionIdIsNormal(xid) &&
-			TransactionIdPrecedes(xid, cutoff_xid))
-			return true;
+		if (TransactionIdIsNormal(xid))
+		{
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+			if (TransactionIdPrecedes(xid, backstop_cutoff_xid))
+				needs_freeze = true;
+		}
 	}
 
 	if (tuple->t_infomask & HEAP_MOVED)
 	{
 		xid = HeapTupleHeaderGetXvac(tuple);
-		if (TransactionIdIsNormal(xid) &&
-			TransactionIdPrecedes(xid, cutoff_xid))
-			return true;
+		if (TransactionIdIsNormal(xid))
+		{
+			if (TransactionIdPrecedes(xid, *relfrozenxid_nofreeze_out))
+				*relfrozenxid_nofreeze_out = xid;
+			if (TransactionIdPrecedes(xid, backstop_cutoff_xid))
+				needs_freeze = true;
+		}
 	}
 
-	return false;
+	return needs_freeze;
 }
 
 /*
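
To make the new caller protocol concrete, here's a heavily simplified sketch
of how lazy_scan_noprune is expected to use heap_tuple_needs_freeze.  This
isn't patch text -- locals like page/maxoff/offnum come from the surrounding
function, and everything unrelated to the trackers is elided:

    TransactionId NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
    MultiXactId NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
    bool        needs_cleanup_lock = false;

    for (offnum = FirstOffsetNumber; offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        HeapTupleHeader tupleheader;

        if (!ItemIdIsNormal(itemid))
            continue;           /* LP_DEAD/LP_UNUSED handling elided */
        tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);

        /*
         * Ratchets the temp copies back as needed, no matter what it
         * returns.  The return value just reports whether this tuple would
         * force an aggressive VACUUM to freeze (i.e. whether it contains
         * XIDs/MXIDs older than the backstop cutoffs).
         */
        if (heap_tuple_needs_freeze(tupleheader,
                                    vacrel->FreezeLimit,
                                    vacrel->MultiXactCutoff,
                                    &NoFreezeNewRelfrozenXid,
                                    &NoFreezeNewRelminMxid))
            needs_cleanup_lock = true;
    }

    if (needs_cleanup_lock && vacrel->aggressive)
        return false;   /* discard temp copies; redo in lazy_scan_prune */

    /* Committed to not freezing this page: adopt the temp copies */
    vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
    vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
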
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 40101e0cb..6ebb9c520 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -144,7 +144,7 @@ typedef struct LVRelState
 	Relation   *indrels;
 	int			nindexes;
 
-	/* Aggressive VACUUM (scan all unfrozen pages)? */
+	/* Aggressive VACUUM? (must set relfrozenxid >= FreezeLimit) */
 	bool		aggressive;
 	/* Use visibility map to skip? (disabled by DISABLE_PAGE_SKIPPING) */
 	bool		skipwithvm;
@@ -172,8 +172,9 @@ typedef struct LVRelState
 	/* VACUUM operation's cutoff for freezing XIDs and MultiXactIds */
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
-	/* Are FreezeLimit/MultiXactCutoff still valid? */
-	bool		freeze_cutoffs_valid;
+	/* Tracks oldest extant XID/MXID for setting relfrozenxid/relminmxid */
+	TransactionId NewRelfrozenXid;
+	MultiXactId NewRelminMxid;
 
 	/* Error reporting state */
 	char	   *relnamespace;
@@ -329,6 +330,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	PgStat_Counter startreadtime = 0;
 	PgStat_Counter startwritetime = 0;
 	TransactionId OldestXmin;
+	MultiXactId OldestMxact;
 	TransactionId FreezeLimit;
 	MultiXactId MultiXactCutoff;
 
@@ -355,17 +357,17 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * used to determine which XIDs/MultiXactIds will be frozen.
 	 *
 	 * If this is an aggressive VACUUM, then we're strictly required to freeze
-	 * any and all XIDs from before FreezeLimit, so that we will be able to
-	 * safely advance relfrozenxid up to FreezeLimit below (we must be able to
-	 * advance relminmxid up to MultiXactCutoff, too).
+	 * any and all XIDs from before FreezeLimit in order to be able to advance
+	 * relfrozenxid to a value >= FreezeLimit below.  There is an analogous
+	 * requirement around MultiXact freezing, relminmxid, and MultiXactCutoff.
 	 */
 	aggressive = vacuum_set_xid_limits(rel,
 									   params->freeze_min_age,
 									   params->freeze_table_age,
 									   params->multixact_freeze_min_age,
 									   params->multixact_freeze_table_age,
-									   &OldestXmin, &FreezeLimit,
-									   &MultiXactCutoff);
+									   &OldestXmin, &OldestMxact,
+									   &FreezeLimit, &MultiXactCutoff);
 
 	skipwithvm = true;
 	if (params->options & VACOPT_DISABLE_PAGE_SKIPPING)
@@ -472,8 +474,9 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	vacrel->OldestXmin = OldestXmin;
 	vacrel->FreezeLimit = FreezeLimit;
 	vacrel->MultiXactCutoff = MultiXactCutoff;
-	/* Track if cutoffs became invalid (possible in !aggressive case only) */
-	vacrel->freeze_cutoffs_valid = true;
+	/* Initialize state used to track oldest extant XID/MXID */
+	vacrel->NewRelfrozenXid = OldestXmin;
+	vacrel->NewRelminMxid = OldestMxact;
 
 	/*
 	 * Call lazy_scan_heap to perform all required heap pruning, index
@@ -526,16 +529,15 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	 * Aggressive VACUUM must reliably advance relfrozenxid (and relminmxid).
 	 * We are able to advance relfrozenxid in a non-aggressive VACUUM too,
 	 * provided we didn't skip any all-visible (not all-frozen) pages using
-	 * the visibility map, and assuming that we didn't fail to get a cleanup
-	 * lock that made it unsafe with respect to FreezeLimit (or perhaps our
-	 * MultiXactCutoff) established for VACUUM operation.
+	 * the visibility map.  A non-aggressive VACUUM might advance relfrozenxid
+	 * to an XID that is either older or newer than FreezeLimit (same applies
+	 * to relminmxid and MultiXactCutoff).
 	 *
 	 * NB: We must use orig_rel_pages, not vacrel->rel_pages, since we want
 	 * the rel_pages used by lazy_scan_heap, which won't match when we
 	 * happened to truncate the relation afterwards.
 	 */
-	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages ||
-		!vacrel->freeze_cutoffs_valid)
+	if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages)
 	{
 		/* Cannot advance relfrozenxid/relminmxid */
 		Assert(!aggressive);
@@ -549,9 +551,16 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 	{
 		Assert(vacrel->scanned_pages + vacrel->frozenskipped_pages ==
 			   orig_rel_pages);
+		Assert(!aggressive ||
+			   TransactionIdPrecedesOrEquals(FreezeLimit,
+											 vacrel->NewRelfrozenXid));
+		Assert(!aggressive ||
+			   MultiXactIdPrecedesOrEquals(MultiXactCutoff,
+										   vacrel->NewRelminMxid));
+
 		vac_update_relstats(rel, new_rel_pages, new_live_tuples,
 							new_rel_allvisible, vacrel->nindexes > 0,
-							FreezeLimit, MultiXactCutoff,
+							vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
 							&frozenxid_updated, &minmulti_updated, false);
 	}
 
@@ -656,17 +665,19 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
 							 OldestXmin, diff);
 			if (frozenxid_updated)
 			{
-				diff = (int32) (FreezeLimit - vacrel->relfrozenxid);
+				diff = (int32) (vacrel->NewRelfrozenXid - vacrel->relfrozenxid);
+				Assert(diff > 0);
 				appendStringInfo(&buf,
 								 _("new relfrozenxid: %u, which is %d xids ahead of previous value\n"),
-								 FreezeLimit, diff);
+								 vacrel->NewRelfrozenXid, diff);
 			}
 			if (minmulti_updated)
 			{
-				diff = (int32) (MultiXactCutoff - vacrel->relminmxid);
+				diff = (int32) (vacrel->NewRelminMxid - vacrel->relminmxid);
+				Assert(diff > 0);
 				appendStringInfo(&buf,
 								 _("new relminmxid: %u, which is %d mxids ahead of previous value\n"),
-								 MultiXactCutoff, diff);
+								 vacrel->NewRelminMxid, diff);
 			}
 			if (orig_rel_pages > 0)
 			{
@@ -1576,6 +1587,8 @@ lazy_scan_prune(LVRelState *vacrel,
 	int			nfrozen;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
 	xl_heap_freeze_tuple frozen[MaxHeapTuplesPerPage];
+	TransactionId NewRelfrozenXid;
+	MultiXactId NewRelminMxid;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -1583,7 +1596,9 @@ lazy_scan_prune(LVRelState *vacrel,
 
 retry:
 
-	/* Initialize (or reset) page-level counters */
+	/* Initialize (or reset) page-level state */
+	NewRelfrozenXid = vacrel->NewRelfrozenXid;
+	NewRelminMxid = vacrel->NewRelminMxid;
 	tuples_deleted = 0;
 	lpdead_items = 0;
 	live_tuples = 0;
@@ -1791,7 +1806,9 @@ retry:
 									  vacrel->FreezeLimit,
 									  vacrel->MultiXactCutoff,
 									  &frozen[nfrozen],
-									  &tuple_totally_frozen))
+									  &tuple_totally_frozen,
+									  &NewRelfrozenXid,
+									  &NewRelminMxid))
 		{
 			/* Will execute freeze below */
 			frozen[nfrozen++].offset = offnum;
@@ -1805,13 +1822,16 @@ retry:
 			prunestate->all_frozen = false;
 	}
 
+	vacrel->offnum = InvalidOffsetNumber;
+
 	/*
 	 * We have now divided every item on the page into either an LP_DEAD item
 	 * that will need to be vacuumed in indexes later, or a LP_NORMAL tuple
 	 * that remains and needs to be considered for freezing now (LP_UNUSED and
 	 * LP_REDIRECT items also remain, but are of no further interest to us).
 	 */
-	vacrel->offnum = InvalidOffsetNumber;
+	vacrel->NewRelfrozenXid = NewRelfrozenXid;
+	vacrel->NewRelminMxid = NewRelminMxid;
 
 	/*
 	 * Consider the need to freeze any items with tuple storage from the page
@@ -1962,6 +1982,8 @@ lazy_scan_noprune(LVRelState *vacrel,
 				missed_dead_tuples;
 	HeapTupleHeader tupleheader;
 	OffsetNumber deadoffsets[MaxHeapTuplesPerPage];
+	TransactionId NoFreezeNewRelfrozenXid = vacrel->NewRelfrozenXid;
+	MultiXactId NoFreezeNewRelminMxid = vacrel->NewRelminMxid;
 
 	Assert(BufferGetBlockNumber(buf) == blkno);
 
@@ -2007,20 +2029,56 @@ lazy_scan_noprune(LVRelState *vacrel,
 		tupleheader = (HeapTupleHeader) PageGetItem(page, itemid);
 		if (heap_tuple_needs_freeze(tupleheader,
 									vacrel->FreezeLimit,
-									vacrel->MultiXactCutoff))
+									vacrel->MultiXactCutoff,
+									&NoFreezeNewRelfrozenXid,
+									&NoFreezeNewRelminMxid))
 		{
 			if (vacrel->aggressive)
 			{
-				/* Going to have to get cleanup lock for lazy_scan_prune */
+				/*
+				 * heap_tuple_needs_freeze determined that it isn't going to
+				 * be possible for the ongoing aggressive VACUUM operation to
+				 * advance relfrozenxid to a value >= FreezeLimit without
+				 * freezing one or more tuples with older XIDs from this page.
+				 * (Or perhaps the issue was that MultiXactCutoff could not be
+				 * respected.  It might even have been both cutoffs.)
+				 *
+				 * Tell caller that it must acquire a full cleanup lock.  It's
+				 * possible that caller will have to wait a while for one, but
+				 * that can't be helped -- full processing by lazy_scan_prune
+				 * is required to freeze the older XIDs (and/or freeze older
+				 * MultiXactIds).
+				 *
+				 * lazy_scan_prune expects a clean slate.  Forget everything
+				 * that lazy_scan_noprune learned about the page, including
+				 * NewRelfrozenXid and NewRelminMxid tracking information.
+				 */
 				vacrel->offnum = InvalidOffsetNumber;
 				return false;
 			}
-
-			/*
-			 * Current non-aggressive VACUUM operation definitely won't be
-			 * able to advance relfrozenxid or relminmxid
-			 */
-			vacrel->freeze_cutoffs_valid = false;
+			else
+			{
+				/*
+				 * This is a non-aggressive VACUUM, which is under no strict
+				 * obligation to advance relfrozenxid at all (much less to
+				 * advance it to a value >= FreezeLimit).  Non-aggressive
+				 * VACUUM advances relfrozenxid/relminmxid on a best-effort
+				 * basis.  It never waits for a cleanup lock.
+				 *
+				 * NewRelfrozenXid (and/or NewRelminMxid) will still have been
+				 * ratcheted back as needed.  heap_tuple_needs_freeze assumes
+				 * that its caller _might_ prefer to carry on without freezing
+				 * anything on the page in the event of a tuple containing an
+				 * XID/MXID that "needs freezing".
+				 *
+				 * The fact that we won't be able to advance relfrozenxid up
+				 * to FreezeLimit on this occasion is no reason to completely
+				 * give up on advancing relfrozenxid.  There is likely to be
+				 * some benefit from advancing relfrozenxid by any amount,
+				 * even if the final value is significantly older than our
+				 * FreezeLimit.
+				 */
+			}
 		}
 
 		ItemPointerSet(&(tuple.t_self), blkno, offnum);
@@ -2069,6 +2127,14 @@ lazy_scan_noprune(LVRelState *vacrel,
 
 	vacrel->offnum = InvalidOffsetNumber;
 
+	/*
+	 * We have committed to not freezing the tuples on this page (always
+	 * happens with a non-aggressive VACUUM), so make sure that the target
+	 * relfrozenxid/relminmxid values reflect the XIDs/MXIDs we encountered
+	 */
+	vacrel->NewRelfrozenXid = NoFreezeNewRelfrozenXid;
+	vacrel->NewRelminMxid = NoFreezeNewRelminMxid;
+
 	/*
 	 * Now save details of the LP_DEAD items from the page in vacrel (though
 	 * only when VACUUM uses two-pass strategy)
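
Since the initialization, the per-page ratcheting, and the final pg_class
update end up in separate hunks, here's a rough end-to-end sketch of the
NewRelfrozenXid/NewRelminMxid lifecycle in heap_vacuum_rel.  Again, this is
not patch text; the Invalid* arguments in the first branch are just the
preexisting "cannot advance" behavior restated:

    /* Start from the most recent values that could possibly be used */
    vacrel->NewRelfrozenXid = OldestXmin;
    vacrel->NewRelminMxid = OldestMxact;

    /* ... lazy_scan_heap() ratchets both values back while scanning ... */

    if (vacrel->scanned_pages + vacrel->frozenskipped_pages < orig_rel_pages)
    {
        /* Skipped an all-visible page: leave relfrozenxid/relminmxid alone */
        Assert(!aggressive);
        vac_update_relstats(rel, new_rel_pages, new_live_tuples,
                            new_rel_allvisible, vacrel->nindexes > 0,
                            InvalidTransactionId, InvalidMultiXactId,
                            &frozenxid_updated, &minmulti_updated, false);
    }
    else
    {
        /* Aggressive VACUUM must have frozen everything before its cutoffs */
        Assert(!aggressive ||
               TransactionIdPrecedesOrEquals(FreezeLimit,
                                             vacrel->NewRelfrozenXid));
        vac_update_relstats(rel, new_rel_pages, new_live_tuples,
                            new_rel_allvisible, vacrel->nindexes > 0,
                            vacrel->NewRelfrozenXid, vacrel->NewRelminMxid,
                            &frozenxid_updated, &minmulti_updated, false);
    }
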
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 02a7e94bf..a7e988298 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -767,6 +767,7 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	TupleDesc	oldTupDesc PG_USED_FOR_ASSERTS_ONLY;
 	TupleDesc	newTupDesc PG_USED_FOR_ASSERTS_ONLY;
 	TransactionId OldestXmin;
+	MultiXactId oldestMxact;
 	TransactionId FreezeXid;
 	MultiXactId MultiXactCutoff;
 	bool		use_sort;
@@ -856,8 +857,8 @@ copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	 * Since we're going to rewrite the whole table anyway, there's no reason
 	 * not to be aggressive about this.
 	 */
-	vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0,
-						  &OldestXmin, &FreezeXid, &MultiXactCutoff);
+	vacuum_set_xid_limits(OldHeap, 0, 0, 0, 0, &OldestXmin, &oldestMxact,
+						  &FreezeXid, &MultiXactCutoff);
 
 	/*
 	 * FreezeXid will become the table's new relfrozenxid, and that mustn't go
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 50a4a612e..0ae3b4506 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -945,14 +945,22 @@ get_all_vacuum_rels(int options)
  * The output parameters are:
  * - oldestXmin is the Xid below which tuples deleted by any xact (that
  *   committed) should be considered DEAD, not just RECENTLY_DEAD.
- * - freezeLimit is the Xid below which all Xids are replaced by
- *	 FrozenTransactionId during vacuum.
- * - multiXactCutoff is the value below which all MultiXactIds are removed
- *   from Xmax.
+ * - oldestMxact is the Mxid below which MultiXacts are definitely not
+ *   seen as visible by any running transaction.
+ * - freezeLimit is the Xid below which all Xids are definitely replaced by
+ *   FrozenTransactionId during aggressive vacuums.
+ * - multiXactCutoff is the value below which all MultiXactIds are definitely
+ *   removed from Xmax during aggressive vacuums.
  *
  * Return value indicates if vacuumlazy.c caller should make its VACUUM
  * operation aggressive.  An aggressive VACUUM must advance relfrozenxid up to
- * FreezeLimit, and relminmxid up to multiXactCutoff.
+ * FreezeLimit (at a minimum), and relminmxid up to multiXactCutoff (at a
+ * minimum).
+ *
+ * oldestXmin and oldestMxact are the most recent values that can ever be
+ * passed to vac_update_relstats() as frozenxid and minmulti arguments by our
+ * vacuumlazy.c caller later on.  These values should be passed when it turns
+ * out that VACUUM will leave no unfrozen XIDs/MXIDs behind in the table.
  */
 bool
 vacuum_set_xid_limits(Relation rel,
@@ -961,6 +969,7 @@ vacuum_set_xid_limits(Relation rel,
 					  int multixact_freeze_min_age,
 					  int multixact_freeze_table_age,
 					  TransactionId *oldestXmin,
+					  MultiXactId *oldestMxact,
 					  TransactionId *freezeLimit,
 					  MultiXactId *multiXactCutoff)
 {
@@ -969,7 +978,6 @@ vacuum_set_xid_limits(Relation rel,
 	int			effective_multixact_freeze_max_age;
 	TransactionId limit;
 	TransactionId safeLimit;
-	MultiXactId oldestMxact;
 	MultiXactId mxactLimit;
 	MultiXactId safeMxactLimit;
 	int			freezetable;
@@ -1065,9 +1073,11 @@ vacuum_set_xid_limits(Relation rel,
 						 effective_multixact_freeze_max_age / 2);
 	Assert(mxid_freezemin >= 0);
 
+	/* Remember for caller */
+	*oldestMxact = GetOldestMultiXactId();
+
 	/* compute the cutoff multi, being careful to generate a valid value */
-	oldestMxact = GetOldestMultiXactId();
-	mxactLimit = oldestMxact - mxid_freezemin;
+	mxactLimit = *oldestMxact - mxid_freezemin;
 	if (mxactLimit < FirstMultiXactId)
 		mxactLimit = FirstMultiXactId;
 
@@ -1082,8 +1092,8 @@ vacuum_set_xid_limits(Relation rel,
 				(errmsg("oldest multixact is far in the past"),
 				 errhint("Close open transactions with multixacts soon to avoid wraparound problems.")));
 		/* Use the safe limit, unless an older mxact is still running */
-		if (MultiXactIdPrecedes(oldestMxact, safeMxactLimit))
-			mxactLimit = oldestMxact;
+		if (MultiXactIdPrecedes(*oldestMxact, safeMxactLimit))
+			mxactLimit = *oldestMxact;
 		else
 			mxactLimit = safeMxactLimit;
 	}
@@ -1390,14 +1400,10 @@ vac_update_relstats(Relation relation,
 	 * Update relfrozenxid, unless caller passed InvalidTransactionId
 	 * indicating it has no new data.
 	 *
-	 * Ordinarily, we don't let relfrozenxid go backwards: if things are
-	 * working correctly, the only way the new frozenxid could be older would
-	 * be if a previous VACUUM was done with a tighter freeze_min_age, in
-	 * which case we don't want to forget the work it already did.  However,
-	 * if the stored relfrozenxid is "in the future", then it must be corrupt
-	 * and it seems best to overwrite it with the cutoff we used this time.
-	 * This should match vac_update_datfrozenxid() concerning what we consider
-	 * to be "in the future".
+	 * Ordinarily, we don't let relfrozenxid go backwards.  However, if the
+	 * stored relfrozenxid is "in the future", then it must be corrupt, so
+	 * just overwrite it.  This should match vac_update_datfrozenxid()
+	 * concerning what we consider to be "in the future".
 	 */
 	if (frozenxid_updated)
 		*frozenxid_updated = false;
-- 
2.30.2
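
One more note on the vacuum_set_xid_limits() changes: the reason OldestXmin
and OldestMxact are safe starting points for the new trackers is that the
backstop cutoffs can never be newer than they are.  In other words (these
asserts are just illustration, the patch doesn't add them anywhere):

    aggressive = vacuum_set_xid_limits(rel, 0, 0, 0, 0,
                                       &OldestXmin, &OldestMxact,
                                       &FreezeLimit, &MultiXactCutoff);

    Assert(TransactionIdPrecedesOrEquals(FreezeLimit, OldestXmin));
    Assert(MultiXactIdPrecedesOrEquals(MultiXactCutoff, OldestMxact));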
