Hi,

On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> Here's that patch.  I've stared at this some, and Robert did too. Robert
> mentioned that the commit message might need some polish and I'm not
> 100% sure about the error message texts yet.
> 
> I'm not yet convinced that the new elog in vacuumlazy can never trigger
> - but I also don't think we want to actually freeze the tuple in that
> case.

I'm fairly sure it could be triggered, so I've rewritten that.

I've played around quite a bit with the attached patches. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER makes
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse.  It's a bit
annoying and scary to add so many checks to back branches, but it kinda
seems required.  The error message texts aren't perfect, but these are
"should never be hit" type elog()s, so I'm not too worried about that.


Please review!


One thing I'm wondering about is whether we have any way for users to
diagnose whether they need to dump & restore - I can't really think of
anything actually meaningful. Reindexing will find some instances, but
far from all. VACUUM (disable_page_skipping, freeze) pg_class will
also find a bunch.  Not a perfect story.
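
To spell that out a bit -- just a sketch; "some_table" stands in for
whichever table one is worried about, and the VACUUM part relies on the
checks added in the second patch erroring out on bad tuples instead of
freezing them:

    -- broken HOT chains sometimes show up as failures while
    -- rebuilding indexes:
    REINDEX TABLE some_table;

    -- visit every page and try to freeze every tuple; the added
    -- sanity checks then ERROR on xids/multis from before
    -- relfrozenxid/relminmxid instead of "freezing" them:
    VACUUM (DISABLE_PAGE_SKIPPING, FREEZE) some_table;
    VACUUM (DISABLE_PAGE_SKIPPING, FREEZE) pg_class;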


> Staring at the vacuumlazy hunk I think I might have found a related bug:
> heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> a multixact and still running.  It does so without verifying liveliness
> of members.  Isn't that buggy? Consider what happens if we have three
> blocks: 1 has free space, two is being vacuumed and is locked, three is
> full and has a tuple that's key share locked by a live tuple and is
> updated by a dead xmax from before the xmin horizon. In that case afaict
> the multi will be copied from the third page to the first one.  Which is
> quite bad, because vacuum already processed it, and we'll set
> relfrozenxid accordingly.  I hope I'm missing something here?

I'm trying to write a testcase for that now.

Greetings,

Andres Freund
From 6d9f0b60da30aca9bf5eaab814b1d5d7f1ccb8c4 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Fri, 3 Nov 2017 07:52:29 -0700
Subject: [PATCH 1/2] Fix pruning of locked and updated tuples.

Previously it was possible that a tuple was not pruned during vacuum,
even though its update xmax (i.e. the updating xid in a multixact with
both key share lockers and an updater) was below the cutoff horizon.

As the freezing code assumed, rightly so, that this is not supposed to
happen, xmax would be preserved (as a member of a new multixact or as
xmax directly). That causes two problems: For one, the tuple is below
the xmin horizon, which can cause problems if the clog is truncated or
once there's an xid wraparound. The bigger problem is that this will
break HOT chains, which in turn can lead to two kinds of breakage:
First, failing index lookups, which can e.g. lead to constraints being
violated. Second, future HOT prunes / vacuums can end up making
invisible tuples visible again. There are other harmful scenarios.

Fix the problem by recognizing that tuples can be DEAD instead of
RECENTLY_DEAD, even if the multixact has live members, if the updating
xid is below the xmin horizon. That's safe because newer versions of
the tuple will contain the locking xids.

This commit also rewrites, and activates, a previously added
regression test to actually showcase corruption without the fix
applied.

A followup commit will harden the code somewhat against future similar
bugs and already corrupted data.

Author: Andres Freund
Per-Discussion: Robert Haas and Freund
Reported-By: Daniel Wood
Discussion:
    https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com
    https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
Backpatch: 9.3-
---
 src/backend/utils/time/tqual.c                  |  61 ++++++--------
 src/test/isolation/expected/freeze-the-dead.out | 107 +++++-------------------
 src/test/isolation/specs/freeze-the-dead.spec   |  36 +++++++-
 3 files changed, 82 insertions(+), 122 deletions(-)

diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index a821e2eed10..8be2980116c 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1311,49 +1311,42 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
 
 	if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
 	{
-		TransactionId xmax;
+		TransactionId xmax = HeapTupleGetUpdateXid(tuple);
 
-		if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
-		{
-			/* already checked above */
-			Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
-
-			xmax = HeapTupleGetUpdateXid(tuple);
-
-			/* not LOCKED_ONLY, so it has to have an xmax */
-			Assert(TransactionIdIsValid(xmax));
-
-			if (TransactionIdIsInProgress(xmax))
-				return HEAPTUPLE_DELETE_IN_PROGRESS;
-			else if (TransactionIdDidCommit(xmax))
-				/* there are still lockers around -- can't return DEAD here */
-				return HEAPTUPLE_RECENTLY_DEAD;
-			/* updating transaction aborted */
-			return HEAPTUPLE_LIVE;
-		}
-
-		Assert(!(tuple->t_infomask & HEAP_XMAX_COMMITTED));
-
-		xmax = HeapTupleGetUpdateXid(tuple);
+		/* already checked above */
+		Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));
 
 		/* not LOCKED_ONLY, so it has to have an xmax */
 		Assert(TransactionIdIsValid(xmax));
 
-		/* multi is not running -- updating xact cannot be */
-		Assert(!TransactionIdIsInProgress(xmax));
-		if (TransactionIdDidCommit(xmax))
+		if (TransactionIdIsInProgress(xmax))
+			return HEAPTUPLE_DELETE_IN_PROGRESS;
+		else if (TransactionIdDidCommit(xmax))
 		{
-			if (!TransactionIdPrecedes(xmax, OldestXmin))
-				return HEAPTUPLE_RECENTLY_DEAD;
-			else
+			/*
+			 * The multixact might still be running due to lockers. If the
+			 * updater is below the horizon we have to return DEAD regardless
+			 * - otherwise we could end up with a tuple where the updater has
+			 * to be removed due to the horizon, but is not pruned away. It's
+			 * not a problem to prune that tuple because all the lockers will
+			 * also be present in the newer tuple version.
+			 */
+			if (TransactionIdPrecedes(xmax, OldestXmin))
+			{
 				return HEAPTUPLE_DEAD;
+			}
+			else
+				return HEAPTUPLE_RECENTLY_DEAD;
+		}
+		else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+		{
+			/*
+			 * Not in Progress, Not Committed, so either Aborted or crashed.
+			 * Remove the Xmax.
+			 */
+			SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
 		}
 
-		/*
-		 * Not in Progress, Not Committed, so either Aborted or crashed.
-		 * Remove the Xmax.
-		 */
-		SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
 		return HEAPTUPLE_LIVE;
 	}
 
diff --git a/src/test/isolation/expected/freeze-the-dead.out b/src/test/isolation/expected/freeze-the-dead.out
index dd045613f93..8e638f132f9 100644
--- a/src/test/isolation/expected/freeze-the-dead.out
+++ b/src/test/isolation/expected/freeze-the-dead.out
@@ -1,101 +1,36 @@
-Parsed test spec with 2 sessions
+Parsed test spec with 3 sessions
 
-starting permutation: s1_update s1_commit s1_vacuum s2_key_share s2_commit
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s2_commit: COMMIT;
-
-starting permutation: s1_update s1_commit s2_key_share s1_vacuum s2_commit
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_commit: COMMIT;
-
-starting permutation: s1_update s1_commit s2_key_share s2_commit s1_vacuum
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s2_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-
-starting permutation: s1_update s2_key_share s1_commit s1_vacuum s2_commit
+starting permutation: s1_begin s2_begin s3_begin s1_update s2_key_share s3_key_share s1_update s1_commit s2_commit s2_vacuum s1_selectone s3_commit s2_vacuum s1_selectall
+step s1_begin: BEGIN;
+step s2_begin: BEGIN;
+step s3_begin: BEGIN;
 step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
 step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
 id             
 
 3              
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_commit: COMMIT;
-
-starting permutation: s1_update s2_key_share s1_commit s2_commit s1_vacuum
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s1_commit: COMMIT;
-step s2_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-
-starting permutation: s1_update s2_key_share s2_commit s1_commit s1_vacuum
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s2_commit: COMMIT;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-
-starting permutation: s2_key_share s1_update s1_commit s1_vacuum s2_commit
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
-step s2_commit: COMMIT;
-
-starting permutation: s2_key_share s1_update s1_commit s2_commit s1_vacuum
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
+step s3_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
 id             
 
 3              
 step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
 step s1_commit: COMMIT;
 step s2_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
+step s2_vacuum: VACUUM FREEZE tab_freeze;
+step s1_selectone: 
+    BEGIN;
+    SET LOCAL enable_seqscan = false;
+    SET LOCAL enable_bitmapscan = false;
+    SELECT * FROM tab_freeze WHERE id = 3;
+    COMMIT;
 
-starting permutation: s2_key_share s1_update s2_commit s1_commit s1_vacuum
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
+id             name           x              
 
-3              
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s2_commit: COMMIT;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
+3              333            2              
+step s3_commit: COMMIT;
+step s2_vacuum: VACUUM FREEZE tab_freeze;
+step s1_selectall: SELECT * FROM tab_freeze ORDER BY name, id;
+id             name           x              
 
-starting permutation: s2_key_share s2_commit s1_update s1_commit s1_vacuum
-step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
-id             
-
-3              
-step s2_commit: COMMIT;
-step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
-step s1_commit: COMMIT;
-step s1_vacuum: VACUUM FREEZE tab_freeze;
+1              111            0              
+3              333            2              
diff --git a/src/test/isolation/specs/freeze-the-dead.spec b/src/test/isolation/specs/freeze-the-dead.spec
index 3cd9965b2fa..e24d7d5d116 100644
--- a/src/test/isolation/specs/freeze-the-dead.spec
+++ b/src/test/isolation/specs/freeze-the-dead.spec
@@ -16,12 +16,44 @@ teardown
 }
 
 session "s1"
-setup				{ BEGIN; }
+step "s1_begin"		{ BEGIN; }
 step "s1_update"	{ UPDATE tab_freeze SET x = x + 1 WHERE id = 3; }
 step "s1_commit"	{ COMMIT; }
 step "s1_vacuum"	{ VACUUM FREEZE tab_freeze; }
+step "s1_selectone"	{
+    BEGIN;
+    SET LOCAL enable_seqscan = false;
+    SET LOCAL enable_bitmapscan = false;
+    SELECT * FROM tab_freeze WHERE id = 3;
+    COMMIT;
+}
+step "s1_selectall"	{ SELECT * FROM tab_freeze ORDER BY name, id; }
+step "s1_reindex"	{ REINDEX TABLE tab_freeze; }
 
 session "s2"
-setup				{ BEGIN; }
+step "s2_begin"		{ BEGIN; }
 step "s2_key_share"	{ SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
 step "s2_commit"	{ COMMIT; }
+step "s2_vacuum"	{ VACUUM FREEZE tab_freeze; }
+
+session "s3"
+step "s3_begin"		{ BEGIN; }
+step "s3_key_share"	{ SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
+step "s3_commit"	{ COMMIT; }
+step "s3_vacuum"	{ VACUUM FREEZE tab_freeze; }
+
+# This permutation verifies that a previous bug
+#     https://postgr.es/m/e5711e62-8fdf-4dca-a888-c200bf6b5...@amazon.com
+#     https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
+# is not reintroduced. We used to make wrong pruning / freezing
+# decisions for multixacts, which could lead to a) broken HOT chains, b)
+# dead rows being revived.
+permutation "s1_begin" "s2_begin" "s3_begin" # start transactions
+   "s1_update" "s2_key_share" "s3_key_share" # have xmax be a multi with an updater, updater being oldest xid
+   "s1_update" # create additional row version that has multis
+   "s1_commit" "s2_commit" # commit both updater and share locker
+   "s2_vacuum" # due to bug in freezing logic, we used to *not* prune updated row, and then froze it
+   "s1_selectone" # if hot chain is broken, the row can't be found via index scan
+   "s3_commit" # commit remaining open xact
+   "s2_vacuum" # pruning / freezing in broken hot chains would unset xmax, reviving rows
+   "s1_selectall" # show borkedness
-- 
2.14.1.536.g6867272d5b.dirty

From 3f432d15aa9550112fff18a6670e4b75119ddbb6 Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Mon, 13 Nov 2017 18:45:47 -0800
Subject: [PATCH 2/2] Perform a lot more sanity checks when freezing tuples.

The previous commit has shown that the sanity checks around freezing
aren't strong enough. Strengthening them seems especially important
because the existence of the bug has caused corruption that we don't
want to make even worse during future vacuum cycles.

Author: Andres Freund
Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpj...@alap3.anarazel.de
Backpatch: 9.3-
---
 src/backend/access/heap/heapam.c      | 74 ++++++++++++++++++++++++++++-------
 src/backend/access/heap/rewriteheap.c |  5 ++-
 src/backend/commands/vacuumlazy.c     | 15 ++++++-
 src/include/access/heapam.h           |  5 ++-
 src/include/access/heapam_xlog.h      |  2 +
 5 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3acef279f47..eb93718baa7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6357,6 +6357,7 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
  */
 static TransactionId
 FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
+				  TransactionId relfrozenxid, TransactionId relminmxid,
 				  TransactionId cutoff_xid, MultiXactId cutoff_multi,
 				  uint16 *flags)
 {
@@ -6385,14 +6386,19 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 	}
 	else if (MultiXactIdPrecedes(multi, cutoff_multi))
 	{
+		if (MultiXactIdPrecedes(multi, relminmxid))
+			elog(ERROR, "encountered multixact from before horizon");
+
 		/*
 		 * This old multi cannot possibly have members still running.  If it
 		 * was a locker only, it can be removed without any further
 		 * consideration; but if it contained an update, we might need to
 		 * preserve it.
 		 */
-		Assert(!MultiXactIdIsRunning(multi,
-									 HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)));
+		if (MultiXactIdIsRunning(multi,
+								 HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
+			elog(ERROR, "multixact from before cutoff found to be still running");
+
 		if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
 		{
 			*flags |= FRM_INVALIDATE_XMAX;
@@ -6412,7 +6418,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 			 */
 			if (TransactionIdPrecedes(xid, cutoff_xid))
 			{
-				Assert(!TransactionIdDidCommit(xid));
+				if (TransactionIdPrecedes(xid, relfrozenxid))
+					elog(ERROR, "encountered xid from before horizon");
+				if (TransactionIdDidCommit(xid))
+					elog(ERROR, "can't freeze committed xmax");
 				*flags |= FRM_INVALIDATE_XMAX;
 				xid = InvalidTransactionId; /* not strictly necessary */
 			}
@@ -6484,6 +6493,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 		{
 			TransactionId xid = members[i].xid;
 
+			Assert(TransactionIdIsValid(xid));
+			if (TransactionIdPrecedes(xid, relfrozenxid))
+				elog(ERROR, "encountered xid from before horizon");
+
 			/*
 			 * It's an update; should we keep it?  If the transaction is known
 			 * aborted or crashed then it's okay to ignore it, otherwise not.
@@ -6512,18 +6525,24 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
 				update_committed = true;
 				update_xid = xid;
 			}
+			else
+			{
+				/*
+				 * Not in progress, not committed -- must be aborted or crashed;
+				 * we can ignore it.
+				 */
+			}
 
-			/*
-			 * Not in progress, not committed -- must be aborted or crashed;
-			 * we can ignore it.
-			 */
 
 			/*
 			 * Since the tuple wasn't marked HEAPTUPLE_DEAD by vacuum, the
-			 * update Xid cannot possibly be older than the xid cutoff.
+			 * update Xid cannot possibly be older than the xid cutoff. The
+			 * presence of such a tuple would cause corruption, so be paranoid
+			 * and check.
 			 */
-			Assert(!TransactionIdIsValid(update_xid) ||
-				   !TransactionIdPrecedes(update_xid, cutoff_xid));
+			if (TransactionIdIsValid(update_xid) &&
+				TransactionIdPrecedes(update_xid, cutoff_xid))
+				elog(ERROR, "encountered xid from before xid cutoff");
 
 			/*
 			 * If we determined that it's an Xid corresponding to an update
@@ -6620,8 +6639,9 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
  * recovery.  We really need to remove old xids.
  */
 bool
-heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
-						  TransactionId cutoff_multi,
+heap_prepare_freeze_tuple(HeapTupleHeader tuple,
+						  TransactionId relfrozenxid, TransactionId relminmxid,
+						  TransactionId cutoff_xid, TransactionId cutoff_multi,
 						  xl_heap_freeze_tuple *frz, bool *totally_frozen_p)
 {
 	bool		changed = false;
@@ -6640,6 +6660,12 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	{
 		if (TransactionIdPrecedes(xid, cutoff_xid))
 		{
+			if (TransactionIdPrecedes(xid, relfrozenxid))
+				elog(ERROR, "encountered xid from before xid cutoff");
+
+			if (!TransactionIdDidCommit(xid))
+				elog(ERROR, "uncommitted xmin needs to be frozen");
+
 			frz->t_infomask |= HEAP_XMIN_FROZEN;
 			changed = true;
 		}
@@ -6664,6 +6690,7 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 		uint16		flags;
 
 		newxmax = FreezeMultiXactId(xid, tuple->t_infomask,
+									relfrozenxid, relminmxid,
 									cutoff_xid, cutoff_multi, &flags);
 
 		if (flags & FRM_INVALIDATE_XMAX)
@@ -6714,7 +6741,21 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
 	else if (TransactionIdIsNormal(xid))
 	{
 		if (TransactionIdPrecedes(xid, cutoff_xid))
+		{
+			if (TransactionIdPrecedes(xid, relfrozenxid))
+				elog(ERROR, "encountered xid from before xid cutoff");
+
+			/*
+			 * If we freeze xmax, make absolutely sure that it's not an XID
+			 * that is important (Note, a lock-only xmax can be removed
+			 * independent of committedness, since a committed lock holder has
+			 * released the lock).
+			 */
+			if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+				TransactionIdDidCommit(xid))
+				elog(ERROR, "can't freeze committed xmax");
 			freeze_xmax = true;
+		}
 		else
 			totally_frozen = false;
 	}
@@ -6819,14 +6860,17 @@ heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
  * Useful for callers like CLUSTER that perform their own WAL logging.
  */
 bool
-heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
-				  TransactionId cutoff_multi)
+heap_freeze_tuple(HeapTupleHeader tuple,
+				  TransactionId relfrozenxid, TransactionId relminmxid,
+				  TransactionId cutoff_xid, TransactionId cutoff_multi)
 {
 	xl_heap_freeze_tuple frz;
 	bool		do_freeze;
 	bool		tuple_totally_frozen;
 
-	do_freeze = heap_prepare_freeze_tuple(tuple, cutoff_xid, cutoff_multi,
+	do_freeze = heap_prepare_freeze_tuple(tuple,
+										  relfrozenxid, relminmxid,
+										  cutoff_xid, cutoff_multi,
 										  &frz, &tuple_totally_frozen);
 
 	/*
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f93c194e182..7d163c91379 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
 	 * While we have our hands on the tuple, we may as well freeze any
 	 * eligible xmin or xmax, so that future VACUUM effort can be saved.
 	 */
-	heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
+	heap_freeze_tuple(new_tuple->t_data,
+					  state->rs_old_rel->rd_rel->relfrozenxid,
+					  state->rs_old_rel->rd_rel->relminmxid,
+					  state->rs_freeze_xid,
 					  state->rs_cutoff_multi);
 
 	/*
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 6587db77acf..4a01137527c 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -467,6 +467,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 				blkno;
 	HeapTupleData tuple;
 	char	   *relname;
+	TransactionId relfrozenxid = onerel->rd_rel->relfrozenxid;
+	TransactionId relminmxid = onerel->rd_rel->relminmxid;
 	BlockNumber empty_pages,
 				vacuumed_pages;
 	double		num_tuples,
@@ -1004,6 +1006,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 					 * tuple, we choose to keep it, because it'll be a lot
 					 * cheaper to get rid of it in the next pruning pass than
 					 * to treat it like an indexed tuple.
+					 *
+					 * If this were to happen for a tuple that actually needed
+					 * to be deleted, we'd be in trouble, because it'd
+					 * possibly leave a tuple below the relation's xmin
+					 * horizon alive. But checks in
+					 * heap_prepare_freeze_tuple() would trigger in that case,
+					 * preventing corruption.
 					 */
 					if (HeapTupleIsHotUpdated(&tuple) ||
 						HeapTupleIsHeapOnly(&tuple))
@@ -1095,8 +1104,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 				 * Each non-removable tuple must be checked to see if it needs
 				 * freezing.  Note we already have exclusive buffer lock.
 				 */
-				if (heap_prepare_freeze_tuple(tuple.t_data, FreezeLimit,
-											  MultiXactCutoff, &frozen[nfrozen],
+				if (heap_prepare_freeze_tuple(tuple.t_data,
+											  relfrozenxid, relminmxid,
+											  FreezeLimit, MultiXactCutoff,
+											  &frozen[nfrozen],
 											  &tuple_totally_frozen))
 					frozen[nfrozen++].offset = offnum;
 
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e41024e926..f1366ed9581 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -168,8 +168,9 @@ extern HTSU_Result heap_lock_tuple(Relation relation, HeapTuple tuple,
 				bool follow_update,
 				Buffer *buffer, HeapUpdateFailureData *hufd);
 extern void heap_inplace_update(Relation relation, HeapTuple tuple);
-extern bool heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
-				  TransactionId cutoff_multi);
+extern bool heap_freeze_tuple(HeapTupleHeader tuple,
+				  TransactionId relfrozenxid, TransactionId relminmxid,
+				  TransactionId cutoff_xid, TransactionId cutoff_multi);
 extern bool heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
 						MultiXactId cutoff_multi, Buffer buf);
 extern bool heap_tuple_needs_eventual_freeze(HeapTupleHeader tuple);
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 81a6a395c4f..ad5f982978c 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -390,6 +390,8 @@ extern XLogRecPtr log_heap_freeze(Relation reln, Buffer buffer,
 				TransactionId cutoff_xid, xl_heap_freeze_tuple *tuples,
 				int ntuples);
 extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
+						  TransactionId relfrozenxid,
+						  TransactionId relminmxid,
 						  TransactionId cutoff_xid,
 						  TransactionId cutoff_multi,
 						  xl_heap_freeze_tuple *frz,
-- 
2.14.1.536.g6867272d5b.dirty
