Noah, Robert, All

On 2015-12-11 11:20:21 -0500, Robert Haas wrote:
> On Thu, Dec 10, 2015 at 9:32 AM, Robert Haas <robertmh...@gmail.com> wrote:
> >> Adding a 'prevOffsetStopLimit' and using it seems like a ~5 line patch.
> >
> > So let's do that, then.
> 
> Who is going to take care of this?

Here's two patches:

1) The fix to SetOffsetVacuumLimit().

   I've tested this by introducing a probabilistic "return false;" to
   find_multixact_start(), and torturing postgres by burning through
   billions of multixactids of various sizes.  Behaves about as
   bad^Wgood as without the induced failures; before the patch there
   were moments of spurious warnings/errors when ->offsetStopLimit was
   out of whack.

2) A subset of the comment changes from Noah's repository. Some of the
   comment changes didn't make sense without the removal
   SlruScanDirCbFindEarliest(), a small number of other changes I couldn't
   fully agree with.

   Noah, are you ok with pushing that subset of your changes? Is
   "Slightly edited subset of a larger patchset by Noah Misch" an OK
   attribution?


Noah, on a first glance I think e50cca0ae ("Remove the
SlruScanDirCbFindEarliest() test from TruncateMultiXact().") is a good
idea. So I do encourage you to submit that as a separate patch.

Regards,

Andres
>From d26d41a96a7fc8dbe9827d55837875790b21ca1b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sat, 12 Dec 2015 17:57:21 +0100
Subject: [PATCH 1/2] Fix bug in SetOffsetVacuumLimit() triggered by
 find_multixact_start() failure.

In case find_multixact_start() failed SetOffsetVacuumLimit() installed 0
into MultiXactState->offsetStopLimit. Unlike oldestOffset the
to-be-installed value was not restored in the error branch.

Luckily there are no known cases where find_multixact_start() will
return an error in 9.5 and above.

But if the bug is triggered, e.g. due to filesystem permission issues,
it'd be somewhat bad: GetNewMultiXactId() could continue allocating
mxids even if close to a wraparound, or it could erroneously stop
allocating mxids, even if no wraparound is looming.  Luckily the wrong
value would be corrected the next time SetOffsetVacuumLimit() is called,
or by a restart.

Reported-By: Noah Misch, although this is not his preferred fix
Backpatch: 9.5, where the bug was introduced as part of 4f627f
---
 src/backend/access/transam/multixact.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index b66a2b6..d2619bd 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2552,6 +2552,7 @@ SetOffsetVacuumLimit(void)
 	bool		oldestOffsetKnown = false;
 	bool		prevOldestOffsetKnown;
 	MultiXactOffset offsetStopLimit = 0;
+	MultiXactOffset prevOffsetStopLimit;
 
 	/*
 	 * NB: Have to prevent concurrent truncation, we might otherwise try to
@@ -2566,6 +2567,7 @@ SetOffsetVacuumLimit(void)
 	nextOffset = MultiXactState->nextOffset;
 	prevOldestOffsetKnown = MultiXactState->oldestOffsetKnown;
 	prevOldestOffset = MultiXactState->oldestOffset;
+	prevOffsetStopLimit = MultiXactState->offsetStopLimit;
 	Assert(MultiXactState->finishedStartup);
 	LWLockRelease(MultiXactGenLock);
 
@@ -2633,11 +2635,13 @@ SetOffsetVacuumLimit(void)
 	{
 		/*
 		 * If we failed to get the oldest offset this time, but we have a
-		 * value from a previous pass through this function, use the old value
-		 * rather than automatically forcing it.
+		 * value from a previous pass through this function, use the old
+		 * values rather than automatically forcing an emergency autovacuum
+		 * cycle again.
 		 */
 		oldestOffset = prevOldestOffset;
 		oldestOffsetKnown = true;
+		offsetStopLimit = prevOffsetStopLimit;
 	}
 
 	/* Install the computed values */
-- 
2.6.0.rc3

>From addafc27e7b89187a3f55a2cbce136d58722c97b Mon Sep 17 00:00:00 2001
From: Andres Freund <and...@anarazel.de>
Date: Sat, 12 Dec 2015 17:53:41 +0100
Subject: [PATCH 2/2] Improve/Fix comments around multixact truncation.

My commits 4f627f8 ("Rework the way multixact truncations work.") and
aa29c1c ("Remove legacy multixact truncation support.") missed updating
a number of comments. Fix that. Additionally improve accuracy of a few
of the added comments.

Reported-By: Noah Misch
Author: Slightly edited subset of a larger patchset by Noah Misch
Backpatch: 9.5, which is the first branch to contain the above commits
---
 src/backend/access/heap/README.tuplock |  6 ++----
 src/backend/access/transam/multixact.c | 36 +++++++++++++++++++---------------
 src/backend/access/transam/xlog.c      |  4 ----
 3 files changed, 22 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/heap/README.tuplock b/src/backend/access/heap/README.tuplock
index 10b8d78..f845958 100644
--- a/src/backend/access/heap/README.tuplock
+++ b/src/backend/access/heap/README.tuplock
@@ -102,10 +102,8 @@ the MultiXacts in them are no longer of interest.
 VACUUM is in charge of removing old MultiXacts at the time of tuple freezing.
 The lower bound used by vacuum (that is, the value below which all multixacts
 are removed) is stored as pg_class.relminmxid for each table; the minimum of
-all such values is stored in pg_database.datminmxid.  The minimum across
-all databases, in turn, is recorded in checkpoint records, and CHECKPOINT
-removes pg_multixact/ segments older than that value once the checkpoint
-record has been flushed.
+all such values is stored in pg_database.datminmxid.  VACUUM removes
+pg_multixact/ segments older than the minimum datminmxid across databases.
 
 Infomask Bits
 -------------
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index d2619bd..fdeb90b 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -257,7 +257,7 @@ typedef struct MultiXactStateData
 	 * we compute it (using nextMXact if none are valid).  Each backend is
 	 * required not to attempt to access any SLRU data for MultiXactIds older
 	 * than its own OldestVisibleMXactId[] setting; this is necessary because
-	 * the checkpointer could truncate away such data at any instant.
+	 * VACUUM could truncate away such data at any instant.
 	 *
 	 * The oldest valid value among all of the OldestMemberMXactId[] and
 	 * OldestVisibleMXactId[] entries is considered by vacuum as the earliest
@@ -271,8 +271,8 @@ typedef struct MultiXactStateData
 	 * the freezing point so computed is used as the new pg_class.relminmxid
 	 * value.  The minimum of all those values in a database is stored as
 	 * pg_database.datminmxid.  In turn, the minimum of all of those values is
-	 * stored in pg_control and used as truncation point for pg_multixact.  At
-	 * checkpoint or restartpoint, unneeded segments are removed.
+	 * stored in pg_control and used as truncation point for pg_multixact.
+	 * VACUUM removes unneeded segments.
 	 */
 	MultiXactId perBackendXactIds[FLEXIBLE_ARRAY_MEMBER];
 } MultiXactStateData;
@@ -662,8 +662,8 @@ MultiXactIdSetOldestMember(void)
  *
  * We set the OldestVisibleMXactId for a given transaction the first time
  * it's going to inspect any MultiXactId.  Once we have set this, we are
- * guaranteed that the checkpointer won't truncate off SLRU data for
- * MultiXactIds at or after our OldestVisibleMXactId.
+ * guaranteed that VACUUM won't truncate off SLRU data for MultiXactIds at
+ * or after our OldestVisibleMXactId.
  *
  * The value to set is the oldest of nextMXact and all the valid per-backend
  * OldestMemberMXactId[] entries.  Because of the locking we do, we can be
@@ -2207,9 +2207,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
 	/*
 	 * We pretend that a wrap will happen halfway through the multixact ID
 	 * space, but that's not really true, because multixacts wrap differently
-	 * from transaction IDs.  Note that, separately from any concern about
-	 * multixact IDs wrapping, we must ensure that multixact members do not
-	 * wrap.  Limits for that are set in DetermineSafeOldestOffset, not here.
+	 * from transaction IDs.
 	 */
 	multiWrapLimit = oldest_datminmxid + (MaxMultiXactId >> 1);
 	if (multiWrapLimit < FirstMultiXactId)
@@ -2253,7 +2251,7 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
 	if (multiVacLimit < FirstMultiXactId)
 		multiVacLimit += FirstMultiXactId;
 
-	/* Grab lock for just long enough to set the new limit values */
+	/* Grab lock for just long enough to set the new MultiXactId bounds */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->oldestMultiXactId = oldest_datminmxid;
 	MultiXactState->oldestMultiXactDB = oldest_datoid;
@@ -2270,11 +2268,11 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
 			 multiWrapLimit, oldest_datoid)));
 
 	/*
-	 * Computing the actual limits is only possible once the data directory is
-	 * in a consistent state. There's no need to compute the limits while
-	 * still replaying WAL - no decisions about new multis are made even
-	 * though multixact creations might be replayed. So we'll only do further
-	 * checks after TrimMultiXact() has been called.
+	 * Setting MultiXactState->oldestOffset (in SetOffsetVacuumLimit())
+	 * entails a find_multixact_start() call, which is only possible once the
+	 * data directory is in a consistent state. There's no need for an offset
+	 * limit while still replaying WAL; no decisions about new multis are made
+	 * even though multixact creations might be replayed.
 	 */
 	if (!MultiXactState->finishedStartup)
 		return;
@@ -3069,8 +3067,8 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB)
 	 * the critical section: Have to do it before truncation, to prevent
 	 * concurrent lookups of those values. Has to be inside the critical
 	 * section as otherwise a future call to this function would error out,
-	 * while looking up the oldest member in offsets, if our caller crashes
-	 * before updating the limits.
+	 * while looking up the oldest member in offsets, given ereport(ERROR)
+	 * before our caller updates the limits.
 	 */
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 	MultiXactState->oldestMultiXactId = newOldestMulti;
@@ -3185,6 +3183,12 @@ WriteMZeroPageXlogRec(int pageno, uint8 info)
 /*
  * Write a TRUNCATE xlog record
  *
+ * Flushing the record to disk serves two critical roles.  Like it is with
+ * TruncateCLOG(), recent HEAP_FREEZE records must reach disk before we delete
+ * any MultiXactId they overwrote.  This flush before unlinking any segment
+ * ensures that a crash/basebackup and recovery will not make
+ * MultiXactState->oldestMultiXactId point to a missing segment.
+
  * We must flush the xlog record to disk before returning --- see notes in
  * TruncateCLOG().
  */
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 71fc8ff..35d76de 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9319,10 +9319,6 @@ xlog_redo(XLogReaderState *record)
 		MultiXactAdvanceNextMXact(checkPoint.nextMulti,
 								  checkPoint.nextMultiOffset);
 
-		/*
-		 * NB: This may perform multixact truncation when replaying WAL
-		 * generated by an older primary.
-		 */
 		MultiXactAdvanceOldest(checkPoint.oldestMulti,
 							   checkPoint.oldestMultiDB);
 		if (TransactionIdPrecedes(ShmemVariableCache->oldestXid,
-- 
2.6.0.rc3

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to