On 26/02/17 01:43, Petr Jelinek wrote:
> On 24/02/17 22:56, Petr Jelinek wrote:
>> On 22/02/17 03:05, Petr Jelinek wrote:
>>> On 13/12/16 00:38, Petr Jelinek wrote:
>>>> On 12/12/16 23:33, Andres Freund wrote:
>>>>> On 2016-12-12 23:27:30 +0100, Petr Jelinek wrote:
>>>>>> On 12/12/16 22:42, Andres Freund wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2016-12-10 23:10:19 +0100, Petr Jelinek wrote:
>>>>>>>> Hi,
>>>>>>>> The first one is an outright bug, which has to do with how we track
>>>>>>>> running transactions. What snapbuild basically does while building the
>>>>>>>> initial snapshot is read the xl_running_xacts record, store the list
>>>>>>>> of running txes and then wait until they all finish. The problem with
>>>>>>>> this is that xl_running_xacts does not ensure that it only logs
>>>>>>>> transactions that are actually still running (to avoid locking
>>>>>>>> PGPROC), so there might be xids in xl_running_xacts that already
>>>>>>>> committed before it was logged.
>>>>>>>
>>>>>>> I don't think that's actually true?  Notice how LogStandbySnapshot()
>>>>>>> only releases the lock *after* the LogCurrentRunningXacts() iff
>>>>>>> wal_level >= WAL_LEVEL_LOGICAL.  So the explanation for the problem you
>>>>>>> observed must actually be a bit more complex :(
>>>>>>>
>>>>>>
>>>>>> Hmm, interesting. I did see the transaction commit in the WAL before the
>>>>>> xl_running_xacts that contained the xid as running. I've only seen it on
>>>>>> a production system though; I didn't really manage to reproduce it
>>>>>> locally.
>>>>>
>>>>> I suspect the reason for that is that RecordTransactionCommit() doesn't
>>>>> conflict with ProcArrayLock in the first place - only
>>>>> ProcArrayEndTransaction() does.  So they're still running in the PGPROC
>>>>> sense, just not the crash-recovery sense...
>>>>>
>>>>
>>>> That looks like a reasonable explanation. BTW I realized my patch needs a
>>>> bit more work; currently it will break the actual snapshot, as it behaves
>>>> the same as if the xl_running_xacts was empty, which is not correct AFAICS.
>>>>
>>>
>>> Hi,
>>>
>>> I got to work on this again. Unfortunately I haven't found a solution
>>> that I would be very happy with. What I did is: in case we read an
>>> xl_running_xacts in which all the transactions we track have finished, we
>>> start tracking from that new xl_running_xacts again, with the difference
>>> that we clean up the running transactions based on previously seen
>>> committed ones. That means that on a busy server we may wait for multiple
>>> xl_running_xacts records rather than just one, but at least we have a
>>> chance to finish, unlike with the current coding, which basically waits
>>> for an empty xl_running_xacts. I also removed the additional locking for
>>> logical wal_level in LogStandbySnapshot() since it does not work.
>>
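To make the purge step above concrete for the archives, here is a toy standalone sketch (names like ToyBuilder and toy_restart_tracking are made up for illustration; the real code lives in snapbuild.c and additionally deals with xid wraparound, subtransactions and sorted arrays):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define MAX_XIDS 64

/* Toy model of the builder's tracked state: xids we still wait on,
 * plus the commits we have already decoded. */
typedef struct ToyBuilder
{
	TransactionId running[MAX_XIDS];
	int			nrunning;
	TransactionId committed[MAX_XIDS];
	int			ncommitted;
} ToyBuilder;

/* Restart tracking from a fresh xl_running_xacts-style xid list, then
 * purge any xid we have already seen commit -- because of the race, the
 * "running" list may include transactions that committed before the
 * record was logged. */
static void
toy_restart_tracking(ToyBuilder *b, const TransactionId *xip, int xcnt)
{
	b->nrunning = 0;
	for (int i = 0; i < xcnt; i++)
	{
		bool		already_committed = false;

		for (int j = 0; j < b->ncommitted; j++)
			if (b->committed[j] == xip[i])
				already_committed = true;
		if (!already_committed)
			b->running[b->nrunning++] = xip[i];
	}
}
```

So with each new xl_running_xacts the set of xids we actually wait on can only shrink, which is what gives the build a chance to finish on a busy server.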
>> Not hearing any opposition to this idea, I decided to polish it and also
>> optimize it a bit.
>>
>> That being said, thanks to testing from Erik Rijkers I've identified one
>> more bug in how we do the initial snapshot. Apparently we don't reserve
>> the global xmin when we start building the initial exported snapshot for
>> a slot (we only reserve catalog_xmin, which is fine for logical decoding
>> but not for the exported snapshot), so VACUUM and heap pruning will
>> happily delete old versions of rows that are still needed by anybody
>> trying to use that exported snapshot.
>>
> 
> Aaand I found one more bug in snapbuild. Apparently we don't protect the
> snapshot builder's xmin from going backwards, which can yet again result
> in a corrupted exported snapshot.
> 
> Summary of attached patches:
> 0001 - Fixes the above-mentioned global xmin tracking issues. Needs to
> be backported all the way to 9.4.
> 
> 0002 - Removes use of the logical decoding saved snapshots for the
> initial exported snapshot, since those snapshots only work for catalogs
> and not user data. Also needs to be backported all the way to 9.4.

I'd been a bit overzealous about this one (I removed the use of the saved
snapshots completely). So here is a fix for that (and a rebase on top of
current HEAD).

> 
> 0003 - Makes sure the snapshot builder's xmin is not moved backwards by
> xl_running_xacts (which can otherwise happen during initial snapshot
> building). Also should be backported to 9.4.
> 
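For the record, 0003 is essentially a monotonicity clamp on the builder's xmin; stripped of the real types it amounts to this (illustrative only -- the actual patch uses TransactionIdFollowsOrEquals(), which also handles xid wraparound, and the plain >= below does not):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/*
 * Only let oldestRunningXid from xl_running_xacts advance the builder's
 * xmin; ignore values that would move it backwards. Plain >= stands in
 * for TransactionIdFollowsOrEquals() here, so wraparound is not handled.
 */
static TransactionId
clamp_builder_xmin(TransactionId builder_xmin, TransactionId oldest_running)
{
	if (oldest_running >= builder_xmin)
		return oldest_running;	/* normal forward movement */
	return builder_xmin;		/* never move xmin backwards */
}
```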
> 0004 - Changes handling of the xl_running_xacts in initial snapshot
> build to what I wrote above and removes the extra locking from
> LogStandbySnapshot introduced by logical decoding.
> 
> 0005 - Improves performance of initial snapshot building by skipping
> catalog snapshot build for transactions that don't do catalog changes.
> 
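The core of 0005 is the decision of whether a commit needs a new historic snapshot at all; as a standalone predicate (a hypothetical helper mirroring the patch's build_snapshot expression, not a function the patch adds) it reads:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Build a new snapshot unless one already exists, neither the toplevel
 * xact nor any subxact made catalog changes, and the xact was only
 * force-tracked as timetravel (so we know the build can be skipped).
 */
static bool
need_snapshot_build(bool have_snapshot,
					bool top_needs_timetravel,
					bool sub_needs_timetravel,
					bool skip_forced_snapshot)
{
	return !have_snapshot || top_needs_timetravel ||
		sub_needs_timetravel || !skip_forced_snapshot;
}
```

On a busy server almost every commit hits the have_snapshot && skip_forced_snapshot case, which is where the savings come from.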


-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Fri, 24 Feb 2017 21:39:03 +0100
Subject: [PATCH 1/5] Reserve global xmin for create slot snapshot export

Otherwise VACUUM or pruning might remove tuples that are still needed by
the exported snapshot.
---
 src/backend/replication/logical/logical.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
index 5529ac8..57c392c 100644
--- a/src/backend/replication/logical/logical.c
+++ b/src/backend/replication/logical/logical.c
@@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
 	 * the slot machinery about the new limit. Once that's done the
 	 * ProcArrayLock can be released as the slot machinery now is
 	 * protecting against vacuum.
+	 *
+	 * Note that we only store the global xmin temporarily in the in-memory
+	 * state so that the initial snapshot can be exported. After the initial
+	 * snapshot is done, the global xmin should be reset and not tracked
+	 * anymore, so we are fine with losing it after a crash.
 	 * ----
 	 */
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 
 	slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
 	slot->data.catalog_xmin = slot->effective_catalog_xmin;
+	slot->effective_xmin = slot->effective_catalog_xmin;
 
 	ReplicationSlotsComputeRequiredXmin(true);
 
@@ -282,7 +288,7 @@ CreateInitDecodingContext(char *plugin,
 	 * tell the snapshot builder to only assemble snapshot once reaching the
 	 * running_xact's record with the respective xmin.
 	 */
-	xmin_horizon = slot->data.catalog_xmin;
+	xmin_horizon = slot->effective_xmin;
 
 	ReplicationSlotMarkDirty();
 	ReplicationSlotSave();
@@ -456,9 +462,22 @@ DecodingContextFindStartpoint(LogicalDecodingContext *ctx)
 void
 FreeDecodingContext(LogicalDecodingContext *ctx)
 {
+	ReplicationSlot *slot = MyReplicationSlot;
+
 	if (ctx->callbacks.shutdown_cb != NULL)
 		shutdown_cb_wrapper(ctx);
 
+	/*
+	 * Clean up the global xmin for the slot, which we may have set in
+	 * CreateInitDecodingContext(). We do not take ProcArrayLock or similar
+	 * since we only reset xmin here and there's not much harm done by a
+	 * concurrent computation missing that.
+	 */
+	SpinLockAcquire(&slot->mutex);
+	slot->effective_xmin = InvalidTransactionId;
+	SpinLockRelease(&slot->mutex);
+	ReplicationSlotsComputeRequiredXmin(false);
+
 	ReorderBufferFree(ctx->reorder);
 	FreeSnapshotBuilder(ctx->snapshot_builder);
 	XLogReaderFree(ctx->reader);
-- 
2.7.4

From b171eb533f4dd14f9f5082b469e5218c1bf13682 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Tue, 21 Feb 2017 20:14:44 +0100
Subject: [PATCH 2/5] Don't use on disk snapshots for snapshot export in
 logical decoding

We store historical snapshots on disk to enable continuation of logical
decoding after restart. These snapshots were also used by the slot
initialization code for the initial snapshot that the slot exports to
aid synchronization of the data copy and the stream consumption.
However, these snapshots are only useful for catalogs and not for normal
user tables. So when we exported such a snapshot for the user to read
data from tables consistent with a specific LSN of slot creation, the
user would instead read wrong data.

This patch changes the code so that stored snapshots are only used for
logical decoding restart, not for the initial slot snapshot.
---
 src/backend/replication/logical/snapbuild.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 6f19cdc..4b0c1e0 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1214,7 +1214,7 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 *
 	 * a) There were no running transactions when the xl_running_xacts record
 	 *	  was inserted, jump to CONSISTENT immediately. We might find such a
-	 *	  state we were waiting for b) and c).
+	 *	  state we were waiting for b) or c).
 	 *
 	 * b) Wait for all toplevel transactions that were running to end. We
 	 *	  simply track the number of in-progress toplevel transactions and
@@ -1229,7 +1229,9 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 *	  at all.
 	 *
 	 * c) This (in a previous run) or another decoding slot serialized a
-	 *	  snapshot to disk that we can use.
+	 *	  snapshot to disk that we can use. We can't use this method for the
+	 *	  initial snapshot when the slot is being created, as that snapshot
+	 *	  may be exported and used for reading user data.
 	 * ---
 	 */
 
@@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 
 		return false;
 	}
-	/* c) valid on disk state */
-	else if (SnapBuildRestore(builder, lsn))
+	/* c) valid on disk state and not exported snapshot */
+	else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
+			 SnapBuildRestore(builder, lsn))
 	{
 		/* there won't be any state to cleanup */
 		return false;
 	}
-
 	/*
 	 * b) first encounter of a useable xl_running_xacts record. If we had
 	 * found one earlier we would either track running transactions (i.e.
-- 
2.7.4

From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Sun, 26 Feb 2017 01:07:33 +0100
Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards

---
 src/backend/replication/logical/snapbuild.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 4b0c1e0..49c4337 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1141,7 +1141,8 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
 	 * looking, it's correct and actually more efficient this way since we hit
 	 * fast paths in tqual.c.
 	 */
-	builder->xmin = running->oldestRunningXid;
+	if (TransactionIdFollowsOrEquals(running->oldestRunningXid, builder->xmin))
+		builder->xmin = running->oldestRunningXid;
 
 	/* Remove transactions we don't need to keep track off anymore */
 	SnapBuildPurgeCommittedTxn(builder);
-- 
2.7.4

From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Wed, 22 Feb 2017 00:57:33 +0100
Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder

Due to a race condition, xl_running_xacts might contain transactions
that are no longer running. The previous coding tried to get around this
with additional locking, but that did not work correctly for commits.
Instead, try combining decoded commits and multiple xl_running_xacts
records to reach a consistent snapshot.

This also reverts the changes made to GetRunningTransactionData() and
LogStandbySnapshot() by b89e151, as the additional locking does not help.
---
 src/backend/replication/logical/snapbuild.c | 71 ++++++++++++++++++++++++-----
 src/backend/storage/ipc/procarray.c         |  5 +-
 src/backend/storage/ipc/standby.c           | 19 --------
 3 files changed, 63 insertions(+), 32 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 49c4337..1a1c9ba 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 *	  simply track the number of in-progress toplevel transactions and
 	 *	  lower it whenever one commits or aborts. When that number
 	 *	  (builder->running.xcnt) reaches zero, we can go from FULL_SNAPSHOT
-	 *	  to CONSISTENT.
+	 *	  to CONSISTENT. Sometimes we might get an xl_running_xacts which has
+	 *	  all tracked transactions marked as finished. In that case we need
+	 *	  to restart tracking and use the previously collected committed
+	 *	  transactions to purge transactions mistakenly marked as running in
+	 *	  the xl_running_xacts, which exist as a result of the race condition
+	 *	  in LogStandbySnapshot().
 	 *	  NB: We need to search running.xip when seeing a transaction's end to
 	 *	  make sure it's a toplevel transaction and it's been one of the
 	 *	  initially running ones.
@@ -1298,11 +1303,17 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 	 * b) first encounter of a useable xl_running_xacts record. If we had
 	 * found one earlier we would either track running transactions (i.e.
 	 * builder->running.xcnt != 0) or be consistent (this function wouldn't
-	 * get called).
+	 * get called). However, it's possible that we could not see all
+	 * transactions that were marked as running in xl_running_xacts, so if
+	 * we get a new one that says all were closed but we are not consistent
+	 * yet, we need to restart the tracking while taking previously seen
+	 * transactions into account.
 	 */
-	else if (!builder->running.xcnt)
+	else if (!builder->running.xcnt ||
+			 running->oldestRunningXid > builder->running.xmax)
 	{
 		int			off;
+		bool		first = builder->running.xcnt == 0;
 
 		/*
 		 * We only care about toplevel xids as those are the ones we
@@ -1338,20 +1349,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 		builder->running.xmin = builder->running.xip[0];
 		builder->running.xmax = builder->running.xip[running->xcnt - 1];
 
+
 		/* makes comparisons cheaper later */
 		TransactionIdRetreat(builder->running.xmin);
 		TransactionIdAdvance(builder->running.xmax);
 
 		builder->state = SNAPBUILD_FULL_SNAPSHOT;
 
-		ereport(LOG,
-			(errmsg("logical decoding found initial starting point at %X/%X",
-					(uint32) (lsn >> 32), (uint32) lsn),
-			 errdetail_plural("%u transaction needs to finish.",
-							  "%u transactions need to finish.",
-							  builder->running.xcnt,
-							  (uint32) builder->running.xcnt)));
-
 		/*
 		 * Iterate through all xids, wait for them to finish.
 		 *
@@ -1371,9 +1375,54 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
 			if (TransactionIdIsCurrentTransactionId(xid))
 				elog(ERROR, "waiting for ourselves");
 
+			/*
+			 * This isn't required for the correctness of decoding, but to allow
+			 * isolationtester to notice that we're currently waiting for
+			 * something.
+			 */
 			XactLockTableWait(xid, NULL, NULL, XLTW_None);
 		}
 
+		/*
+		 * If this is the first time we've seen xl_running_xacts, we are done.
+		 */
+		if (first)
+		{
+			ereport(LOG,
+				(errmsg("logical decoding found initial starting point at %X/%X",
+						(uint32) (lsn >> 32), (uint32) lsn),
+				 errdetail_plural("%u transaction needs to finish.",
+								  "%u transactions need to finish.",
+								  builder->running.xcnt,
+								  (uint32) builder->running.xcnt)));
+		}
+		else
+		{
+			/*
+			 * Because of the race condition in LogStandbySnapshot() the
+			 * transactions recorded in xl_running_xacts as running might have
+			 * already committed by the time the xl_running_xacts was written
+			 * to WAL. Use the information about decoded transactions that we
+			 * gathered so far to update our idea about what's still running.
+			 *
+			 * We can use SnapBuildEndTxn directly as it only does the
+			 * transaction running check and handling without any additional
+			 * side effects.
+			 */
+			for (off = 0; off < builder->committed.xcnt; off++)
+				SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);
+			if (builder->state == SNAPBUILD_CONSISTENT)
+				return false;
+
+			ereport(LOG,
+				(errmsg("logical decoding moved initial starting point to %X/%X",
+						(uint32) (lsn >> 32), (uint32) lsn),
+				 errdetail_plural("%u transaction needs to finish.",
+								  "%u transactions need to finish.",
+								  builder->running.xcnt,
+								  (uint32) builder->running.xcnt)));
+		}
+
 		/* nothing could have built up so far, so don't perform cleanup */
 		return false;
 	}
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index cd14667..4ea81f8 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2055,12 +2055,13 @@ GetRunningTransactionData(void)
 	CurrentRunningXacts->oldestRunningXid = oldestRunningXid;
 	CurrentRunningXacts->latestCompletedXid = latestCompletedXid;
 
+	/* We don't release XidGenLock here, the caller is responsible for that */
+	LWLockRelease(ProcArrayLock);
+
 	Assert(TransactionIdIsValid(CurrentRunningXacts->nextXid));
 	Assert(TransactionIdIsValid(CurrentRunningXacts->oldestRunningXid));
 	Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));
 
-	/* We don't release the locks here, the caller is responsible for that */
-
 	return CurrentRunningXacts;
 }
 
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 6259070..f461f21 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -933,27 +933,8 @@ LogStandbySnapshot(void)
 	 */
 	running = GetRunningTransactionData();
 
-	/*
-	 * GetRunningTransactionData() acquired ProcArrayLock, we must release it.
-	 * For Hot Standby this can be done before inserting the WAL record
-	 * because ProcArrayApplyRecoveryInfo() rechecks the commit status using
-	 * the clog. For logical decoding, though, the lock can't be released
-	 * early because the clog might be "in the future" from the POV of the
-	 * historic snapshot. This would allow for situations where we're waiting
-	 * for the end of a transaction listed in the xl_running_xacts record
-	 * which, according to the WAL, has committed before the xl_running_xacts
-	 * record. Fortunately this routine isn't executed frequently, and it's
-	 * only a shared lock.
-	 */
-	if (wal_level < WAL_LEVEL_LOGICAL)
-		LWLockRelease(ProcArrayLock);
-
 	recptr = LogCurrentRunningXacts(running);
 
-	/* Release lock if we kept it longer ... */
-	if (wal_level >= WAL_LEVEL_LOGICAL)
-		LWLockRelease(ProcArrayLock);
-
 	/* GetRunningTransactionData() acquired XidGenLock, we must release it */
 	LWLockRelease(XidGenLock);
 
-- 
2.7.4

From 4217da872e9aa48750c020542d8bc22c863a3d75 Mon Sep 17 00:00:00 2001
From: Petr Jelinek <pjmo...@pjmodos.net>
Date: Tue, 21 Feb 2017 19:58:18 +0100
Subject: [PATCH 5/5] Skip unnecessary snapshot builds

When doing the initial snapshot build during logical decoding
initialization, don't build snapshots for transactions that we know
didn't do any catalog changes. Otherwise we might end up with thousands
of useless snapshots on a busy server, which can be quite expensive.
---
 src/backend/replication/logical/snapbuild.c | 82 +++++++++++++++++++----------
 1 file changed, 53 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 1a1c9ba..c800aa5 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -954,6 +954,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	bool		forced_timetravel = false;
 	bool		sub_needs_timetravel = false;
 	bool		top_needs_timetravel = false;
+	bool		skip_forced_snapshot = false;
 
 	TransactionId xmax = xid;
 
@@ -975,10 +976,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		/*
 		 * We could avoid treating !SnapBuildTxnIsRunning transactions as
 		 * timetravel ones, but we want to be able to export a snapshot when
-		 * we reached consistency.
+		 * we reached consistency so we need to keep track of them.
 		 */
 		forced_timetravel = true;
 		elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
+
+		/*
+		 * It is, however, desirable to skip building a new snapshot for
+		 * !SnapBuildTxnIsRunning transactions, as otherwise we might end up
+		 * building thousands of unused snapshots on busy servers, which can
+		 * be very expensive.
+		 */
+		if (!SnapBuildTxnIsRunning(builder, xid))
+			skip_forced_snapshot = true;
 	}
 
 	for (nxact = 0; nxact < nsubxacts; nxact++)
@@ -991,21 +1001,10 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		SnapBuildEndTxn(builder, lsn, subxid);
 
 		/*
-		 * If we're forcing timetravel we also need visibility information
-		 * about subtransaction, so keep track of subtransaction's state.
-		 */
-		if (forced_timetravel)
-		{
-			SnapBuildAddCommittedTxn(builder, subxid);
-			if (NormalTransactionIdFollows(subxid, xmax))
-				xmax = subxid;
-		}
-
-		/*
 		 * Add subtransaction to base snapshot if it DDL, we don't distinguish
 		 * to toplevel transactions there.
 		 */
-		else if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
+		if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
 		{
 			sub_needs_timetravel = true;
 
@@ -1017,6 +1016,16 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			if (NormalTransactionIdFollows(subxid, xmax))
 				xmax = subxid;
 		}
+		/*
+		 * If we're forcing timetravel we also need visibility information
+		 * about subtransaction, so keep track of subtransaction's state.
+		 */
+		else if (forced_timetravel)
+		{
+			SnapBuildAddCommittedTxn(builder, subxid);
+			if (NormalTransactionIdFollows(subxid, xmax))
+				xmax = subxid;
+		}
 	}
 
 	/*
@@ -1025,14 +1034,8 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 	 */
 	SnapBuildEndTxn(builder, lsn, xid);
 
-	if (forced_timetravel)
-	{
-		elog(DEBUG2, "forced transaction %u to do timetravel.", xid);
-
-		SnapBuildAddCommittedTxn(builder, xid);
-	}
 	/* add toplevel transaction to base snapshot */
-	else if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
+	if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
 	{
 		elog(DEBUG2, "found top level transaction %u, with catalog changes!",
 			 xid);
@@ -1045,10 +1048,18 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 		/* mark toplevel txn as timetravel as well */
 		SnapBuildAddCommittedTxn(builder, xid);
 	}
+	else if (forced_timetravel)
+	{
+		elog(DEBUG2, "forced transaction %u to do timetravel.", xid);
+
+		SnapBuildAddCommittedTxn(builder, xid);
+	}
 
 	/* if there's any reason to build a historic snapshot, do so now */
 	if (forced_timetravel || top_needs_timetravel || sub_needs_timetravel)
 	{
+		bool build_snapshot;
+
 		/*
 		 * Adjust xmax of the snapshot builder, we only do that for committed,
 		 * catalog modifying, transactions, everything else isn't interesting
@@ -1069,14 +1080,29 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 			return;
 
 		/*
-		 * Decrease the snapshot builder's refcount of the old snapshot, note
-		 * that it still will be used if it has been handed out to the
-		 * reorderbuffer earlier.
+		 * Build a snapshot if needed. We need to build it if there isn't one
+		 * already built, if the transaction has made catalog changes, or
+		 * when we can't know whether the transaction made catalog changes.
 		 */
-		if (builder->snapshot)
+		build_snapshot = !builder->snapshot || top_needs_timetravel ||
+			sub_needs_timetravel || !skip_forced_snapshot;
+
+		/*
+		 * Decrease the snapshot builder's refcount of the old snapshot if we
+		 * plan to build new one, note that it still will be used if it has
+		 * been handed out to the reorderbuffer earlier.
+		 */
+		if (builder->snapshot && build_snapshot)
 			SnapBuildSnapDecRefcount(builder->snapshot);
 
-		builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+		/* Build a new snapshot unless we determined above it isn't needed. */
+		if (build_snapshot)
+		{
+			builder->snapshot = SnapBuildBuildSnapshot(builder, xid);
+
+			/* refcount of the snapshot builder for the new snapshot */
+			SnapBuildSnapIncRefcount(builder->snapshot);
+		}
 
 		/* we might need to execute invalidations, add snapshot */
 		if (!ReorderBufferXidHasBaseSnapshot(builder->reorder, xid))
@@ -1086,11 +1112,9 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
 										 builder->snapshot);
 		}
 
-		/* refcount of the snapshot builder for the new snapshot */
-		SnapBuildSnapIncRefcount(builder->snapshot);
-
 		/* add a new Snapshot to all currently running transactions */
-		SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
+		if (build_snapshot)
+			SnapBuildDistributeNewCatalogSnapshot(builder, lsn);
 	}
 	else
 	{
-- 
2.7.4

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
