Hi, On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote: > From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001 > From: Petr Jelinek <pjmo...@pjmodos.net> > Date: Fri, 24 Feb 2017 21:39:03 +0100 > Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export > > Otherwise the VACUUM or pruning might remove tuples still needed by the > exported snapshot. > --- > src/backend/replication/logical/logical.c | 21 ++++++++++++++++++++- > 1 file changed, 20 insertions(+), 1 deletion(-) > > diff --git a/src/backend/replication/logical/logical.c > b/src/backend/replication/logical/logical.c > index 5529ac8..57c392c 100644 > --- a/src/backend/replication/logical/logical.c > +++ b/src/backend/replication/logical/logical.c > @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin, > * the slot machinery about the new limit. Once that's done the > * ProcArrayLock can be released as the slot machinery now is > * protecting against vacuum. > + * > + * Note that we only store the global xmin temporarily in the in-memory > + * state so that the initial snapshot can be exported. After initial > + * snapshot is done global xmin should be reset and not tracked anymore > + * so we are fine with losing the global xmin after crash. > * ---- > */ > LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); > > slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId(); > slot->data.catalog_xmin = slot->effective_catalog_xmin; > + slot->effective_xmin = slot->effective_catalog_xmin;
> void > FreeDecodingContext(LogicalDecodingContext *ctx) > { > + ReplicationSlot *slot = MyReplicationSlot; > + > if (ctx->callbacks.shutdown_cb != NULL) > shutdown_cb_wrapper(ctx); > > + /* > + * Cleanup global xmin for the slot that we may have set in > + * CreateInitDecodingContext(). Hm. Is that actually a meaningful point to do so? For one, it gets called by pg_logical_slot_get_changes_guts(), but more importantly, the snapshot is exported till SnapBuildClearExportedSnapshot(), which is the next command? If we rely on the snapshot magic done by ExportSnapshot() it'd be worthwhile to mention that... > We do not take ProcArrayLock or similar > + * since we only reset xmin here and there's not much harm done by a > + * concurrent computation missing that. > + */ Hum. I was prepared to complain about this, but ISTM, that there's absolutely no risk because the following ReplicationSlotsComputeRequiredXmin(false); actually does all the necessary locking? But still, I don't see much point in the optimization. > This patch changes the code so that stored snapshots are only used for > logical decoding restart but not for initial slot snapshot. Yea, that's a very good point... > @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr > lsn, xl_running_xacts *runn > > return false; > } > - /* c) valid on disk state */ > - else if (SnapBuildRestore(builder, lsn)) > + /* c) valid on disk state and not exported snapshot */ > + else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) && > + SnapBuildRestore(builder, lsn)) Hm. Is this a good signaling mechanism? It'll also trigger for the SQL interface, where it'd strictly speaking not be required, right? > From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001 > From: Petr Jelinek <pjmo...@pjmodos.net> > Date: Sun, 26 Feb 2017 01:07:33 +0100 > Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards A bit more commentary would be good. What does that protect us against? > From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001 > From: Petr Jelinek <pjmo...@pjmodos.net> > Date: Wed, 22 Feb 2017 00:57:33 +0100 > Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder > > Due to race condition, the xl_running_xacts might contain no longer > running transactions. Previous coding tried to get around this by > additional locking but that did not work correctly for committs. Instead > try combining decoded commits and multiple xl_running_xacts to get the > consistent snapshot. Needs more explanation about approach. > @@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr > lsn, xl_running_xacts *runn > * simply track the number of in-progress toplevel transactions > and > * lower it whenever one commits or aborts. When that number > * (builder->running.xcnt) reaches zero, we can go from > FULL_SNAPSHOT > - * to CONSISTENT. > + * to CONSISTENT. Sometimes we might get xl_running_xacts which > has > + * all tracked transactions as finished. We'll need to restart > tracking > + * in that case and use previously collected committed > transactions to > + * purge transactions mistakenly marked as running in the > + * xl_running_xacts which exist as a result of race condition in > + * LogStandbySnapshot(). I'm not following this yet. > @@ -1298,11 +1303,17 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr > lsn, xl_running_xacts *runn > * b) first encounter of a useable xl_running_xacts record. If we had > * found one earlier we would either track running transactions (i.e. > * builder->running.xcnt != 0) or be consistent (this function wouldn't > - * get called). > + * get called). However it's possible that we could not see all > + * transactions that were marked as running in xl_running_xacts, so if > + * we get new one that says all were closed but we are not consistent > + * yet, we need to restart the tracking while taking previously seen > + * transactions into account. This needs to revise the preceding comment more heavily. "This is the first!!! Or maybe not!" isn't easy to understand. > */ > - else if (!builder->running.xcnt) > + else if (!builder->running.xcnt || > + running->oldestRunningXid > builder->running.xmax) Isn't that wrong under wraparound? > { > int off; > + bool first = builder->running.xcnt == 0; > > /* > * We only care about toplevel xids as those are the ones we > @@ -1338,20 +1349,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr > lsn, xl_running_xacts *runn > builder->running.xmin = builder->running.xip[0]; > builder->running.xmax = builder->running.xip[running->xcnt - 1]; > > + > /* makes comparisons cheaper later */ > TransactionIdRetreat(builder->running.xmin); > TransactionIdAdvance(builder->running.xmax); > > builder->state = SNAPBUILD_FULL_SNAPSHOT; > > - ereport(LOG, > - (errmsg("logical decoding found initial starting point > at %X/%X", > - (uint32) (lsn >> 32), (uint32) lsn), > - errdetail_plural("%u transaction needs to finish.", > - "%u transactions need > to finish.", > - builder->running.xcnt, > - (uint32) > builder->running.xcnt))); > - > + /* > + * If this is the first time we've seen xl_running_xacts, we > are done. > + */ > + if (first) > + { > + ereport(LOG, > + (errmsg("logical decoding found initial > starting point at %X/%X", > + (uint32) (lsn >> 32), (uint32) > lsn), > + errdetail_plural("%u transaction needs to > finish.", > + "%u > transactions need to finish.", > + > builder->running.xcnt, > + (uint32) > builder->running.xcnt))); > + } > + else > + { > + /* > + * Because of the race condition in > LogStandbySnapshot() the > + * transactions recorded in xl_running_xacts as running > might have > + * already committed by the time the xl_running_xacts > was written > + * to WAL. Use the information about decoded > transactions that we > + * gathered so far to update our idea about what's > still running. > + * > + * We can use SnapBuildEndTxn directly as it only does > the > + * transaction running check and handling without any > additional > + * side effects. > + */ > + for (off = 0; off < builder->committed.xcnt; off++) > + SnapBuildEndTxn(builder, lsn, > builder->committed.xip[off]); > + if (builder->state == SNAPBUILD_CONSISTENT) > + return false; > + > + ereport(LOG, > + (errmsg("logical decoding moved initial > starting point to %X/%X", > + (uint32) (lsn >> 32), (uint32) > lsn), > + errdetail_plural("%u transaction needs to > finish.", > + "%u > transactions need to finish.", > + > builder->running.xcnt, > + (uint32) > builder->running.xcnt))); > + } Hm, this is not pretty. > From 4217da872e9aa48750c020542d8bc22c863a3d75 Mon Sep 17 00:00:00 2001 > From: Petr Jelinek <pjmo...@pjmodos.net> > Date: Tue, 21 Feb 2017 19:58:18 +0100 > Subject: [PATCH 5/5] Skip unnecessary snapshot builds > > When doing initial snapshot build during logical decoding > initialization, don't build snapshots for transactions where we know the > transaction didn't do any catalog changes. Otherwise we might end up > with thousands of useless snapshots on busy server which can be quite > expensive. > --- > src/backend/replication/logical/snapbuild.c | 82 > +++++++++++++++++++---------- > 1 file changed, 53 insertions(+), 29 deletions(-) > > diff --git a/src/backend/replication/logical/snapbuild.c > b/src/backend/replication/logical/snapbuild.c > index 1a1c9ba..c800aa5 100644 > --- a/src/backend/replication/logical/snapbuild.c > +++ b/src/backend/replication/logical/snapbuild.c > @@ -954,6 +954,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, > TransactionId xid, > bool forced_timetravel = false; > bool sub_needs_timetravel = false; > bool top_needs_timetravel = false; > + bool skip_forced_snapshot = false; > > TransactionId xmax = xid; > > @@ -975,10 +976,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, > TransactionId xid, > /* > * We could avoid treating !SnapBuildTxnIsRunning transactions > as > * timetravel ones, but we want to be able to export a snapshot > when > - * we reached consistency. > + * we reached consistency so we need to keep track of them. > */ > forced_timetravel = true; > elog(DEBUG1, "forced to assume catalog changes for xid %u > because it was running too early", xid); > + > + /* > + * It is however desirable to skip building new snapshot for > + * !SnapBuildTxnIsRunning transactions as otherwise we might > end up > + * building thousands of unused snapshots on busy servers which > can > + * be very expensive. > + */ > + if (!SnapBuildTxnIsRunning(builder, xid)) > + skip_forced_snapshot = true; > } That's pretty crudely bolted on the existing logic, isn't there a simpler way? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers