Hi,
On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:
> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <[email protected]>
> Date: Fri, 24 Feb 2017 21:39:03 +0100
> Subject: [PATCH 1/5] Reserve global xmin for create slot snapshot export
>
> Otherwise the VACUUM or pruning might remove tuples still needed by the
> exported snapshot.
> ---
> src/backend/replication/logical/logical.c | 21 ++++++++++++++++++++-
> 1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/src/backend/replication/logical/logical.c b/src/backend/replication/logical/logical.c
> index 5529ac8..57c392c 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
> * the slot machinery about the new limit. Once that's done the
> * ProcArrayLock can be released as the slot machinery now is
> * protecting against vacuum.
> + *
> + * Note that we only store the global xmin temporarily in the in-memory
> + * state so that the initial snapshot can be exported. After initial
> + * snapshot is done global xmin should be reset and not tracked anymore
> + * so we are fine with losing the global xmin after crash.
> * ----
> */
> LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>
> slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
> slot->data.catalog_xmin = slot->effective_catalog_xmin;
> + slot->effective_xmin = slot->effective_catalog_xmin;
> void
> FreeDecodingContext(LogicalDecodingContext *ctx)
> {
> + ReplicationSlot *slot = MyReplicationSlot;
> +
> if (ctx->callbacks.shutdown_cb != NULL)
> shutdown_cb_wrapper(ctx);
>
> + /*
> + * Cleanup global xmin for the slot that we may have set in
> + * CreateInitDecodingContext().
Hm. Is that actually a meaningful point at which to do so? For one, it gets
called by pg_logical_slot_get_changes_guts(), but more importantly, the
snapshot is exported till SnapBuildClearExportedSnapshot(), which is the
next command? If we rely on the snapshot magic done by ExportSnapshot()
it'd be worthwhile to mention that...
> + * We do not take ProcArrayLock or similar
> + * since we only reset xmin here and there's not much harm done by a
> + * concurrent computation missing that.
> + */
Hum. I was prepared to complain about this, but ISTM that there's
absolutely no risk because the following
ReplicationSlotsComputeRequiredXmin(false); actually does all the
necessary locking? But still, I don't see much point in the
optimization.
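
FWIW, a minimal sketch of how I'd picture that (placement hypothetical, the
functions and fields are the existing ones): reset under the slot's spinlock
and then let the existing machinery recompute the horizon:

    /* sketch only: drop the temporary global xmin we installed in
     * CreateInitDecodingContext() and recompute the shared horizon */
    SpinLockAcquire(&slot->mutex);
    slot->effective_xmin = InvalidTransactionId;
    SpinLockRelease(&slot->mutex);

    /* takes ProcArrayLock itself when already_locked is false */
    ReplicationSlotsComputeRequiredXmin(false);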
> This patch changes the code so that stored snapshots are only used for
> logical decoding restart but not for initial slot snapshot.
Yea, that's a very good point...
> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>
> return false;
> }
> - /* c) valid on disk state */
> - else if (SnapBuildRestore(builder, lsn))
> + /* c) valid on disk state and not exported snapshot */
> + else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
> + SnapBuildRestore(builder, lsn))
Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
interface, where it'd strictly speaking not be required, right?
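
E.g. something explicit along these lines would seem clearer to me (the flag
is made up, not part of the patch), set only on the walsender path that
actually exports a snapshot:

    /* hypothetical member of struct SnapBuild, set only when an
     * exported snapshot was requested at slot creation time */
    bool        building_full_snapshot;

    /* c) valid on disk state, and we don't need to build our own
     *    snapshot for export */
    else if (!builder->building_full_snapshot &&
             SnapBuildRestore(builder, lsn))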
> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <[email protected]>
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards
A bit more commentary would be good. What does that protect us against?
> From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <[email protected]>
> Date: Wed, 22 Feb 2017 00:57:33 +0100
> Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder
>
> Due to a race condition, the xl_running_xacts might contain transactions
> that are no longer running. Previous coding tried to get around this by
> additional locking but that did not work correctly for committs. Instead,
> try combining decoded commits and multiple xl_running_xacts to get the
> consistent snapshot.
Needs more explanation about approach.
> @@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
>  * simply track the number of in-progress toplevel transactions and
>  * lower it whenever one commits or aborts. When that number
>  * (builder->running.xcnt) reaches zero, we can go from FULL_SNAPSHOT
> - * to CONSISTENT.
> + * to CONSISTENT. Sometimes we might get xl_running_xacts which has
> + * all tracked transactions as finished. We'll need to restart tracking
> + * in that case and use previously collected committed transactions to
> + * purge transactions mistakenly marked as running in the
> + * xl_running_xacts which exist as a result of race condition in
> + * LogStandbySnapshot().
I'm not following this yet.
> @@ -1298,11 +1303,17 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
> * b) first encounter of a useable xl_running_xacts record. If we had
> * found one earlier we would either track running transactions (i.e.
> * builder->running.xcnt != 0) or be consistent (this function wouldn't
> - * get called).
> + * get called). However it's possible that we could not see all
> + * transactions that were marked as running in xl_running_xacts, so if
> + * we get new one that says all were closed but we are not consistent
> + * yet, we need to restart the tracking while taking previously seen
> + * transactions into account.
This needs to revise the preceding comment more heavily. "This is the
first!!! Or maybe not!" isn't easy to understand.
> */
> - else if (!builder->running.xcnt)
> + else if (!builder->running.xcnt ||
> + running->oldestRunningXid > builder->running.xmax)
Isn't that wrong under wraparound?
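I.e. plain >/< comparisons on TransactionIds don't survive xid wraparound;
if this test is kept, it'd presumably have to use the modulo-2^31 helpers,
roughly:

    else if (!builder->running.xcnt ||
             TransactionIdFollows(running->oldestRunningXid,
                                  builder->running.xmax))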
> {
> int off;
> + bool first = builder->running.xcnt == 0;
>
> /*
> * We only care about toplevel xids as those are the ones we
> @@ -1338,20 +1349,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn
> builder->running.xmin = builder->running.xip[0];
> builder->running.xmax = builder->running.xip[running->xcnt - 1];
>
> +
> /* makes comparisons cheaper later */
> TransactionIdRetreat(builder->running.xmin);
> TransactionIdAdvance(builder->running.xmax);
>
> builder->state = SNAPBUILD_FULL_SNAPSHOT;
>
> - ereport(LOG,
> -     (errmsg("logical decoding found initial starting point at %X/%X",
> -             (uint32) (lsn >> 32), (uint32) lsn),
> -      errdetail_plural("%u transaction needs to finish.",
> -                       "%u transactions need to finish.",
> -                       builder->running.xcnt,
> -                       (uint32) builder->running.xcnt)));
> -
> + /*
> +  * If this is the first time we've seen xl_running_xacts, we are done.
> +  */
> + if (first)
> + {
> +     ereport(LOG,
> +         (errmsg("logical decoding found initial starting point at %X/%X",
> +                 (uint32) (lsn >> 32), (uint32) lsn),
> +          errdetail_plural("%u transaction needs to finish.",
> +                           "%u transactions need to finish.",
> +                           builder->running.xcnt,
> +                           (uint32) builder->running.xcnt)));
> + }
> + else
> + {
> +     /*
> +      * Because of the race condition in LogStandbySnapshot() the
> +      * transactions recorded in xl_running_xacts as running might have
> +      * already committed by the time the xl_running_xacts was written
> +      * to WAL. Use the information about decoded transactions that we
> +      * gathered so far to update our idea about what's still running.
> +      *
> +      * We can use SnapBuildEndTxn directly as it only does the
> +      * transaction running check and handling without any additional
> +      * side effects.
> +      */
> +     for (off = 0; off < builder->committed.xcnt; off++)
> +         SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);
> +     if (builder->state == SNAPBUILD_CONSISTENT)
> +         return false;
> +
> +     ereport(LOG,
> +         (errmsg("logical decoding moved initial starting point to %X/%X",
> +                 (uint32) (lsn >> 32), (uint32) lsn),
> +          errdetail_plural("%u transaction needs to finish.",
> +                           "%u transactions need to finish.",
> +                           builder->running.xcnt,
> +                           (uint32) builder->running.xcnt)));
> + }
Hm, this is not pretty.
> From 4217da872e9aa48750c020542d8bc22c863a3d75 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <[email protected]>
> Date: Tue, 21 Feb 2017 19:58:18 +0100
> Subject: [PATCH 5/5] Skip unnecessary snapshot builds
>
> When doing initial snapshot build during logical decoding
> initialization, don't build snapshots for transactions where we know the
> transaction didn't do any catalog changes. Otherwise we might end up
> with thousands of useless snapshots on a busy server, which can be quite
> expensive.
> ---
> src/backend/replication/logical/snapbuild.c | 82 +++++++++++++++++++----------
> 1 file changed, 53 insertions(+), 29 deletions(-)
>
> diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
> index 1a1c9ba..c800aa5 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -954,6 +954,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
> bool forced_timetravel = false;
> bool sub_needs_timetravel = false;
> bool top_needs_timetravel = false;
> + bool skip_forced_snapshot = false;
>
> TransactionId xmax = xid;
>
> @@ -975,10 +976,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
> /*
>  * We could avoid treating !SnapBuildTxnIsRunning transactions as
>  * timetravel ones, but we want to be able to export a snapshot when
> - * we reached consistency.
> + * we reached consistency so we need to keep track of them.
>  */
> forced_timetravel = true;
> elog(DEBUG1, "forced to assume catalog changes for xid %u because it was running too early", xid);
> +
> + /*
> +  * It is however desirable to skip building new snapshot for
> +  * !SnapBuildTxnIsRunning transactions as otherwise we might end up
> +  * building thousands of unused snapshots on busy servers which can
> +  * be very expensive.
> +  */
> + if (!SnapBuildTxnIsRunning(builder, xid))
> +     skip_forced_snapshot = true;
> }
That's pretty crudely bolted onto the existing logic; isn't there a
simpler way?
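
E.g. (untested sketch, just to illustrate) the running-check could be
evaluated once and used directly where the snapshot gets built, instead of
threading a separate skip flag through the function:

    /* decide once whether the xid was still running when we started */
    bool        was_running = SnapBuildTxnIsRunning(builder, xid);

    ...

    /* only build a new snapshot if somebody can actually use it */
    if (top_needs_timetravel || sub_needs_timetravel ||
        (forced_timetravel && was_running))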
Greetings,
Andres Freund