Hi,
On 2017-03-03 01:30:11 +0100, Petr Jelinek wrote:

> From 7d5b48c8cb80e7c867b2096c999d08feda50b197 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmo...@pjmodos.net>
> Date: Fri, 24 Feb 2017 21:39:03 +0100
> Subject: [PATCH 1/5] Reserve global xmin for create slot snasphot export
> 
> Otherwise the VACUUM or pruning might remove tuples still needed by the
> exported snapshot.
> ---
>  src/backend/replication/logical/logical.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/src/backend/replication/logical/logical.c 
> b/src/backend/replication/logical/logical.c
> index 5529ac8..57c392c 100644
> --- a/src/backend/replication/logical/logical.c
> +++ b/src/backend/replication/logical/logical.c
> @@ -267,12 +267,18 @@ CreateInitDecodingContext(char *plugin,
>        * the slot machinery about the new limit. Once that's done the
>        * ProcArrayLock can be released as the slot machinery now is
>        * protecting against vacuum.
> +      *
> +      * Note that we only store the global xmin temporarily in the in-memory
> +      * state so that the initial snapshot can be exported. After initial
> +      * snapshot is done global xmin should be reset and not tracked anymore
> +      * so we are fine with losing the global xmin after crash.
>        * ----
>        */
>       LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>  
>       slot->effective_catalog_xmin = GetOldestSafeDecodingTransactionId();
>       slot->data.catalog_xmin = slot->effective_catalog_xmin;
> +     slot->effective_xmin = slot->effective_catalog_xmin;


>  void
>  FreeDecodingContext(LogicalDecodingContext *ctx)
>  {
> +     ReplicationSlot *slot = MyReplicationSlot;
> +
>       if (ctx->callbacks.shutdown_cb != NULL)
>               shutdown_cb_wrapper(ctx);
>  
> +     /*
> +      * Cleanup global xmin for the slot that we may have set in
> +      * CreateInitDecodingContext().

Hm.  Is that actually a meaningful point to do so? For one, it gets
called by pg_logical_slot_get_changes_guts(), but more importantly, the
snapshot is exported till SnapBuildClearExportedSnapshot(), which is the
next command?  If we rely on the snapshot magic done by ExportSnapshot()
it'd be worthwhile to mention that...


> We do not take ProcArrayLock or similar
> +      * since we only reset xmin here and there's not much harm done by a
> +      * concurrent computation missing that.
> +      */

Hum.  I was prepared to complain about this, but ISTM, that there's
absolutely no risk because the following
ReplicationSlotsComputeRequiredXmin(false); actually does all the
necessary locking?  But still, I don't see much point in the
optimization.



> This patch changes the code so that stored snapshots are only used for
> logical decoding restart but not for initial slot snapshot.

Yea, that's a very good point...

> @@ -1284,13 +1286,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> lsn, xl_running_xacts *runn
>  
>               return false;
>       }
> -     /* c) valid on disk state */
> -     else if (SnapBuildRestore(builder, lsn))
> +     /* c) valid on disk state and not exported snapshot */
> +     else if (!TransactionIdIsNormal(builder->initial_xmin_horizon) &&
> +                      SnapBuildRestore(builder, lsn))

Hm. Is this a good signaling mechanism? It'll also trigger for the SQL
interface, where it'd strictly speaking not be required, right?


> From 3318a929e691870f3c1ca665bec3bfa8ea2af2a8 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmo...@pjmodos.net>
> Date: Sun, 26 Feb 2017 01:07:33 +0100
> Subject: [PATCH 3/5] Prevent snapshot builder xmin from going backwards

A bit more commentary would be good. What does that protect us against?



> From 53193b40f26dd19c712f3b9b77af55f81eb31cc4 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmo...@pjmodos.net>
> Date: Wed, 22 Feb 2017 00:57:33 +0100
> Subject: [PATCH 4/5] Fix xl_running_xacts usage in snapshot builder
> 
> Due to race condition, the xl_running_xacts might contain no longer
> running transactions. Previous coding tried to get around this by
> additional locking but that did not work correctly for committs. Instead
> try combining decoded commits and multiple xl_running_xacts to get the
> consistent snapshot.

Needs more explanation about approach.


> @@ -1221,7 +1221,12 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> lsn, xl_running_xacts *runn
>        *        simply track the number of in-progress toplevel transactions 
> and
>        *        lower it whenever one commits or aborts. When that number
>        *        (builder->running.xcnt) reaches zero, we can go from 
> FULL_SNAPSHOT
> -      *        to CONSISTENT.
> +      *        to CONSISTENT. Sometimes we might get xl_running_xacts which 
> has
> +      *        all tracked transactions as finished. We'll need to restart 
> tracking
> +      *        in that case and use previously collected committed 
> transactions to
> +      *        purge transactions mistakenly marked as running in the
> +      *        xl_running_xacts which exist as a result of race condition in
> +      *        LogStandbySnapshot().

I'm not following this yet.


> @@ -1298,11 +1303,17 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> lsn, xl_running_xacts *runn
>        * b) first encounter of a useable xl_running_xacts record. If we had
>        * found one earlier we would either track running transactions (i.e.
>        * builder->running.xcnt != 0) or be consistent (this function wouldn't
> -      * get called).
> +      * get called). However it's possible that we could not see all
> +      * transactions that were marked as running in xl_running_xacts, so if
> +      * we get new one that says all were closed but we are not consistent
> +      * yet, we need to restart the tracking while taking previously seen
> +      * transactions into account.

This needs to revise the preceding comment more heavily.  "This is the
first!!! Or maybe not!" isn't easy to understand.


>        */
> -     else if (!builder->running.xcnt)
> +     else if (!builder->running.xcnt ||
> +                      running->oldestRunningXid > builder->running.xmax)

Isn't that wrong under wraparound?


>       {
>               int                     off;
> +             bool            first = builder->running.xcnt == 0;
>  
>               /*
>                * We only care about toplevel xids as those are the ones we
> @@ -1338,20 +1349,13 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr 
> lsn, xl_running_xacts *runn
>               builder->running.xmin = builder->running.xip[0];
>               builder->running.xmax = builder->running.xip[running->xcnt - 1];
>  
> +
>               /* makes comparisons cheaper later */
>               TransactionIdRetreat(builder->running.xmin);
>               TransactionIdAdvance(builder->running.xmax);
>  
>               builder->state = SNAPBUILD_FULL_SNAPSHOT;
>  
> -             ereport(LOG,
> -                     (errmsg("logical decoding found initial starting point 
> at %X/%X",
> -                                     (uint32) (lsn >> 32), (uint32) lsn),
> -                      errdetail_plural("%u transaction needs to finish.",
> -                                                       "%u transactions need 
> to finish.",
> -                                                       builder->running.xcnt,
> -                                                       (uint32) 
> builder->running.xcnt)));
> -
> +             /*
> +              * If this is the first time we've seen xl_running_xacts, we 
> are done.
> +              */
> +             if (first)
> +             {
> +                     ereport(LOG,
> +                             (errmsg("logical decoding found initial 
> starting point at %X/%X",
> +                                             (uint32) (lsn >> 32), (uint32) 
> lsn),
> +                              errdetail_plural("%u transaction needs to 
> finish.",
> +                                                               "%u 
> transactions need to finish.",
> +                                                               
> builder->running.xcnt,
> +                                                               (uint32) 
> builder->running.xcnt)));
> +             }
> +             else
> +             {
> +                     /*
> +                      * Because of the race condition in 
> LogStandbySnapshot() the
> +                      * transactions recorded in xl_running_xacts as running 
> might have
> +                      * already committed by the time the xl_running_xacts 
> was written
> +                      * to WAL. Use the information about decoded 
> transactions that we
> +                      * gathered so far to update our idea about what's 
> still running.
> +                      *
> +                      * We can use SnapBuildEndTxn directly as it only does 
> the
> +                      * transaction running check and handling without any 
> additional
> +                      * side effects.
> +                      */
> +                     for (off = 0; off < builder->committed.xcnt; off++)
> +                             SnapBuildEndTxn(builder, lsn, 
> builder->committed.xip[off]);
> +                     if (builder->state == SNAPBUILD_CONSISTENT)
> +                             return false;
> +
> +                     ereport(LOG,
> +                             (errmsg("logical decoding moved initial 
> starting point to %X/%X",
> +                                             (uint32) (lsn >> 32), (uint32) 
> lsn),
> +                              errdetail_plural("%u transaction needs to 
> finish.",
> +                                                               "%u 
> transactions need to finish.",
> +                                                               
> builder->running.xcnt,
> +                                                               (uint32) 
> builder->running.xcnt)));
> +             }

Hm, this is not pretty.




> From 4217da872e9aa48750c020542d8bc22c863a3d75 Mon Sep 17 00:00:00 2001
> From: Petr Jelinek <pjmo...@pjmodos.net>
> Date: Tue, 21 Feb 2017 19:58:18 +0100
> Subject: [PATCH 5/5] Skip unnecessary snapshot builds
> 
> When doing initial snapshot build during logical decoding
> initialization, don't build snapshots for transactions where we know the
> transaction didn't do any catalog changes. Otherwise we might end up
> with thousands of useless snapshots on busy server which can be quite
> expensive.
> ---
>  src/backend/replication/logical/snapbuild.c | 82 
> +++++++++++++++++++----------
>  1 file changed, 53 insertions(+), 29 deletions(-)
> 
> diff --git a/src/backend/replication/logical/snapbuild.c 
> b/src/backend/replication/logical/snapbuild.c
> index 1a1c9ba..c800aa5 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -954,6 +954,7 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, 
> TransactionId xid,
>       bool            forced_timetravel = false;
>       bool            sub_needs_timetravel = false;
>       bool            top_needs_timetravel = false;
> +     bool            skip_forced_snapshot = false;
>  
>       TransactionId xmax = xid;
>  
> @@ -975,10 +976,19 @@ SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, 
> TransactionId xid,
>               /*
>                * We could avoid treating !SnapBuildTxnIsRunning transactions 
> as
>                * timetravel ones, but we want to be able to export a snapshot 
> when
> -              * we reached consistency.
> +              * we reached consistency so we need to keep track of them.
>                */
>               forced_timetravel = true;
>               elog(DEBUG1, "forced to assume catalog changes for xid %u 
> because it was running too early", xid);
> +
> +             /*
> +              * It is however desirable to skip building new snapshot for
> +              * !SnapBuildTxnIsRunning transactions as otherwise we might 
> end up
> +              * building thousands of unused snapshots on busy servers which 
> can
> +              * be very expensive.
> +              */
> +             if (!SnapBuildTxnIsRunning(builder, xid))
> +                     skip_forced_snapshot = true;
>       }

That's pretty crudely bolted on the existing logic, isn't there a
simpler way?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to