On 2012-11-21 14:57:08 +0900, Michael Paquier wrote:
> On Tue, Nov 20, 2012 at 8:22 PM, Andres Freund <and...@2ndquadrant.com> wrote:
> 
> > Those aren't unexpected. Perhaps I should not make it a warning then...
> >
> > A short explanation:
> >
> > We can only decode tuples we see in the WAL when we already have a
> > timetravel catalog snapshot before that transaction started. To build
> > such a snapshot we need to collect information about committed
> > transactions which changed the catalog. Unfortunately we can't diagnose
> > whether a txn changed the catalog without a snapshot so we just assume
> > all committed ones do - it just costs a bit of memory. That's the
> > background of the
> > "forced to assume catalog changes for ..." message.
> >
> > The reason for the ABORTs is related but different. We start out in the
> > "SNAPBUILD_START" state when we try to build a snapshot. When we find
> > initial information about running transactions (i.e. xl_running_xacts)
> > we switch to the "SNAPBUILD_FULL_SNAPSHOT" state which means we can
> > decode all changes in transactions that start *after* the current
> > lsn. Earlier transactions might have tuples on a catalog state we can't
> > query.
> > Only when all transactions we observed as running before the
> > FULL_SNAPSHOT state have finished we switch to SNAPBUILD_CONSISTENT.
> > As we want a consistent/reproducible set of transactions to produce
> > output via the logstream we only pass transactions to the output plugin
> > if they commit *after* CONSISTENT (they can start earlier though!). This
> > allows us to produce a pg_dump-compatible snapshot at the moment we become
> > consistent that contains exactly the changes we won't stream out.
> >
> > Makes sense?
> >
> > > 3) Assertion failure while running pgbench, I was just curious to see how
> > > it reacted when logical replication was put under a little bit of load.
> > > TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)) &&
> > > ((snapstate->xmin_running) >= ((TransactionId) 3)))", File: "snapbuild.c",
> > > Line: 877)
> > > => pgbench -i postgres; pgbench -T 500 -c 8 postgres
> >
> > Can you reproduce this one? I would be interested in log output. Because
> > I did run pgbench for quite some time and I haven't seen that one after
> > fixing some issues last week.
> >
> 
> > It implies that snapstate->nrrunning has lost touch with reality...
> >
> Yes, I can reproduce it in 10-20 seconds on one of my Linux boxes. I haven't
> output anything to the logs, but here is the backtrace of the core file
> produced.

Ah, I see. Could you try the following diff?

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index df24b33..797a126 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -471,6 +471,7 @@ SnapBuildDecodeCallback(ReorderBuffer *reorder, Snapstate *snapstate,
                 */
                snapstate->transactions_after = buf->origptr;
 
+               snapstate->nrrunning = running->xcnt;
                snapstate->xmin_running = InvalidTransactionId;
                snapstate->xmax_running = InvalidTransactionId;
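
To make the state progression from the quoted explanation easier to follow,
here is a minimal, self-contained sketch of it. This is not the code in
snapbuild.c: every name in it (SketchBuilder, sketch_running_xacts,
sketch_xact_end, the SKETCH_* states) is hypothetical, and it assumes that an
empty running-xacts record makes the builder consistent right away. It only
illustrates why the running counter has to be set from the record, as the diff
above does, and why only transactions that commit after CONSISTENT are
streamed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; NOT the real snapbuild.c structures. */
typedef uint32_t Xid;

typedef enum {
    SKETCH_START,          /* nothing known yet */
    SKETCH_FULL_SNAPSHOT,  /* can decode transactions that start from now on */
    SKETCH_CONSISTENT      /* all initially running transactions have ended */
} SketchState;

#define SKETCH_MAX_RUNNING 64

typedef struct {
    SketchState state;
    size_t      nrrunning;                    /* initially running xacts still open */
    Xid         running[SKETCH_MAX_RUNNING];  /* their transaction ids */
} SketchBuilder;

void sketch_init(SketchBuilder *b)
{
    b->state = SKETCH_START;
    b->nrrunning = 0;
}

/* Process a "running transactions" record listing xcnt open transactions. */
void sketch_running_xacts(SketchBuilder *b, const Xid *xids, size_t xcnt)
{
    if (b->state != SKETCH_START)
        return;
    assert(xcnt <= SKETCH_MAX_RUNNING);
    for (size_t i = 0; i < xcnt; i++)
        b->running[i] = xids[i];
    /* Keep the counter in sync with the record (the point of the diff above). */
    b->nrrunning = xcnt;
    /* Assumption of this sketch: nothing running means consistent at once. */
    b->state = (xcnt == 0) ? SKETCH_CONSISTENT : SKETCH_FULL_SNAPSHOT;
}

/*
 * Process the end (commit or abort) of a transaction. Returns true if its
 * changes would be handed to the output plugin, i.e. it committed while the
 * builder was already consistent.
 */
bool sketch_xact_end(SketchBuilder *b, Xid xid, bool committed)
{
    /* Decide before updating the state: the transaction whose end makes us
     * consistent was itself running before the full snapshot. */
    bool stream = committed && b->state == SKETCH_CONSISTENT;

    if (b->state == SKETCH_FULL_SNAPSHOT) {
        for (size_t i = 0; i < b->nrrunning; i++) {
            if (b->running[i] == xid) {
                /* Remove by swapping in the last entry. */
                b->running[i] = b->running[b->nrrunning - 1];
                b->nrrunning--;
                break;
            }
        }
        if (b->nrrunning == 0)
            b->state = SKETCH_CONSISTENT;
    }
    return stream;
}
```

In this model, a counter that is not reset when the running-xacts record
arrives would drift away from the contents of the array, which is the kind of
mismatch the assertion failure suggests.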


Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

