On 2012-11-21 14:57:08 +0900, Michael Paquier wrote:
> On Tue, Nov 20, 2012 at 8:22 PM, Andres Freund <and...@2ndquadrant.com> wrote:
>
> > Those aren't unexpected. Perhaps I should not make it a warning then...
> >
> > A short explanation:
> >
> > We can only decode tuples we see in the WAL when we already have a
> > timetravel catalog snapshot before that transaction started. To build
> > such a snapshot we need to collect information about committed
> > transactions which changed the catalog. Unfortunately we can't diagnose
> > whether a txn changed the catalog without a snapshot so we just assume
> > all committed ones do - it just costs a bit of memory. That's the
> > background of the "forced to assume catalog changes for ..." message.
> >
> > The reason for the ABORTs is related but different. We start out in the
> > "SNAPBUILD_START" state when we try to build a snapshot. When we find
> > initial information about running transactions (i.e. xl_running_xacts)
> > we switch to the "SNAPBUILD_FULL_SNAPSHOT" state which means we can
> > decode all changes in transactions that start *after* the current
> > lsn. Earlier transactions might have tuples on a catalog state we can't
> > query.
> > Only when all transactions we observed as running before the
> > FULL_SNAPSHOT state have finished do we switch to SNAPBUILD_CONSISTENT.
> > As we want a consistent/reproducible set of transactions to produce
> > output via the logstream we only pass transactions to the output plugin
> > if they commit *after* CONSISTENT (they can start earlier though!). This
> > allows us to produce a pg_dump compatible snapshot in the moment we get
> > consistent that contains exactly the changes we won't stream out.
> >
> > Makes sense?
> >
> > > 3) Assertion failure while running pgbench, I was just curious to see how
> > > it reacted when logical replication was put under a little bit of load.
> > > TRAP: FailedAssertion("!(((xid) >= ((TransactionId) 3)) &&
> > > ((snapstate->xmin_running) >= ((TransactionId) 3)))", File: "snapbuild.c",
> > > Line: 877)
> > > => pgbench -i postgres; pgbench -T 500 -c 8 postgres
> >
> > Can you reproduce this one? I would be interested in log output. Because
> > I did run pgbench for quite some time and I haven't seen that one after
> > fixing some issues last week.
> >
> > > It implies that snapstate->nrrunning has lost touch with reality...
> >
> Yes, I can reproduce in 10-20 seconds in one of my linux boxes. I haven't
> outputted anything in the logs, but here is the backtrace of the core file
> produced.

Ah, I see. Could you try the following diff?

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index df24b33..797a126 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -471,6 +471,7 @@ SnapBuildDecodeCallback(ReorderBuffer *reorder, Snapstate *snapstate,
  */
 snapstate->transactions_after = buf->origptr;
+snapstate->nrrunning = running->xcnt;
 snapstate->xmin_running = InvalidTransactionId;
 snapstate->xmax_running = InvalidTransactionId;

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
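
P.S. The state transitions quoted above, and the reason the running-transaction
counter has to be set from the xl_running_xacts record, can be illustrated with
a small standalone sketch. This is not the code in snapbuild.c. Every Sketch*/
sketch_* name, the fixed-size xid array and the plain equality test on xids are
assumptions made for illustration only; the real code works on WAL records and
has to deal with xid wraparound.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

#define SKETCH_MAX_RUNNING 16	/* hypothetical limit, for the sketch only */

/* Hypothetical stand-ins for the three states named in the mail. */
typedef enum
{
	SKETCH_START,
	SKETCH_FULL_SNAPSHOT,
	SKETCH_CONSISTENT
} SketchPhase;

typedef struct
{
	SketchPhase phase;
	int			nrrunning;	/* initially-running xacts not yet finished */
	int			nslots;		/* number of entries used in running[] */
	TransactionId running[SKETCH_MAX_RUNNING];
} SketchSnapstate;

/* Handle an xl_running_xacts-like record: START -> FULL_SNAPSHOT. */
static void
sketch_running_xacts(SketchSnapstate *s, const TransactionId *xids, int xcnt)
{
	if (s->phase != SKETCH_START)
		return;
	assert(xcnt >= 0 && xcnt <= SKETCH_MAX_RUNNING);

	/*
	 * Counterpart of the one-line fix: without this assignment the counter
	 * keeps whatever value it had before and no longer matches the set of
	 * transactions we are waiting for.
	 */
	s->nrrunning = xcnt;
	s->nslots = xcnt;
	for (int i = 0; i < xcnt; i++)
		s->running[i] = xids[i];

	/* Nothing was running, so there is nothing to wait for. */
	s->phase = (xcnt == 0) ? SKETCH_CONSISTENT : SKETCH_FULL_SNAPSHOT;
}

/* Handle a commit or abort: FULL_SNAPSHOT -> CONSISTENT once all are done. */
static void
sketch_xact_finished(SketchSnapstate *s, TransactionId xid)
{
	if (s->phase != SKETCH_FULL_SNAPSHOT)
		return;

	for (int i = 0; i < s->nslots; i++)
	{
		if (s->running[i] == xid)
		{
			s->running[i] = InvalidTransactionId;
			s->nrrunning--;
			break;
		}
	}

	assert(s->nrrunning >= 0);
	if (s->nrrunning == 0)
		s->phase = SKETCH_CONSISTENT;
}

/* Only transactions that commit after CONSISTENT go to the output plugin. */
static int
sketch_should_stream_commit(const SketchSnapstate *s)
{
	return s->phase == SKETCH_CONSISTENT;
}
```

Feeding it two running xids, say 100 and 101, moves the sketch to
FULL_SNAPSHOT with nrrunning == 2. A finish for an unrelated xid leaves the
counter alone, and only after both 100 and 101 have finished does the phase
become CONSISTENT, at which point sketch_should_stream_commit() returns true.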