On 03.05.2013 18:17, Fujii Masao wrote:
Hi,

I got the following assertion failure when I promoted the standby.

2013-05-04 00:12:31 JST sby1 LOG:  received promote request
2013-05-04 00:12:31 JST sby1 FATAL:  terminating walreceiver process
due to administrator command
2013-05-04 00:12:31 JST sby1 LOG:  redo done at 0/6FFE038
2013-05-04 00:12:31 JST sby1 LOG:  last completed transaction was at
log time 2013-05-04 00:12:25.878909+09
2013-05-04 00:12:31 JST sby1 LOG:  selected new timeline ID: 2
2013-05-04 00:12:31 JST sby1 LOG:  archive recovery complete
2013-05-04 00:12:31 JST sby1 LOG:  checkpoint starting:
TRAP: FailedAssertion("!(sentPtr<= sendTimeLineValidUpto)", File:
"walsender.c", Line: 1465)
2013-05-04 00:12:31 JST sby1 LOG:  autovacuum launcher started

The way to reproduce this is:

1. Create one master A, one standby B, and one cascade standby C.
2. Run pgbench -i -s 10
3. Promote the standby B before pgbench -i finishes

I was able to reproduce this. The assertion checks that if the system is promoted at WAL location X, we must not have already sent WAL at > X to the client. As the code stands, that assumption is wrong; the walsender will merrily stream WAL that hasn't been replayed yet, and the system can be promoted before replaying all the WAL that has been streamed to a cascading standby. The comment in GetStandbyFlushRecPtr(), which is the function that determined how far the WAL may be streamed to a cascading standby, says this:

        /*
         * We can safely send what's already been replayed. Also, if walreceiver
         * is streaming WAL from the same timeline, we can send anything that
         * it has streamed, but hasn't been replayed yet.
         */

There seems to be two bugs here:

1. This used to work in 9.2, because the startup process would always replay all the WAL present in pg_xlog before promoting (the WAL present in pg_xlog was streamed from master). But the refactorings in xlog.c in 9.3devel broke that, so that the startup process can promote earlier.

2. Even after fixing the logic in xlog.c, there is still a corner-case where the startup process can promote before all the WAL that has been received from walreceiver has been received. That happens if the WAL streaming is terminated at a page boundary, rather than at a record boundary. For example, we might have received WAL up to the page boundary at 0/5BFA000, but the last *complete* record that we have received ends at 0/5BF9BD8.

To fix the second issue, I think two things need to happen. First, we need to suppress the check in walsender. Second, we need to teach the WAL replay to back off when that happens. At the moment, the replay in the cascading standby gets stuck, trying to fetch the next page containing rest of the partial WAL record. Instead, it should throw away the partial record it has, and resync at the end of the last replayed record. I think 9.2 has the same bug, BTW. Without support for timeline switches over streaming replication, it was just more difficult to hit.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to