On 03/05/2014 10:51 AM, Kyotaro HORIGUCHI wrote:
Hello,

After all, I have confirmed that this fixes the problem on crash
recovery of hot-standby botfor 9.3 and HEAD and no problem was
found except unreadability :(

Ok, committed. Thanks!

Any concrete suggestions about the readability? Is there some particular spot that needs clarifying?

By the way, I moderately want to fix an assertion message to a
ordinary one. Details are below.

====
The server stops with following message during restarting after
crash requesting archive recovery when the WAL has been produced
with the wal_level below WAL_LEVEL_HOT_STANDBY.

| TRAP: FailedAssertion("!(((oldestActiveXID) != ((TransactionId) 0)))", File: 
"xlog.c", Line: 6799)
| LOG:  startup process (PID 7270) was terminated by signal 6: Aborted

Surely this is the consequence of illegal operation but I think
it is also not a issue of assertion - which fires on something
wrong in design or quite rare cases(this case ?).

Ah, I see. Yes, that's definitely a bug. If you don't hit the assertion, because the oldestActiveXID is set in the checkpoint record even though wal_level is 'archive', or if you simply have assertions disabled, the system will start up in hot standby mode even though it's not safe.

So it might be
better to show message as below on the case.

| FATAL:  Checkpoint doesn't have valid oldest active transaction id
| HINT:  Reading WAL might have been written under insufficient wal_level.

This could do in this way,
======
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index e3d5e10..bb6922a 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6789,7 +6789,13 @@ StartupXLOG(void)
                        if (wasShutdown)
                                oldestActiveXID = 
PrescanPreparedTransactions(&xids, &nxids);
                        else
+                       {
                                oldestActiveXID = checkPoint.oldestActiveXid;
+                               if (!TransactionIdIsValid(oldestActiveXID))
+                                       ereport(FATAL,
+                                                       (errmsg("Checkpoint doesn't 
have valid oldest active transaction id"),
+                                                        errhint("Reading WAL might 
have been written under insufficient wal_level.")));
+                       }
                        Assert(TransactionIdIsValid(oldestActiveXID));

                        /* Tell procarray about the range of xids it has to 
deal with */
=====


What do you think about this? Feel free dumping this if you feel
negative on this.

Hmm. When I test that with 9.2, oldestActiveXID is not 0, even though wal_level is 'archive'. So the above patch doesn't fix the whole problem.

The real bug here is that CheckRequiredParameterValues() tests for InArchiveRecovery, when it should be testing for ArchiveRecoveryRequested. Otherwise, the checks are not performed when going through the crash recovery followed by archive recovery. I should've changed that as part of the commit that added the crash recovery then archive recovery behavior.

Fixed, thanks for pointing it out!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to