Hello, I've also found that this does not fix the problem.

> >> So I'd say we should update minRecoveryPoint first, then
> >> truncate/delete. But we should still keep the XLogFlush() at the end of
> >> xact_redo_commit_internal(), for the case where files/directories are
> >> created. Patch attached.
> > Sounds reasonable.

It makes perfect sense.

> > Committed and backpatched that. Attached is a script I used to reproduce
> > this problem, going back to 8.4.
>
> Thanks!
>
> Unfortunately I could reproduce the problem even after that commit.
> Attached is the script I used to reproduce the problem.

Me too.

> The cause is that CheckRecoveryConsistency() is called before rm_redo(),
> as Horiguchi-san suggested upthread. Imagine the case where
> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
> record. When restarting the server with that minRecoveryPoint,
> the following would happen, and then PANIC occurs.
>
> 1. XLOG_SMGR_TRUNCATE record is read.
> 2. CheckRecoveryConsistency() is called, and database is marked as
>    consistent since we've reached minRecoveryPoint (i.e., the location
>    of XLOG_SMGR_TRUNCATE).
> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
>    found.
> 4. Since the database has already been marked as consistent, an invalid
>    page leads to PANIC.

Exactly.

In smgr_redo, EndRecPtr, which points to the record following
SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other hand,
during the second startup of the standby, CheckRecoveryConsistency
checks for consistency with XLByteLE(minRecoveryPoint, EndRecPtr),
which becomes true just BEFORE the SMGR_TRUNCATE record is applied. So
reachedConsistency is set to true just before the SMGR_TRUNCATE record
is applied. Bang!

In my previous message I said I had no objection to placing
CheckRecoveryConsistency both before and after rm_redo, but that was
wrong. Given a minRecoveryPoint value, it should be placed after
rm_redo, from the point of view of when the database should be
considered consistent.

Actually, simply moving CheckRecoveryConsistency after rm_redo resulted
in a successful startup. That sets aside whatever other reason there
may be for calling it before rm_redo, which is unknown to me.

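To make the ordering issue concrete, here is a minimal toy model of the
replay loop. It is only a sketch and not PostgreSQL source: WAL
positions are plain integers, and ToyRecord, CheckOrder, ReplayResult,
reached(), replay() and toy_wal are all hypothetical names invented for
illustration. It assumes, as in the scenario above, that replaying the
truncate record hits an invalid page and that an invalid page is fatal
only once reachedConsistency is set.

```c
#include <stdbool.h>

/* Toy WAL record (hypothetical, not a PostgreSQL struct). */
typedef struct {
    int  end_ptr;            /* toy EndRecPtr: position just past this record */
    bool hits_invalid_page;  /* assumption: replaying it touches an invalid page */
} ToyRecord;

typedef enum { CHECK_BEFORE_REDO, CHECK_AFTER_REDO } CheckOrder;
typedef enum { REPLAY_OK, REPLAY_PANIC } ReplayResult;

/* The second record stands in for XLOG_SMGR_TRUNCATE; it ends at 200. */
static const ToyRecord toy_wal[] = {
    {100, false},
    {200, true},
    {300, false},
};

/* Toy version of the consistency test: minRecoveryPoint <= EndRecPtr. */
static bool reached(int min_recovery_point, int end_rec_ptr)
{
    return min_recovery_point <= end_rec_ptr;
}

/* Replay all records, running the consistency check before or after redo. */
static ReplayResult replay(const ToyRecord *recs, int nrecs,
                           int min_recovery_point, CheckOrder order)
{
    bool reached_consistency = false;

    for (int i = 0; i < nrecs; i++)
    {
        /* EndRecPtr is already advanced once the record has been read. */
        int end_rec_ptr = recs[i].end_ptr;

        if (order == CHECK_BEFORE_REDO &&
            reached(min_recovery_point, end_rec_ptr))
            reached_consistency = true;

        /* Toy redo step: an invalid page is fatal only once consistent. */
        if (recs[i].hits_invalid_page && reached_consistency)
            return REPLAY_PANIC;

        if (order == CHECK_AFTER_REDO &&
            reached(min_recovery_point, end_rec_ptr))
            reached_consistency = true;
    }
    return REPLAY_OK;
}
```

With minRecoveryPoint set to 200 (the toy EndRecPtr of the truncate
record), replay(toy_wal, 3, 200, CHECK_BEFORE_REDO) returns
REPLAY_PANIC, while replay(toy_wal, 3, 200, CHECK_AFTER_REDO) returns
REPLAY_OK. In this model the check-after-redo order marks the database
consistent only once the truncate record has been applied.
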
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center