Simon Riggs wrote:
* I think we are now renaming the recovery.conf file too early. The
comment says "We have already restored all the WAL segments we need from
the archive, and we trust that they are not going to go away even if we
crash." We have, but the files overwrite each other as they arrive, so
if the last restartpoint is not in the last restored WAL file then it
will only exist in the archive. The recovery.conf is the only place
where we store the information on where the archive is and how to access
it, so by renaming it out of the way we will be unable to crash recover
until the first checkpoint is complete. So the way this was in the
original patch is the correct way to go, AFAICS.

I can see what you mean now. In fact we're not safe even when the last restartpoint is in the last restored WAL file, because we always restore segments from the archive to a temporary filename.

Yes, renaming recovery.conf at the first checkpoint avoids that problem. However, it means that we'll re-enter archive recovery if we crash before that checkpoint is finished. Won't that cause havoc if more files have appeared to the archive since the crash, and we restore those even though we already started up from an earlier point? How do the timelines work in that case?

We could avoid that by performing a good old startup checkpoint, but I quite like the fast failover time we get without it.

* my original intention was to deprecate log_restartpoints and would
still like to do so. log_checkpoints does just as well for that. Even
less code than before...

Ok, done.

* comment on BgWriterShmemInit() refers to CHECKPOINT_IS_STARTUP, but
the actual define is CHECKPOINT_STARTUP. Would prefer the "is" version
because it matches the IS_SHUTDOWN naming.

Fixed.

* In CreateCheckpoint() the if test on TruncateSubtrans() has been
removed, but the comment has not been changed (to explain why).

Thanks, comment updated. (we now call CreateCheckPoint() only after recovery is finished)

We should continue to measure performance of recovery in the light of
these changes. I still feel that fsyncing the control file on each
XLogFileRead() will give a noticeable performance penalty, mostly
because we know doing exactly the same thing in normal running caused a
performance penalty. But that is easily changed and cannot be done with
any certainty without wider feedback, so no reason to delay code commit.

I've changed the way minRecoveryPoint is updated now anyway, so it no longer happens every XLogFileRead().

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to