Re: [HACKERS] Hot standby, recovery infra

Heikki Linnakangas Wed, 04 Feb 2009 09:11:29 -0800

Simon Riggs wrote:

* I think we are now renaming the recovery.conf file too early. The
comment says "We have already restored all the WAL segments we need from
the archive, and we trust that they are not going to go away even if we
crash." We have, but the files overwrite each other as they arrive, so
if the last restartpoint is not in the last restored WAL file then it
will only exist in the archive. The recovery.conf is the only place
where we store the information on where the archive is and how to access
it, so by renaming it out of the way we will be unable to crash recover
until the first checkpoint is complete. So the way this was in the
original patch is the correct way to go, AFAICS.

I can see what you mean now. In fact we're not safe even when the lastrestartpoint is in the last restored WAL file, because we always restoresegments from the archive to a temporary filename.

Yes, renaming recovery.conf at the first checkpoint avoids that problem.However, it means that we'll re-enter archive recovery if we crashbefore that checkpoint is finished. Won't that cause havoc if more fileshave appeared to the archive since the crash, and we restore those eventhough we already started up from an earlier point? How do the timelineswork in that case?

We could avoid that by performing a good old startup checkpoint, but Iquite like the fast failover time we get without it.

* my original intention was to deprecate log_restartpoints and would
still like to do so. log_checkpoints does just as well for that. Even
less code than before...


Ok, done.

* comment on BgWriterShmemInit() refers to CHECKPOINT_IS_STARTUP, but
the actual define is CHECKPOINT_STARTUP. Would prefer the "is" version
because it matches the IS_SHUTDOWN naming.


Fixed.

* In CreateCheckpoint() the if test on TruncateSubtrans() has been
removed, but the comment has not been changed (to explain why).

Thanks, comment updated. (we now call CreateCheckPoint() only afterrecovery is finished)

We should continue to measure performance of recovery in the light of
these changes. I still feel that fsyncing the control file on each
XLogFileRead() will give a noticeable performance penalty, mostly
because we know doing exactly the same thing in normal running caused a
performance penalty. But that is easily changed and cannot be done with
any certainty without wider feedback, so no reason to delay code commit.

I've changed the way minRecoveryPoint is updated now anyway, so it nolonger happens every XLogFileRead().


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Hot standby, recovery infra

Reply via email to