On 02.11.2010 07:15, Fujii Masao wrote:
> On Mon, Nov 1, 2010 at 8:32 PM, Heikki Linnakangas
> <heikki.linnakan...@enterprisedb.com> wrote:
>> Yeah, that's one approach. Another is to validate the TLI in the xlog page
>> header, it should always match the current timeline we're on. That would
>> feel more robust to me.
>
> Yeah, that seems better.

I finally got around to looking at this. I wrote a patch to validate that the TLI in the xlog page header matches ThisTimeLineID during recovery, and quickly noticed in testing that it doesn't catch all the cases I'd like to catch :-(.

The problem scenario is this:


TLI 1 -----------+C-------+------->Standby
                 .
                 .
TLI 2            +C-------+------->


The two horizontal lines represent two timelines. TLI 2 forks off from TLI 1, because of a failover to a not-completely up-to-date standby server, for example. The plus-signs represent WAL segment boundaries and C's represent checkpoint records.

Another standby server has replayed all the WAL on TLI 1. Its latest restartpoint is C. The checkpoint records on the two timelines are at the same location, at the beginning of the WAL files, which is not that unlikely if you have archive_timeout set, for example.

Now, if you stop and restart the standby, it will try to recover to the latest timeline, which is TLI 2. But before the restart, it had already replayed the WAL from TLI 1, so it's wrong to replay the WAL from the parallel universe of TLI 2. At the moment, it will go ahead and do it, and you end up with an inconsistent database.

I planned to fix that by checking the TLI in the xlog page header, but that alone isn't enough in the above scenario. The TLIs in the page headers on timeline 2 are what's expected: the first page of the segment has TLI==1, because it was just forked off from timeline 1, and the subsequent pages have TLI==2, as they should after the checkpoint record.
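
To illustrate why the page-header check passes in this scenario, here is a toy model in Python. It is not PostgreSQL code: the function name, the list-of-TLIs representation of a segment, and the "allow one switch to the target timeline" rule are all assumptions made for illustration. The point is that how far the standby had already replayed on TLI 1 is not an input to the check at all.

```python
from typing import Iterable


def page_tlis_look_valid(page_tlis: Iterable[int],
                         start_tli: int,
                         target_tli: int) -> bool:
    """Hypothetical page-header check: every page must carry the timeline
    currently in effect, with a switch from start_tli to target_tli allowed."""
    current = start_tli
    for tli in page_tlis:
        if tli == current:
            continue
        if tli == target_tli:
            # The fork point: from here on, pages belong to the target timeline.
            current = target_tli
            continue
        return False
    return True


# Segment on timeline 2 from the diagram: first page still has TLI 1,
# the pages after the checkpoint record have TLI 2.
segment_on_tli2 = [1, 2, 2]
accepted = page_tlis_look_valid(segment_on_tli2, start_tli=1, target_tli=2)
```

Here `accepted` is true whether or not the standby had already replayed past the fork on TLI 1, so the page headers alone cannot reject the bad case.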

So we have to remember which timeline we were on before the restart. We already remember how far we had replayed, that's the minRecoveryPoint we store in the control file, but we have to remember the timeline along with it.

On reflection, your idea of checking the history file before replaying anything seems much easier. We'll still need to add the timeline alongside minRecoveryPoint to do the checking, but it's a lot easier to do against the history file. And we can validate the TLIs on page headers against the information from the history file as we read in the WAL.
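
A minimal sketch of what the history-file check could look like, in Python rather than C to keep it short. Everything here is an assumption for illustration: the history is modelled as a list of (ancestor TLI, LSN where that timeline was left) pairs ordered oldest first, LSNs are plain integers, minRecoveryPoint is treated as the end of the replayed WAL, and the names `timeline_of_lsn`, `safe_to_recover` and `min_recovery_tli` are hypothetical.

```python
from typing import List, Tuple

# (timeline ID, LSN at which that timeline was left for its successor),
# ordered from the oldest ancestor to the newest. Assumed representation.
History = List[Tuple[int, int]]


def timeline_of_lsn(history: History, target_tli: int, lsn: int) -> int:
    """Return the timeline that owns `lsn` on the path to `target_tli`."""
    for tli, switch_lsn in history:
        if lsn < switch_lsn:
            return tli
    # Past every fork point: the WAL at this LSN belongs to the target itself.
    return target_tli


def safe_to_recover(history: History, target_tli: int,
                    min_recovery_point: int, min_recovery_tli: int) -> bool:
    """True if the WAL already replayed (up to min_recovery_point on
    min_recovery_tli) is part of the history of target_tli."""
    if min_recovery_point == 0:
        return True  # nothing replayed yet, any timeline is acceptable
    # Check the last replayed byte, so that stopping exactly at the fork
    # point still counts as being on the parent timeline.
    last_replayed = min_recovery_point - 1
    return timeline_of_lsn(history, target_tli, last_replayed) == min_recovery_tli


# Timeline 2 forked off timeline 1 at a segment boundary (made-up LSN).
history_of_tli2: History = [(1, 0x2000000)]

# Standby replayed on TLI 1 beyond the fork: recovering to TLI 2 must be refused.
beyond_fork = safe_to_recover(history_of_tli2, 2, 0x3000000, 1)
# Standby stopped at or before the fork: following TLI 2 is fine.
at_fork = safe_to_recover(history_of_tli2, 2, 0x2000000, 1)
```

With these made-up numbers, `beyond_fork` is false and `at_fork` is true. The same lookup could also supply the expected TLI for each page header while the WAL is read.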

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
