On 26.04.2013 07:47, Amit Langote wrote:
> How would the code behave after applying this patch if a recycled segment gets
> renamed using the newest timeline (say 3) while we are still recovering from
> a lower timeline (say 2)? In that case, since XLogFileReadAnyTLI returns
> that recycled segment as the next segment to recover from, we get the error.
> And since XLogFileReadAnyTLI iterates over expectedTLIs (whose head seems to
> be recoveryTargetTLI at all times, is that right?), it will return that
> wrong (recycled) segment in the first iteration itself.

As long as the right segment is present in the archive, that's OK. Even if a recycled segment with a higher TLI is in pg_xlog, with the patch we'll still read the segment with the lower TLI from the archive. But there is a corner case where a recycled segment with a higher TLI masks a segment with a lower TLI in pg_xlog. For example, if you try to recover by copying all the required WAL files directly into pg_xlog, without using restore_command, recovery could pick up the recycled segment instead of the real one and fail.
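To make the masking concrete, here is a small, self-contained C model of the lookup. It is a sketch, not PostgreSQL code: PgXlogDir, dir_contains() and find_segment_any_tli() are hypothetical names, pg_xlog is modelled as a plain list of file names, the archive and restore_command are ignored, and it assumes, as the quoted message suggests, that the expected TLIs are tried newest first. File names follow the TLI/log/segment layout used by XLogFileName.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of pg_xlog: just the names of the files present. */
typedef struct
{
    const char *files[8];
    size_t      nfiles;
} PgXlogDir;

static bool
dir_contains(const PgXlogDir *dir, const char *name)
{
    for (size_t i = 0; i < dir->nfiles; i++)
        if (strcmp(dir->files[i], name) == 0)
            return true;
    return false;
}

/*
 * Simplified stand-in for XLogFileReadAnyTLI (assumption: TLIs are tried
 * newest first). Returns the TLI of the first file found for (log, seg),
 * or 0 if no candidate exists.
 */
static unsigned int
find_segment_any_tli(const PgXlogDir *dir, const unsigned int *expected_tlis,
                     size_t ntlis, unsigned int log, unsigned int seg)
{
    char        name[32];

    assert(ntlis > 0);
    for (size_t i = 0; i < ntlis; i++)
    {
        /* TLI, log id, segment: 8 uppercase hex digits each. */
        snprintf(name, sizeof(name), "%08X%08X%08X",
                 expected_tlis[i], log, seg);
        if (dir_contains(dir, name))
            return expected_tlis[i];    /* first hit wins */
    }
    return 0;
}
```

For example, with pg_xlog holding the hand-copied 000000020000000000000005 and a recycled 000000030000000000000005, find_segment_any_tli(&dir, (unsigned int[]){3, 2, 1}, 3, 0, 5) returns 3: the recycled file wins and replay of timeline 2 fails. Without the recycled file the same call returns 2.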

So yeah, I think you're right and we need to rethink the recycling. The first question is, do we have to recycle WAL segments during recovery at all? It's pointless when we're restoring from archive with restore_command; the recycled files will just get replaced with files from the archive. It does help when walreceiver is active, but I wonder how significant it is in practice.

I guess the safest, smallest change is to use a lower TLI when installing the recycled files. So, instead of using the current recovery target timeline, use the ID of the timeline we're currently recovering. That way the recycled segments will never have a higher TLI than other files that recovery will try to replay. See the attached patch.
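The only thing that changes is the TLI embedded in the recycled file's name. A minimal sketch in C, assuming the 9.2-era XLogFileName layout (8 hex digits each for TLI, log id and segment); wal_file_name() and recycle_tli() are hypothetical helpers, not functions in xlog.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Build a WAL segment file name: TLI, log id, segment, 8 hex digits each. */
static void
wal_file_name(char *buf, size_t len, unsigned int tli,
              unsigned int log, unsigned int seg)
{
    assert(tli != 0);           /* timeline IDs start at 1 */
    snprintf(buf, len, "%08X%08X%08X", tli, log, seg);
}

/*
 * Hypothetical: the TLI a recycled segment is installed under.
 * Old rule: the recovery target timeline.
 * Patched rule: the timeline currently being replayed.
 */
static unsigned int
recycle_tli(unsigned int target_tli, unsigned int replay_tli, bool patched)
{
    return patched ? replay_tli : target_tli;
}
```

With a recovery target of TLI 3 while replaying TLI 2, recycling into log 0, segment 5 yields 000000030000000000000005 under the old rule and 000000020000000000000005 under the patched rule. The recycled name therefore never carries a TLI above the one being replayed, so a newest-first search cannot stop at it ahead of the file recovery actually needs.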

- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 30d877b..cfd8a34 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8459,10 +8459,15 @@ CreateRestartPoint(int flags)
 		PrevLogSeg(_logId, _logSeg);
 
 		/*
-		 * Update ThisTimeLineID to the recovery target timeline, so that
-		 * we install any recycled segments on the correct timeline.
+		 * Update ThisTimeLineID to the timeline we're currently recovering,
+		 * so that we install any recycled segments on the correct timeline.
+		 * (This might be higher than the TLI of the restartpoint we just
+		 * made, if a timeline switch was replayed while we were performing
+		 * the restartpoint.)
 		 */
-		ThisTimeLineID = GetRecoveryTargetTLI();
+		SpinLockAcquire(&xlogctl->info_lck);
+		ThisTimeLineID = xlogctl->lastCheckPoint.ThisTimeLineID;
+		SpinLockRelease(&xlogctl->info_lck);
 
 		RemoveOldXlogFiles(_logId, _logSeg, endptr);
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers