On 02/13/2014 02:42 PM, Heikki Linnakangas wrote:
> The behavior where we prefer a segment from archive with lower TLI over
> a file with higher TLI in pg_xlog actually changed in commit
> a068c391ab0. Arguably changing it wasn't a good idea, but the problem
> your test script demonstrates can be fixed by not archiving the partial
> segment, with no change to the preference of archive/pg_xlog. As
> discussed, archiving a partial segment seems like a bad idea anyway, so
> let's just stop doing that.

After some further thought, while not archiving the partial segment fixes your test script, it's not enough to fix all variants of the problem. Even if archive recovery doesn't archive the last, partial, segment, if the original master server is still running, it's entirely possible that it fills the segment and archives it. In that case, archive recovery will again prefer the archived segment with lower TLI over the segment with newer TLI in pg_xlog.
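
To make the ordering problem concrete, here is a rough Python sketch of the two lookup orders for a single segment number. It is only an illustration under simplifying assumptions, not the actual recovery code: the function names are made up, and each set simply lists the TLIs for which a copy of that segment exists in the given location.

```python
def pick_source_first(expected_tlis, archive, pg_xlog):
    """Look through the whole archive (newest TLI first) before pg_xlog."""
    for source_name, available in (("archive", archive), ("pg_xlog", pg_xlog)):
        for tli in sorted(expected_tlis, reverse=True):
            if tli in available:
                return source_name, tli
    return None


def pick_timeline_first(expected_tlis, archive, pg_xlog):
    """Take the newest TLI found anywhere; the archive only breaks ties."""
    for tli in sorted(expected_tlis, reverse=True):
        for source_name, available in (("archive", archive), ("pg_xlog", pg_xlog)):
            if tli in available:
                return source_name, tli
    return None


# The old master filled the segment and archived it on TLI 1,
# while the copy on the new TLI 2 exists only in pg_xlog.
archive = {1}
pg_xlog = {2}
expected = [1, 2]

source_first = pick_source_first(expected, archive, pg_xlog)
timeline_first = pick_timeline_first(expected, archive, pg_xlog)
```

With these inputs, the source-first order picks the archived TLI 1 copy, while the timeline-first order picks the TLI 2 copy from pg_xlog.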

So I agree we should commit the patch you posted (or something to that effect). The change to not archive the last segment still seems like a good idea, but perhaps we should only do that in master.

Even after that patch, you can have a problem in more complicated scenarios involving both an archive and streaming replication. For example, imagine a timeline history like this:

TLI

1 ----+--------------------------->
      |
2     +--------------------------->


Now imagine that timeline 1 has been fully archived, and the archive contains WAL segments from well past the point where the timeline switch occurred. But none of the WAL segments for timeline 2 have been archived; they are only present on a master server. You want to perform recovery to timeline 2, using the archived WAL segments for timeline 1, and streaming replication to catch up to the tip of timeline 2.

Whether we prefer files from pg_xlog or archive will make no difference in this case, as there are no files in pg_xlog. But it will merrily apply all the WAL for timeline 1 from the archive that it can find, past the timeline switch point. After that, when it tries to connect to the server with streaming replication, it will fail.

There's not much we can do about that in 9.2 and below, but in 9.3 the timeline history file contains the exact timeline switch points, so we could be more careful and not apply any extra WAL on the old timeline past the switch point. We could also be more exact in which files we try to restore from the archive, instead of just polling every future TLI in the history.
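
As a rough illustration of that idea, here is a Python sketch of deciding whether WAL at a given position should be applied from a given timeline when the exact switch points are known. Everything in it is an assumption made for the example: the `HistoryEntry` type, the function names, and the plain integers standing in for WAL positions are hypothetical and are not PostgreSQL's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HistoryEntry:
    tli: int
    begin: int           # first WAL position on this timeline (inclusive)
    end: Optional[int]   # switch point to the next timeline (exclusive); None = still open


def tli_for_position(history: List[HistoryEntry], pos: int) -> int:
    """Return the timeline that owns WAL position `pos` in this history."""
    for entry in history:
        if entry.begin <= pos and (entry.end is None or pos < entry.end):
            return entry.tli
    raise ValueError(f"position {pos} is not covered by the history")


def should_apply(history: List[HistoryEntry], wal_tli: int, pos: int) -> bool:
    """Apply WAL only if it comes from the timeline that owns `pos`."""
    return tli_for_position(history, pos) == wal_tli


# Timeline 2 branched off timeline 1 at position 1000 (made-up numbers).
history = [HistoryEntry(tli=1, begin=0, end=1000),
           HistoryEntry(tli=2, begin=1000, end=None)]

before_switch = should_apply(history, wal_tli=1, pos=500)    # TLI 1 WAL before the switch
past_switch = should_apply(history, wal_tli=1, pos=1500)     # TLI 1 WAL past the switch
new_timeline = should_apply(history, wal_tli=2, pos=1500)    # TLI 2 WAL past the switch
```

The same lookup could also tell recovery which single TLI to request from the archive for a given position, rather than trying every TLI in the history.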

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)