On Tue, Jun 02, 2009 at 02:52:26PM -0400, Bruce Momjian wrote:
> Yaroslav Tykhiy wrote:
> > Hi All,
> >
> > Let's consider the following case: WAL segments from a master have
> > been shipped to N warm standby servers, and now the master fails.
> > Using this or that mechanism, one of the warm standbys takes over and
> > becomes the new master. Now the question is what to do with the other
> > N-1 warm standbys.  By the failure, all N warm standbys were the same
> > exact copies of the master.  So at least in theory, the N-1 warm
> > standbys left can be fed with WAL segments from the new master.  Do
> > you think it will work in practice?  Are there any pitfalls?
>
> I think it should work.

Bruce, thank you a lot for the encouragement! I had a chance to go a step further and fail over to a warm standby server without losing a single transaction. Now I'm happy to share my experience with the community.

The initial setup was as follows: Server A was the master, servers B and C were warm standbys. The task was to fail over from A to B in a controlled manner while keeping C running as a warm standby.

Both B and C were initially running with archive_command set as follows:

archive_command='/some/path/archive.sh "%p" "%f"'

where archive.sh contained just "exit 1". That way a real archive script could be atomically mv'ed into place later without losing any WAL segments. (Note that the archiver process queues segments and keeps retrying for as long as the archive command exits with a non-zero status.)
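As a sketch, the dummy script and the later atomic swap might look like this. The directory names are made-up stand-ins (a real setup would point SPOOL at the standby's actual WAL shipping spool), and the real script's contents are only illustrative:

```shell
#!/bin/sh
# Sketch of the "always fail" archive script and the atomic swap-in of
# a real one.  ARCHDIR and SPOOL are hypothetical stand-ins.
ARCHDIR=${ARCHDIR:-$(mktemp -d)}
SPOOL=${SPOOL:-$(mktemp -d)}

# 1. Dummy script: any non-zero exit makes the archiver queue the
#    segment and retry later, so no WAL is lost while it is in place.
cat > "$ARCHDIR/archive.sh" <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x "$ARCHDIR/archive.sh"

# 2. Real shipping script, staged under a temporary name first...
cat > "$ARCHDIR/archive.sh.new" <<EOF
#!/bin/sh
# \$1 = full path of the WAL file (%p), \$2 = its file name (%f)
cp "\$1" "$SPOOL/\$2"
EOF
chmod +x "$ARCHDIR/archive.sh.new"

# 3. ...then mv'ed into place: rename is atomic within one filesystem,
#    so a concurrent archiver retry sees either the old script or the
#    new one, never a half-written file.
mv "$ARCHDIR/archive.sh.new" "$ARCHDIR/archive.sh"
```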

After making sure B and C were keeping up with A, the latter was shut down. Then the last, incomplete WAL segment NNN was manually copied from A (pg_controldata was useful to find its name) to B's WAL shipping spool for the restore script to pick it up.
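For reference, here is a sketch of how the segment file name can be put together from what pg_controldata reports (the timeline ID, log file ID, and segment number; exact field names vary by version). The helper function, hosts, and paths below are all hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: build the 24-character WAL segment file name
# from the timeline ID, log file ID, and segment number reported by
# pg_controldata on the cleanly stopped master.
wal_segment_name() {
    # $1 = timeline ID, $2 = log file ID, $3 = segment number
    printf '%08X%08X%08X\n' "$1" "$2" "$3"
}

SEG=$(wal_segment_name 1 5 10)   # -> 00000001000000050000000A

# With the master shut down, copy that last, incomplete segment into
# B's WAL shipping spool (hosts and paths are made up for illustration):
# scp A:/var/lib/pgsql/data/pg_xlog/"$SEG" B:/var/lib/pgsql/wal_spool/"$SEG"
```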

B processed segment NNN and, upon reaching its logical end, exited recovery mode. At this moment all the clients were switched over to B. Now the master, B continued writing its transaction log to segment NNN, filling it up and moving on to the next segment NNN+1.

(On the one hand, it was quite unexpected that B didn't move on to a new timeline upon exiting recovery mode. On the other hand, had it done so, the whole trick would have been impossible. Please correct me if I'm wrong. Just in case, the PostgreSQL version was 8.0.6. Yes, it's ancient and sorely needs an upgrade.)

Now segment NNN was full and contained both the last transactions from A and the first transactions from B. It was time to ship NNN from B to C in order to bring C in line with B -- without disrupting C's recovery mode. A real archive script was substituted for the dummy script on B. At the next retry the script shipped segment NNN to C and so the WAL shipping train got going B->C.

A possible pitfall to watch out for is this: if the WAL shipping spool is shared between B and C, e.g., NFS based, just copying segment NNN to it will make both B and C exit recovery mode. To avoid that, at least in theory, segment NNN can be copied directly into B's pg_xlog, and B's restore command can then be signalled to return a non-zero status. According to the manual, the recovery process looks in pg_xlog as a last resort when the restore command returns an error status. However, I didn't try that, as I had separate, local WAL spools on B and C.
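By way of illustration only (I haven't tested this), the "signalling" could be done with a flag file that the restore script checks on every call. Everything here is hypothetical: the flag file, the spool path, and the function name.

```shell
#!/bin/sh
# Hypothetical restore-script sketch: when the operator creates a
# "failover" flag file, return non-zero so the recovery process stops
# reading the shared spool and falls back to pg_xlog (where segment
# NNN was placed by hand).  Paths are made up.
SPOOL=${SPOOL:-/var/lib/pgsql/wal_spool}
FLAG=${FLAG:-/var/lib/pgsql/failover.flag}

restore_wal() {
    # $1 = requested WAL file name (%f), $2 = destination path (%p)
    [ -e "$FLAG" ] && return 1          # signalled: defer to pg_xlog
    [ -f "$SPOOL/$1" ] || return 1      # not shipped yet: keep retrying
    cp "$SPOOL/$1" "$2"
}
```

C, meanwhile, keeps running the plain copy-from-spool version and never sees the flag, so it stays in recovery mode.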

Hoping all this stuff helps somebody...

Yar

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general