On Tue, Jun 02, 2009 at 02:52:26PM -0400, Bruce Momjian wrote:
> Yaroslav Tykhiy wrote:
> > Hi All,
> >
> > Let's consider the following case: WAL segments from a master have
> > been shipped to N warm standby servers, and now the master fails.
> > Using this or that mechanism, one of the warm standbys takes over and
> > becomes the new master. Now the question is what to do with the other
> > N-1 warm standbys.  By the failure, all N warm standbys were the same
> > exact copies of the master.  So at least in theory, the N-1 warm
> > standbys left can be fed with WAL segments from the new master.  Do
> > you think it will work in practice?  Are there any pitfalls?
>
> I think it should work.

Bruce, thank you a lot for the encouragement! I had a chance to go a step further and fail over to a warm standby server without losing a single transaction. Now I'm happy to share my experience with the community.

The initial setup was as follows: Server A was the master, servers B and C were warm standbys. The task was to fail over from A to B in a controlled manner while keeping C running as a warm standby.

Both B and C were initially running with archive_command set as follows:

archive_command='/some/path/archive.sh "%p" "%f"'

where archive.sh contained just "exit 1". That way a real archive script could be atomically mv'ed into place later without losing any WAL segments. (Note that the archiver process queues segments and keeps retrying for as long as the archive command exits with a non-zero status.)
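As a sketch, the dummy script and the later atomic swap might look like this. The directory names are made-up stand-ins (a real setup would point SPOOL at the standby's actual WAL shipping spool), and the real script's contents are only illustrative:

```shell
#!/bin/sh
# Sketch of the "always fail" archive script and the atomic swap-in of
# a real one.  ARCHDIR and SPOOL are hypothetical stand-ins.
ARCHDIR=${ARCHDIR:-$(mktemp -d)}
SPOOL=${SPOOL:-$(mktemp -d)}

# 1. Dummy script: any non-zero exit makes the archiver queue the
#    segment and retry later, so no WAL is lost while it is in place.
cat > "$ARCHDIR/archive.sh" <<'EOF'
#!/bin/sh
exit 1
EOF
chmod +x "$ARCHDIR/archive.sh"

# 2. Real shipping script, staged under a temporary name first...
cat > "$ARCHDIR/archive.sh.new" <<EOF
#!/bin/sh
# \$1 = full path of the WAL file (%p), \$2 = its file name (%f)
cp "\$1" "$SPOOL/\$2"
EOF
chmod +x "$ARCHDIR/archive.sh.new"

# 3. ...then mv'ed into place: rename is atomic within one filesystem,
#    so a concurrent archiver retry sees either the old script or the
#    new one, never a half-written file.
mv "$ARCHDIR/archive.sh.new" "$ARCHDIR/archive.sh"
```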

After making sure B and C were keeping up with A, the latter was shut down. Then the last, incomplete WAL segment NNN was manually copied from A (pg_controldata was useful to find its name) to B's WAL shipping spool for the restore script to pick it up.
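For reference, here is a sketch of how the segment file name can be put together from what pg_controldata reports (the timeline ID, log file ID, and segment number; exact field names vary by version). The helper function, hosts, and paths below are all hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: build the 24-character WAL segment file name
# from the timeline ID, log file ID, and segment number reported by
# pg_controldata on the cleanly stopped master.
wal_segment_name() {
    # $1 = timeline ID, $2 = log file ID, $3 = segment number
    printf '%08X%08X%08X\n' "$1" "$2" "$3"
}

SEG=$(wal_segment_name 1 5 10)   # -> 00000001000000050000000A

# With the master shut down, copy that last, incomplete segment into
# B's WAL shipping spool (hosts and paths are made up for illustration):
# scp A:/var/lib/pgsql/data/pg_xlog/"$SEG" B:/var/lib/pgsql/wal_spool/"$SEG"
```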

B processed segment NNN and, upon reaching its logical end, exited recovery mode. At this moment all the clients were switched over to B. Now the master, B continued writing its transaction log to segment NNN, filling it up and moving on to the next segment NNN+1.

(On the one hand, it was quite unexpected that B didn't move on to a new timeline upon exiting recovery mode. On the other hand, had it done so, the whole trick would have been impossible. Please correct me if I'm wrong. Just in case, the PostgreSQL version was 8.0.6. Yes, it's ancient and sorely needs an upgrade.)

Now segment NNN was full and contained both the last transactions from A and the first transactions from B. It was time to ship NNN from B to C in order to bring C in line with B -- without disrupting C's recovery mode. A real archive script was substituted for the dummy script on B. At the next retry the script shipped segment NNN to C and so the WAL shipping train got going B->C.

A possible pitfall to watch out for is this: if the WAL shipping spool is shared between B and C, e.g., NFS based, just copying segment NNN to it will make both B and C exit recovery mode. To avoid that, at least in theory, segment NNN can be copied directly into B's pg_xlog, and B's restore command can then be signalled to return a non-zero status. According to the manual, the recovery process looks in pg_xlog as a last resort when the restore command returns an error status. However, I didn't try that, as I had separate, local WAL spools on B and C.
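By way of illustration only (I haven't tested this), the "signalling" could be done with a flag file that the restore script checks on every call. Everything here is hypothetical: the flag file, the spool path, and the function name.

```shell
#!/bin/sh
# Hypothetical restore-script sketch: when the operator creates a
# "failover" flag file, return non-zero so the recovery process stops
# reading the shared spool and falls back to pg_xlog (where segment
# NNN was placed by hand).  Paths are made up.
SPOOL=${SPOOL:-/var/lib/pgsql/wal_spool}
FLAG=${FLAG:-/var/lib/pgsql/failover.flag}

restore_wal() {
    # $1 = requested WAL file name (%f), $2 = destination path (%p)
    [ -e "$FLAG" ] && return 1          # signalled: defer to pg_xlog
    [ -f "$SPOOL/$1" ] || return 1      # not shipped yet: keep retrying
    cp "$SPOOL/$1" "$2"
}
```

C, meanwhile, keeps running the plain copy-from-spool version and never sees the flag, so it stays in recovery mode.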

Hoping all this stuff helps somebody...

Yar

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general