David Kerr wrote:
The apps actually aren't as robust as the DB in this case, so I'll have time to
replay all of the logs that made it before "the big one" while those are being
configured to come up. And if it does take longer, that's not a huge issue;
I'll have a few hours to get 100% caught up.
It sounds like you've got the basics nailed down here and are on a well-trodden path, just not one that is documented publicly very well. Since you said that even DRBD was too much overhead for you, I think a dive into evaluating the commercial clustering approaches (or the free LinuxHA that RedHat's is based on, which I haven't been real impressed by) would be appropriate.

The hard part is generally getting a heartbeat between the two servers sharing the SAN that is sensitive enough to catch failures without being so paranoid that it fails over needlessly (say, when load spikes on the primary and it slows down). Make sure you test that part out very carefully with any vendor you evaluate.

As far as the PostgreSQL specifics go, you need a solid way to ensure you've disconnected the now defunct master from the SAN (the classic "shoot the other node in the head" problem). All you *should* have to do is start the database again on the backup after doing that. It will come up as if from a standard crash, run through WAL replay crash recovery, and the result should be no different than had you restarted after a crash on the original node. The thing you cannot let happen is the original master continuing to write to the shared SAN volume once that transition has happened.
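To make the two concerns above concrete, here is a minimal Python sketch, not taken from any particular clustering product. It shows a heartbeat check that tolerates a few consecutive missed beats before declaring the primary dead, and a failover routine that refuses to start the standby unless fencing was confirmed. The names `HeartbeatMonitor`, `fail_over`, `probe`, `fence_primary`, `start_standby`, and the default of three allowed misses are all hypothetical and only there for illustration.

```python
from typing import Callable


class HeartbeatMonitor:
    """Treat the primary as failed only after several consecutive missed beats.

    A single slow or dropped heartbeat (for example during a load spike)
    resets nothing and triggers nothing; only an unbroken run of misses does.
    """

    def __init__(self, probe: Callable[[], bool], misses_allowed: int = 3) -> None:
        self.probe = probe                    # hypothetical: returns True if the primary answered
        self.misses_allowed = misses_allowed  # assumed threshold; tune it under realistic load
        self.consecutive_misses = 0

    def check(self) -> bool:
        """Run one probe and return True when the primary should be considered failed."""
        if self.probe():
            self.consecutive_misses = 0
        else:
            self.consecutive_misses += 1
        return self.consecutive_misses >= self.misses_allowed


def fail_over(fence_primary: Callable[[], bool],
              start_standby: Callable[[], None]) -> bool:
    """Fence the old master first, and start the standby only if that succeeded.

    fence_primary is a hypothetical callback that must return True only when
    the old master is confirmed to be cut off from the shared SAN volume.
    start_standby is a hypothetical callback that starts the database on the
    backup node, which then runs ordinary crash recovery.
    """
    if not fence_primary():
        # Unconfirmed fencing: starting now risks two writers on one volume.
        return False
    start_standby()
    return True
```

In a real deployment `fence_primary` would be whatever your cluster software uses to power off the node or revoke its SAN access, and `start_standby` would simply start PostgreSQL on the backup (for instance via `pg_ctl start`). The ordering is the point of the sketch: the start never happens before the fence is confirmed.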

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)