David Kerr wrote:
The apps actually aren't as robust as the DB in this case, so I'll have time to
replay all of the logs that made it before "the big one" while those are being
configured to come up. And if it does take longer, that's not a huge issue;
I'll have a few hours to get 100% caught up.
It sounds like you've got the basics nailed down here and are on a well-trodden
path, just one that isn't documented publicly very well. Since you said that
even DRBD was too much overhead for you, I think a dive into evaluating the
commercial clustering approaches (or the free Linux-HA that Red Hat's is based
on, which I haven't been really impressed by) would be appropriate.
be appropriate. The hard part is generally getting a heartbeat between
the two servers sharing the SAN that is both sensitive enough to catch
failures while not being so paranoid that it fails over needlessly (say,
when load spikes on the primary and it slows down). Make sure you test
that part out very carefully with any vendor you evaluate.
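To make that concrete, here's a rough sketch of the knobs involved, using the
Linux-HA heartbeat configuration as one example; the values and node names are
purely illustrative, not a recommendation tuned to your hardware:

    # /etc/ha.d/ha.cf -- illustrative values only
    keepalive 2       # send a heartbeat every 2 seconds
    warntime 10       # warn in the logs after 10s of silence
    deadtime 30       # only declare the peer dead after 30s, long enough
                      # to ride out a load spike on the primary
    initdead 120      # extra slack while a node is still booting
    bcast eth1        # dedicated heartbeat interface
    node db1 db2
    auto_failback off

The tension is exactly the one described above: the lower you push deadtime,
the faster failover happens, but the more likely a slow-but-alive primary gets
shot for no good reason.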
As far as the PostgreSQL specifics go, you need a solid way to ensure you've
disconnected the now-defunct master from the SAN (the classic "shoot the other
node in the head" problem). All you *should* have to do is start the database
again on the backup after doing that. It will come up as if from a standard
crash, run through WAL replay crash recovery, and the result should be no
different than if you had restarted after a crash on the original node. The
thing you cannot let happen is the original master continuing to write to the
shared SAN volume once that transition has happened.
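A minimal sketch of what that failover sequence might look like on the standby,
with hypothetical host names, device paths, and an IPMI-based fence standing in
for whatever fencing mechanism your clustering vendor actually provides:

    # 1. Fence the old master FIRST, so it can never touch the SAN volume again
    ipmitool -I lanplus -H old-master-bmc -U admin -P <password> chassis power off

    # 2. Only then take over the shared volume
    mount /dev/san_vg/pgdata /var/lib/pgsql/data

    # 3. Start PostgreSQL; it sees an unclean shutdown and runs WAL crash recovery
    pg_ctl -D /var/lib/pgsql/data start

The ordering is the whole point: steps 2 and 3 must never run until step 1 has
definitely succeeded.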
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com www.2ndQuadrant.com