Greg Smith wrote:
It sounds like you've got the basics nailed down here and are on a well-trodden path, just not one that's documented publicly very well. Since you said that even DRBD was too much overhead for you, I think a dive into evaluating the commercial clustering approaches (or the free Linux-HA that Red Hat's is based on, which I haven't been all that impressed by) would be appropriate. The hard part is generally getting a heartbeat between the two servers sharing the SAN that is sensitive enough to catch failures without being so paranoid that it fails over needlessly (say, when load spikes on the primary and it slows down). Make sure you test that part very carefully with any vendor you evaluate.
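That sensitivity-vs-paranoia tradeoff lives in the heartbeat timing knobs. As one hedged illustration, a corosync-style totem stanza (the parameter names here are corosync's; RHCS stacks built on cman/openais expose similar timeouts under different names, and the values below are examples, not recommendations):

```
totem {
    version: 2
    # milliseconds without seeing the token before a node is declared failed;
    # too low and a load spike on the primary triggers a needless failover
    token: 10000
    # retransmit attempts before giving up on a peer
    token_retransmits_before_loss_const: 10
}
```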

Hence the 'multiple dedicated heartbeat networks' suggested earlier.

A typical cluster server has a quad Ethernet card: two ports (802.3ad link aggregation with failover) for the LAN and two dedicated to the heartbeat, plus a dual-port HBA for the SAN. The heartbeats can run over crossover cables; even 10BASE-T is plenty, as the traffic volume is quite low. It just needs low latency and no possibility of congestion.
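For the LAN pair, the usual Linux approach is a bonded interface. A sketch of RHEL-style configs (device names and addresses are placeholders; `mode=802.3ad` and `miimon` are the standard bonding-module options):

```
# /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative)
DEVICE=bond0
BONDING_OPTS="mode=802.3ad miimon=100"
IPADDR=192.168.1.10
NETMASK=255.255.255.0
ONBOOT=yes

# /etc/sysconfig/network-scripts/ifcfg-eth0 (repeat for eth1)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
```

The heartbeat ports stay unbonded and unswitched (crossover), so a LAN switch failure can never look like a peer failure.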

I set up RHCS (aka the CentOS Cluster Suite) in a test lab environment, and it seemed to work well enough. I was using FC storage via a QLogic SANbox 5600 switch, which is supported by RHCS as a fencing device. Note that ALL of the storage used by the cluster servers on the SAN should be under cluster management, as the 'standby' server won't see any of it once it's fenced (I implemented fencing via FC port disable). This can be an issue when you want to do rolling upgrades (update the standby server, force a failover, then update the previous master). I built each cluster node with its own direct-attached mirrored storage for the OS and software.
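For reference, FC-switch fencing in RHCS is wired up in /etc/cluster/cluster.conf. A trimmed sketch (node names, the switch address, credentials, and port numbers below are made up; `fence_sanbox2` is the agent RHCS ships for QLogic SANbox-family switches):

```
<?xml version="1.0"?>
<cluster name="pgcluster" config_version="1">
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence>
        <method name="1">
          <!-- disable node1's FC switch port when fencing -->
          <device name="sanbox" port="5"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence>
        <method name="1">
          <device name="sanbox" port="6"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_sanbox2" name="sanbox"
                 ipaddr="10.0.0.5" login="admin" passwd="secret"/>
  </fencedevices>
</cluster>
```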


As far as the PostgreSQL specifics go, you need a solid way to ensure you've disconnected the now-defunct master from the SAN (the classic STONITH, "shoot the other node in the head", problem). Once that's done, all you *should* have to do is start the database on the backup. It will come up as a standard crash, run through WAL replay crash recovery, and the result should be no different than if you had restarted after a crash on the original node. The one thing you cannot allow is the original master continuing to write to the shared SAN volume once that transition has happened.
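That ordering constraint can be made explicit with a trivial guard on the standby's startup path. A minimal sketch, assuming the fence agent records success somewhere the standby can see it; the flag-file convention and function name here are invented for illustration:

```shell
# Hypothetical helper: only report it is safe to start PostgreSQL on the
# standby once fencing of the old master has been confirmed (here modeled
# as a flag file the fence agent is assumed to write on success).
safe_to_start() {
    # $1: path to the fence-confirmation flag file (assumed convention)
    if [ -f "$1" ]; then
        echo "old master fenced; safe to start PostgreSQL (WAL replay will run)"
        return 0
    else
        echo "refusing to start: old master not confirmed fenced"
        return 1
    fi
}

# Usage on the standby, after the fence agent reports success:
#   safe_to_start /var/run/cluster/master_fenced && pg_ctl -D "$PGDATA" start
```

In a real deployment the cluster manager gates service start on the fence agent's exit status rather than a flag file; the point is only that starting PostgreSQL must be conditional on confirmed fencing, never on a timeout alone.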


Which is exactly what 'storage fencing' prevents.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general