Hi Rick,

at the moment I'm building the same setup as you. I don't have much experience 
with it yet, but I built the setup in our testing lab, and under testing 
conditions it seems to run quite nicely.

I took two servers running heartbeat1 in active/passive mode. Each server has 
its own IP, and on top of that they share a cluster IP that is managed by 
heartbeat only. This cluster IP is the one published in our DNS for accessing 
the mail storage cluster, and at any given time only the active node holds it.
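
Just for illustration, the DNS entries might look roughly like this (names and 
addresses are placeholders, not our real ones):

    ; per-node addresses, mainly for administration
    node1       IN  A   192.0.2.11
    node2       IN  A   192.0.2.12
    ; the cluster IP that clients and MTAs actually use
    mailstore   IN  A   192.0.2.10

Clients only ever talk to "mailstore"; heartbeat moves that address to 
whichever node is currently active.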

Then I have a DRBD shared storage on the two nodes.
On the DRBD storage I put only the Dovecot maildirs and the MySQL databases. 
The Dovecot and MySQL binaries are not shared, and neither is the configuration.
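
As a sketch of what I mean (device names, hostnames and addresses are only 
examples, and the exact syntax depends on your DRBD version), the resource in 
/etc/drbd.conf looks roughly like this:

    resource r0 {
      protocol C;                    # synchronous replication
      on node1 {
        device    /dev/drbd0;
        disk      /dev/sdb1;         # partition holding maildirs + MySQL data
        address   10.0.0.1:7788;     # over the bonded replication link
        meta-disk internal;
      }
      on node2 {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.0.0.2:7788;
        meta-disk internal;
      }
    }

/dev/drbd0 is then mounted (e.g. on /srv/drbd) on the active node only, with 
the maildirs and the MySQL datadir living below that mountpoint.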

DRBD, Dovecot and MySQL are all managed by heartbeat.
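
With heartbeat1 that is just one line in /etc/ha.d/haresources (all on one 
line; node name, mountpoint and IP are placeholders, and "mysql" and "dovecot" 
are simply the init scripts in /etc/init.d):

    node1 drbddisk::r0 Filesystem::/dev/drbd0::/srv/drbd::ext3 IPaddr::192.0.2.10/24/eth0 mysql dovecot

Heartbeat starts these resources left to right on the active node and stops 
them in reverse order on failover, so the filesystem is always there before 
MySQL and Dovecot come up.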

There is always a danger that the connection between the two nodes fails; then 
you end up with a "split brain" and a big data mess. So it is important to 
provide redundancy in the connections.
For heartbeat, I have one dedicated LAN connection and a serial connection (see 
the ha.cf sketch below).
For DRBD, I use two bonded NICs on different PCI cards.
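
The heartbeat paths are configured in /etc/ha.d/ha.cf; roughly like this 
(interface and node names are examples, the timings are just what we use):

    serial    /dev/ttyS0     # null-modem cable between the two nodes
    baud      19200
    bcast     eth1           # dedicated heartbeat LAN
    keepalive 2
    deadtime  30
    auto_failback off
    node      node1
    node      node2
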
Take a look at dopd (the DRBD outdate-peer daemon) for DRBD. It marks the 
passive DRBD partition as "outdated" if the DRBD connection fails, and because 
heartbeat can only take over if it can start all resources of a resource 
group, a failover is no longer possible once the DRBD connection is broken. 
That way you can't mess up your DRBD so easily anymore.
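
Setting up dopd needs entries in both config files. From memory it is roughly 
this (paths and option names may differ between distributions and DRBD 
versions):

In /etc/ha.d/ha.cf:

    # run the outdate-peer daemon under heartbeat and let it talk to the cluster
    respawn hacluster /usr/lib/heartbeat/dopd
    apiauth dopd gid=haclient uid=hacluster

In /etc/drbd.conf (per resource, or in the common section):

    disk {
      fencing resource-only;    # only fence the DRBD resource, not the node
    }
    handlers {
      # called when the replication link breaks; marks the peer "outdated"
      fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }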

If both heartbeat connections fail, you will have lots of trouble, and that is 
easy to achieve with a few wrong iptables rules if you use only LAN 
connections. So the serial cable is a nice thing, because it is not affected 
by that!

We use heartbeat1 because we had some trouble getting heartbeat2 to run. 
Heartbeat1 is not able to monitor its resources, so we thought about using MON 
for this, and about using STONITH devices such as telnet-accessible power 
outlets to switch off the power of a failing node automatically. But this 
setup seemed rather complex, and complexity is the enemy of reliability; we 
had also heard about people having problems with accidental automatic 
failovers or reboots. So in the end we decided against an automatic failover 
in the case a service dies. We only use the failover of heartbeat1, i.e. if 
the active node dies completely, there will be a failover to the passive node. 
And we use connection redundancy to hopefully avoid a split brain. And we make 
good backups ;-)

(Take care not to use NFS for storage if you choose a setup other than the one 
described here, because you can run into trouble with file locking!)

Our cluster protects against hardware problems and against some kinds of 
software problems. Because of DRBD, if you do an "rm -rf" on the maildir, you 
lose the data on _both_ nodes in the same second, so the protection against 
administrative mistakes is not very good! Backups are really important.
But if we have some trouble with the active node and we can't fix it within a 
few minutes, we can try a failover to the passive node, and there is a good 
chance that the service will run quite well on the other node. A nice thing 
for software updates, too.
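
The failover can then be triggered by hand; with heartbeat1 that is basically 
(the exact path differs a bit between distributions):

    # on the currently active node: hand all resources over to the other node
    /usr/lib/heartbeat/hb_standby

    # or, on the passive node: pull the resources over
    /usr/lib/heartbeat/hb_takeover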

For the MTA we use Postfix. Because it is not a good idea to put the Postfix 
mail queue on DRBD (bad experiences), some mails will be (temporarily) lost if 
you do a failover. So it is a good idea to minimize the time mails are held in 
the queue. Because of this, and because we need a long-term stable mail 
storage but an always up-to-date SPAM and virus filter, we decided to put two 
Postfix/Amavis/SpamAssassin/antivirus relays in front of the IMAP cluster. 
They are identical, with the same MX priority in DNS, so if one of the relays 
fails, the other one takes the load.

As I said, this solution is only running in the lab right now and not yet in 
production, but there the failover seems to be no problem at all for the 
clients. So I hope I could give you some ideas.

regards,

  Andreas 