Hello all, I would appreciate your help with a problem I am facing with an Apache HA setup using heartbeat (HB) and MON.
I have been working on setting up a 2-node failover cluster for my web service. I have installed heartbeat 2.0.5 and MON on the 2 SUSE Linux servers. MON is monitoring the Apache web server. I tested two methods of causing a failover and then a failback. Method 1 leaves the cluster in a split brain.

Method 1: If I stop heartbeat on the MASTERNODE by running 'rcheartbeat stop', the SLAVENODE takes over all the resources, which is quite normal. But if I then run 'rcheartbeat start' on the MASTERNODE to restart heartbeat, the MASTERNODE thinks the SLAVENODE is dead and takes over the resources, ending up in an unrecoverable split brain.

Method 2: Surprisingly, if I cause the failover by pulling out the network cable, then plug the cable back in and start heartbeat again on the MASTERNODE, the MASTERNODE senses the SLAVENODE, the SLAVENODE relinquishes the resources to the MASTERNODE, and everything seems fine.

I do not understand why Method 1 ends up in a split brain. My ha.cf and haresources are as below.

    debug 1
    logfile /var/log/ha-log
    keepalive 2
    warntime 30
    deadtime 80
    initdead 90
    node MASTERNODE
    node SLAVENODE
    bcast eth0
    udpport 694
    auto_failback on
    ping_group ping-cluster-test 10.10.10.1 10.10.10.151
    respawn hacluster /usr/lib/heartbeat/ipfail
    crm off

Also attached are the master and slave dumps taken when the split brain occurs in Method 1. It would be great to get your solutions to this.

Regards,
Shailesh P Shirali
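For anyone comparing the ha.cf files on the two nodes, a quick sanity check of the timing directives could be sketched as below. This is only an illustrative Python sketch, not part of the setup described above: the function names parse_ha_cf and timing_warnings are made up for this example, it assumes the timing values are plain integers in seconds, and the rule that initdead should be at least twice deadtime is my reading of the heartbeat ha.cf guidance. It is not confirmed to be the cause of the split brain.

```python
# ha.cf contents as posted above, embedded so the sketch is self-contained.
HA_CF = """\
debug 1
logfile /var/log/ha-log
keepalive 2
warntime 30
deadtime 80
initdead 90
node MASTERNODE
node SLAVENODE
bcast eth0
udpport 694
auto_failback on
ping_group ping-cluster-test 10.10.10.1 10.10.10.151
respawn hacluster /usr/lib/heartbeat/ipfail
crm off
"""


def parse_ha_cf(text):
    """Map each directive name to the list of argument lists it appears with."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def timing_warnings(directives):
    """Return human-readable warnings about the timing directives.

    Assumption: values are plain integers in seconds (no 'ms' suffix).
    """
    def seconds(name):
        return int(directives[name][0][0])

    keepalive = seconds("keepalive")
    warntime = seconds("warntime")
    deadtime = seconds("deadtime")
    initdead = seconds("initdead")

    warnings = []
    if not keepalive < warntime < deadtime:
        warnings.append("expected keepalive < warntime < deadtime")
    # Assumed guideline: initdead should be at least twice deadtime.
    if initdead < 2 * deadtime:
        warnings.append(
            f"initdead ({initdead}) is less than 2 * deadtime ({2 * deadtime})"
        )
    return warnings


cfg = parse_ha_cf(HA_CF)
nodes = [args[0] for args in cfg["node"]]
warnings = timing_warnings(cfg)

print("nodes:", nodes)
for message in warnings:
    print("warning:", message)
```

Running the same check against the ha.cf from each node would also show quickly whether the two files have drifted apart.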
