Hi list! I've got a question about failover conditions for my two-node Heartbeat/DRBD/NFS system. I've already searched the list archives and can't seem to find a definitive answer to my question.
We're using Heartbeat V1 without STONITH. We get around split-brain recovery by bringing up all the services in an "off" state and manually turning everything back on. Our failure policy only requires that the initial failover be automatic; the rest of the work can be done by meatware.

The configuration seems to work fine, and we can successfully fail over either with a disaster simulation or by simply shutting down heartbeat. To clarify: we can pull the plug on the active unit, and the secondary takes over with no problem.

The problem is this: we've had a couple of failure conditions where NFS became unavailable but the server was still network-visible, and Heartbeat did not register an outage.

Here's my question: is there a way to make Heartbeat V1 do service tests instead of pinging to determine system health? Do I have to go to V2 and CRM?

Here are some configs. Let me know if there's more I can provide that will help. Thanks in advance!

bond0 is the network serving up the NFS data.
bond1 is the network DRBD syncs over.
.60 and .110 are node1.
.61 and .111 are node2.

________________
deadtime 15
keepalive 5
warntime 6
logfacility local6
ucast bond0 192.168.101.60
ucast bond1 10.143.254.110
ucast bond0 192.168.101.61
ucast bond1 10.143.254.111
debug 1
auto_failback off
node node1.dmz.domain.local
node node2.dmz.domain.local

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
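
P.S. To make the question concrete, here is a rough sketch of the kind of service test I have in mind, run as a separate watchdog outside Heartbeat. It probes the NFS TCP port and, after several consecutive failures, asks Heartbeat to hand resources to the peer. Everything specific here is an assumption on my part: the names port_is_open and FailureCounter are made up for illustration, the interval and threshold are arbitrary, and the hb_standby path varies by distribution. A plain TCP connect is also a weak test, since a wedged NFS server may still accept connections; something like an RPC-level check or a stat on a test mount might be needed instead.

```python
#!/usr/bin/env python3
"""Sketch of an external NFS watchdog for a Heartbeat V1 node (assumptions marked)."""
import socket
import subprocess
import time

NFS_HOST = "127.0.0.1"   # assumption: probe the NFS server on the local node
NFS_PORT = 2049          # standard NFS port
CHECK_INTERVAL = 5       # seconds between probes (arbitrary)
MAX_FAILURES = 3         # consecutive failures before failing over (arbitrary)
# Assumption: the install path of hb_standby differs between distributions.
STANDBY_CMD = ["/usr/lib/heartbeat/hb_standby"]


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailureCounter:
    """Tracks consecutive failed probes; any success resets the count."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        """Record one probe result; return True once the threshold is reached."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.threshold


def main():
    counter = FailureCounter(MAX_FAILURES)
    while True:
        if counter.record(port_is_open(NFS_HOST, NFS_PORT)):
            # Ask Heartbeat to move resources to the peer, then stop probing.
            subprocess.call(STANDBY_CMD)
            break
        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    main()
```

Requiring several consecutive failures is meant to avoid failing over on a single slow probe; with the values above that would be roughly 15 seconds of outage, in line with the deadtime in our ha.cf.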
