[Linux-HA] HA advice

Gary Schlachter Wed, 16 Jan 2008 11:47:15 -0800

Since I have been fighting with what I thought would be a simple HA implementation for too long now, I am soliciting advice to see if I need to go in a completely different direction. My basic HA environment is for a cluster of servers to provide a single resource (an IP address) to devices external to the cluster of servers. Thus, if the server which is currently configured with the IP address fails or loses network connectivity, another member of the cluster will provide the IP address to the outside world. The members in the cluster would monitor each other via broadcast eth0 or eth1. Sound reasonable?
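
As a rough illustration of the scenario above, here is a minimal Python sketch of the ownership rule being asked for: the IP address lives on a preferred member while that member is up and connected, and otherwise moves to another healthy member. This is not Heartbeat code and says nothing about how Heartbeat decides internally; the names Node, choose_owner, and the node names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    name: str
    alive: bool    # assumption: heartbeats are still being received from this member
    link_up: bool  # assumption: the member still has external network connectivity


def choose_owner(nodes: Sequence[Node], preferred: str) -> Optional[str]:
    """Return the name of the member that should hold the IP address.

    A member is eligible only if it is alive and has network connectivity.
    The preferred member wins whenever it is eligible; otherwise the first
    eligible member in list order takes over. Returns None if no member
    is eligible.
    """
    eligible = [n.name for n in nodes if n.alive and n.link_up]
    if not eligible:
        return None
    if preferred in eligible:
        return preferred
    return eligible[0]


if __name__ == "__main__":
    cluster = [Node("node-a", True, True), Node("node-b", True, True)]
    print(choose_owner(cluster, "node-a"))   # both healthy
    cluster[0].link_up = False               # node-a loses its network link
    print(choose_owner(cluster, "node-a"))
    cluster[0].link_up = True                # node-a is reconnected
    print(choose_owner(cluster, "node-a"))
```

With this rule the address fails over when the preferred member loses its link and moves back once the link returns.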

The problem begins with the fact that I need to provide this capability across servers (members) running Fedora Core 1 and across servers (members) running Fedora Core 7. I am currently trying to get the Fedora Core 1 server cluster to work in the above scenario. After trying different versions of HA and experiencing varying problems, I finally got 2.1.3 to build on FC1 with a few hacks thrown in. After putting the RPMs on two servers and starting Heartbeat on each, I added a single IP address resource and a constraint that it should primarily exist on one of the two servers. So far, so good. When I pull the eth0 cable on the server "owning" the IP address, the other member of the cluster correctly picks up the condition and begins to provide the resource (IP address). When I plug the eth0 cable back in, the resource (IP address) switches back to the server which is now back on the network. However, heartbeat is now continually logging the message:

WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 40 ms (>10ms) (Gsource: ...)


These log entries appear at a rate of approximately 15 per second.

So here are my questions:

Is my scenario reasonable for HA? Or is HA overkill and there is some other software which would fit better? Is building 2.1.3 on FC1 the right way to go or is there an older stable version of HA which would work for my simple scenario? (BTW, I tried 2.0.8 in this scenario, and heartbeat died when I disconnected eth0.) If 2.1.3 is the correct answer, what do I do about the above logging problem?

It is possible that I am not configuring my simple scenario correctly, which may be causing me problems in the above scenario. All advice is welcome.


Thanks,
Gary

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
