I have been fighting for too long with what I thought would be a simple HA implementation, so I am soliciting advice to see whether I need to go in a completely different direction. My basic HA environment is a cluster of servers that provides a single resource (an IP address) to devices outside the cluster. If the server currently configured with the IP address fails or loses network connectivity, another member of the cluster takes over the IP address and provides it to the outside world. The members of the cluster would monitor each other via broadcast on eth0 or eth1. Sound reasonable?
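To make the intended behaviour concrete, here is a toy Python model of the placement rule I am after. It is only a sketch: the node names and the `choose_owner` helper are made up for illustration, and it says nothing about how Heartbeat itself decides where a resource runs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    link_up: bool = True  # True while the node still has network connectivity


def choose_owner(nodes: List[Node], preferred: str) -> Optional[str]:
    """Return the name of the node that should hold the IP address.

    The preferred node wins whenever it is reachable; otherwise the first
    reachable node takes over. Returns None if no node is reachable.
    """
    healthy = [n for n in nodes if n.link_up]
    if not healthy:
        return None
    for n in healthy:
        if n.name == preferred:
            return n.name
    return healthy[0].name


# Hypothetical two-node cluster; "node1" is the preferred owner.
cluster = [Node("node1"), Node("node2")]
before_failure = choose_owner(cluster, "node1")

cluster[0].link_up = False          # cable pulled on node1
during_failure = choose_owner(cluster, "node1")

cluster[0].link_up = True           # cable plugged back in
after_recovery = choose_owner(cluster, "node1")
```

In this model the address moves to node2 while node1 is disconnected and moves back once node1 returns, which is the failover and failback behaviour I expect from the real cluster.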

The problem begins with the fact that I need to provide this capability both on servers (members) running Fedora Core 1 and on servers (members) running Fedora Core 7. I am currently trying to get the Fedora Core 1 cluster to work in the above scenario. After trying several versions of HA and running into different problems with each, I finally got 2.1.3 to build on FC1 with a few hacks thrown in. After installing the RPMs on two servers and starting Heartbeat on each, I added a single IP address resource and a constraint that it should primarily run on one of the two servers. So far, so good. When I pull the eth0 cable on the server "owning" the IP address, the other member of the cluster correctly detects the condition and begins to provide the resource (IP address). When I plug the eth0 cable back in, the resource (IP address) switches back to the server that has just rejoined the network. However, heartbeat now continually logs the message:

WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 40 ms (>10ms) (Gsource: ...)

These log entries appear at a rate of approximately 15 per second.

So here are my questions:

1. Is my scenario reasonable for HA? Or is HA overkill and there is some other software which would fit better?
2. Is building 2.1.3 on FC1 the right way to go, or is there an older stable version of HA which would work for my simple scenario? (BTW, I tried 2.0.8 in this scenario, and heartbeat died when I disconnected eth0.)
3. If 2.1.3 is the correct answer, what do I do about the above logging problem?

It is possible that I have simply misconfigured this scenario and that the misconfiguration is the cause of my problems. All advice is welcome.

Thanks,
Gary

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
