Since I have been fighting with what I thought would be a simple HA
implementation for too long now, I am soliciting advice to see if I need
to go in a completely different direction. My basic HA environment is
for a cluster of servers to provide a single resource (an IP address) to
devices external to the cluster of servers. Thus, if the server which
is currently configured with the IP address fails or loses network
connectivity, another member of the cluster will provide the IP address
to the outside world. The members of the cluster would monitor each
other via broadcast heartbeats on eth0 or eth1.
Sound reasonable?
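The failover rule described above can be sketched in a few lines of
Python. This is only an illustration of the intended behavior, not
Heartbeat code: the function names, the node names, the fixed priority
ordering, and the 30-second dead time are all assumptions made for the
example.

```python
DEADTIME = 30.0  # assumed: seconds of silence before a member counts as dead


def alive_members(last_seen, now, deadtime=DEADTIME):
    """Return the set of members whose last heartbeat is recent enough."""
    return {name for name, ts in last_seen.items() if now - ts <= deadtime}


def pick_owner(priority, last_seen, now, deadtime=DEADTIME):
    """Return the highest-priority live member, or None if nobody is alive.

    priority  -- hypothetical list of member names, most preferred first
    last_seen -- dict mapping member name to time of its last heartbeat
    """
    alive = alive_members(last_seen, now, deadtime)
    for name in priority:
        if name in alive:
            return name
    return None


if __name__ == "__main__":
    priority = ["node1", "node2"]
    # node1 has been silent for 55 s, node2 was heard 5 s ago.
    last_seen = {"node1": 50.0, "node2": 100.0}
    print(pick_owner(priority, last_seen, now=105.0))
```

With a fixed priority order like this, the address moves to the next
member when the preferred one goes silent and moves back once its
heartbeats resume; that is one possible policy, not the only one.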
The problem begins with the fact that I need to provide this
capability across servers (members) running Fedora Core 1 and across
servers (members) running Fedora Core 7. I am currently trying to get
the Fedora Core 1 server cluster to work in the above scenario. After
trying different versions of HA and experiencing varying problems, I
finally got 2.1.3 to build on FC1 with a few hacks thrown in. After putting
the RPMs on two servers and starting Heartbeat on each, I added a single
IP address resource and a constraint that it should primarily exist on
one of the two servers. So far, so good. When I pull the eth0 cable on
the server "owning" the IP address, the other member of the cluster
correctly picks up the condition and begins to provide the resource (IP
address). When I plug the eth0 cable back in, the resource (IP address)
switches back to the server which is now back on the network. However,
now heartbeat is continually logging the message:
    WARN: Gmain_timeout_dispatch: Dispatch function for retransmit request took too long to execute: 40 ms (>10ms) (Gsource: ...)
These log entries appear at a rate of approximately 15 per second.
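To put a number on the logging rate, a small script along these lines
could count the warnings per second in a saved log. It is a rough
sketch: the timestamp pattern (YYYY/MM/DD_HH:MM:SS) and the function
names are assumptions, so the pattern would need adjusting to match the
actual log format.

```python
import re
from collections import Counter

# Assumed timestamp shape somewhere in each log line: YYYY/MM/DD_HH:MM:SS
TIMESTAMP = re.compile(r"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}")
NEEDLE = "Gmain_timeout_dispatch"


def warnings_per_second(lines, needle=NEEDLE):
    """Count lines containing `needle`, grouped by their one-second timestamp."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(0)] += 1
    return counts


def average_rate(counts):
    """Mean warnings per second over the seconds that had at least one warning."""
    return sum(counts.values()) / len(counts) if counts else 0.0


if __name__ == "__main__":
    import sys

    # Usage: python count_warnings.py < saved-log-file
    per_second = warnings_per_second(sys.stdin)
    print("seconds with warnings:", len(per_second))
    print("average per second:", average_rate(per_second))
```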
So here are my questions:
1. Is my scenario reasonable for HA? Or is HA overkill, and is there
   some other software which would fit better?
2. Is building 2.1.3 on FC1 the right way to go, or is there an older
   stable version of HA which would work for my simple scenario? (BTW,
   I tried 2.0.8 in this scenario, and heartbeat died when I
   disconnected eth0.)
3. If 2.1.3 is the correct answer, what do I do about the above
   logging problem?
It is also possible that I have misconfigured something and that this
is what is causing the problems described above. All advice is welcome.
Thanks,
Gary