> >
> And indeed, the cluster does come up - without a node.  A more accurate
> summation is that "a single node in the cluster doesn't come up".  So,
> the _cluster_ does recover from this error.  It just does it without
> that node.  So, service is not interrupted.
>
At the end of the day, then, I think my problem comes down to the fact
that I am not using static IP addresses for the NICs. I know you consider
the use of DHCP (and, I would guess, zeroconf) addresses a bad thing.
However, consider the case where one is trying to automate the cluster
config/setup. In that case, the actual IP addresses used for the NICs are
completely irrelevant to anyone other than the hb code, because users of
the cluster should ONLY be using the cluster alias address.

If you use DHCP/zeroconf and a NIC does not have link at boot time, it will
not get an address assigned, and HB will refuse to start with this error:

Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: glib: Get broadcast for interface eth1 failed: Cannot assign requested address
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: glib: IP interface [eth1] does not exist
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: Illegal bcast [UDP/IP broadcast] in config file [eth1]
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: Heartbeat not started: configuration error.
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: Configuration error, heartbeat not started.

This can actually lead to HB not starting anywhere. Consider a two-node
cluster with a direct cable connection for one of the NICs: if one node is
powered off, the other will not have link on that NIC and therefore will
not get an address.

I'd be interested in more discussion on why DHCP/zeroconf is considered
anathema. I'd also be interested in knowing whether anyone is working on
supporting IPv6 broadcast/multicast for the hb comms links (in which case
a static address can be allocated with no configuration required).

> This is the rationale for this behavior.  It's not perfect behavior, but
> it's not completely irrational either...
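For anyone trying to reproduce the first log line ("Get broadcast for interface eth1 failed: Cannot assign requested address"): on Linux, asking the kernel for the broadcast address of an interface that exists but has no IPv4 address fails with EADDRNOTAVAIL, which glibc renders as exactly that message. The sketch below illustrates that lookup. It assumes Linux and the SIOCGIFBRDADDR ioctl described in netdevice(7). It is not Heartbeat's actual code, and the helper name `get_broadcast` is hypothetical.

```python
import fcntl
import socket
import struct

# Linux-only assumption: ioctl request number for "get interface broadcast
# address" (SIOCGIFBRDADDR from <linux/sockios.h>).
SIOCGIFBRDADDR = 0x8919


def get_broadcast(ifname: str):
    """Hypothetical helper: return (broadcast_addr, None) or (None, error)."""
    # struct ifreq starts with a 16-byte interface name (15 chars + NUL);
    # 256 bytes gives the kernel plenty of room to write the sockaddr back.
    ifreq = struct.pack("256s", ifname.encode()[:15])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            result = fcntl.ioctl(sock.fileno(), SIOCGIFBRDADDR, ifreq)
        except OSError as exc:
            # Interface present but no IPv4 address -> EADDRNOTAVAIL
            # ("Cannot assign requested address"); no such interface -> ENODEV.
            return None, exc.strerror or str(exc)
    # The sockaddr_in begins at offset 16; its IPv4 address is bytes 20..23.
    return socket.inet_ntoa(result[20:24]), None


if __name__ == "__main__":
    for name in ("lo", "eth1"):
        addr, err = get_broadcast(name)
        print(name, addr if addr is not None else "ERROR: " + err)
```

On a glibc-based node whose eth1 has link but has not yet received a DHCP lease, `get_broadcast("eth1")` should return `(None, "Cannot assign requested address")`, which matches the Heartbeat log.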
>
> --
> Alan Robertson <[EMAIL PROTECTED]>

Thanks for the explanation - it helps a lot and is exactly what I was
looking for.

Simon

_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/