> 
> And indeed, the cluster does come up - without a node.  A more accurate
> summation is that "a single node in the cluster doesn't come up".  So,
> the _cluster_ does recover from this error.  It just does it without
> that node.  So, service is not interrupted.
> 

At the end of the day, then, I think my problem comes down to the fact
that I am not using static IP addresses for the NICs. I know you
consider the use of DHCP (and, I would guess, Zeroconf) addresses a bad
thing. However, consider the case where one is trying to automate the
cluster config/setup. There, the actual IP addresses used for the NICs
are irrelevant to anyone other than the hb code, because users of the
cluster should ONLY be using the cluster alias address.

If you use DHCP/Zeroconf and a NIC does not have link at boot time, it
will not get an address assigned, and HB will refuse to start with these
errors:

Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: glib: Get broadcast for interface eth1 failed: Cannot assign requested address
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: glib: IP interface [eth1] does not exist
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: Illegal bcast [UDP/IP broadcast] in config file [eth1]
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: Heartbeat not started: configuration error.
Oct 17 05:41:47 heartbeat[10189]: 2007/10/17_05:41:49 ERROR: Configuration error, heartbeat not started.

This can actually lead to HB not starting anywhere. Consider a two-node
cluster with a direct cable connection for one of the NICs: if one node
is powered off, the other one will not have link on that NIC, so no
address is assigned to it and HB refuses to start there as well.
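As a rough illustration of the condition described above, a boot script
could check for link before starting heartbeat. This is only a sketch,
not anything heartbeat itself does: it assumes a Linux system that
exposes /sys/class/net/<iface>/carrier, and the function name and the
sysfs_root parameter are made up for the example.

```python
import os


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier (link) on iface.

    Hypothetical helper for illustration; sysfs_root is a parameter
    only so the function can be pointed at a test directory.
    """
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (reading "carrier" fails in that case).
        return False


if __name__ == "__main__":
    for nic in ("eth0", "eth1"):
        state = "link" if link_is_up(nic) else "NO link"
        print(f"{nic}: {state}")
```

A NIC reported as "NO link" here is one that DHCP/Zeroconf would leave
without an address, which is the state in which HB refuses to start.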

I'd be interested in more discussion on why DHCP/Zeroconf is considered
anathema.

I'd also be interested in knowing if anyone is working on supporting
IPv6 broadcast/multicast for the hb comms links (in which case a static
address can be allocated with no configuration required).
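To make the "no configuration required" point concrete: assuming what
is meant is the IPv6 link-local address that an interface can derive
from its own MAC address (the classic modified EUI-64 scheme), the
address is fixed per NIC and does not depend on a DHCP server. The
sketch below shows that derivation; the function name is made up, and
some systems use other interface identifiers instead.

```python
import ipaddress


def link_local_from_mac(mac):
    """Derive the fe80::/64 address for a MAC using modified EUI-64.

    Illustrative helper only; expects a colon-separated 48-bit MAC.
    """
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC such as 00:11:22:33:44:55")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC.
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    prefix = bytes([0xFE, 0x80]) + bytes(6)  # fe80::/64
    return ipaddress.IPv6Address(prefix + bytes(interface_id))


if __name__ == "__main__":
    print(link_local_from_mac("00:11:22:33:44:55"))
    # prints fe80::211:22ff:fe33:4455
```

Because the result depends only on the hardware address, each node's
address on the hb link would be the same across reboots.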

 
> This is the rationale for this behavior.  It's not perfect behavior, but
> it's not completely irrational either...
> 
> --
>     Alan Robertson <[EMAIL PROTECTED]>

Thanks for the explanation - it helps a lot and is exactly what I was
looking for.

Simon
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
