On 4/2/2011 12:40 AM, Vadym Chepkov wrote:

> Ok, let's see how this might work.
> You would need a separate monitor for the cluster, and since this
> monitor can also potentially crash, you would need another monitor to
> observe the first one. Then we would want the first one to monitor the
> second one, so we would need a cluster of monitors.

That is precisely why I'm happy with heartbeat 2.1.4 in an R1 setup:
simple, stupid, and I know exactly which failures it will handle and
which problems it monitors for (because I wrote the mon scripts).

> Wait, don't we already have a cluster in place? It seems logical for
> the monitor to be part of the cluster. I was expecting the "monitor"
> operation to handle that, but it seems for DRBD this is not the case.

This is also not the case with e.g. apache once you think about it: the
agent checks whether a wget of /server-status on localhost returns
success. There are three things wrong with that; the one relevant here
is that the kernel should be smart enough to route the packets over lo
even if you're wget'ting from the cluster IP. As a result, you cannot
check whether a daemon is answering on the cluster IP if you run the
check on the active node.

So you have to have an external monitor. Don't you have one to monitor
your switches, UPSes, and non-clustered kit anyway?
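
For illustration, a minimal sketch of such an external check, meant to
run on a host that is not a cluster node so the request really crosses
the network to the cluster IP. The function name service_answers, the
5-second timeout, and the example address are assumptions made for this
sketch, not part of any setup described in this thread.

```python
import http.client
import urllib.error
import urllib.request


def service_answers(url, timeout=5.0):
    """Return True if an HTTP GET of url yields a 2xx status within timeout.

    Hypothetical helper: run it from a machine outside the cluster,
    pointing at the cluster IP, e.g. http://192.0.2.10/server-status
    (192.0.2.10 is a documentation address, not a real one).
    """
    # An empty ProxyHandler ignores any proxy environment variables,
    # so the probe goes straight to the address under test.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException,
            OSError, ValueError):
        # Connection refused, timeout, HTTP error status, malformed
        # reply, or an unusable URL all count as "not answering".
        return False
```

A mon-style wrapper would call service_answers() with the cluster URL
and exit non-zero when it returns False. Any failure is treated the
same way here; a real monitor would probably want to log the reason.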

> Maybe we should have another primitive running? drbd_status or
> something? When the drbd subsystem is in a degraded state, have
> drbd_status in the "stopped" state?

DRBD has its own logic for figuring out its state, controlled via
drbd.conf: adjust drbd.conf so that the secondary does not start in a
degraded state and shuts down when split brain is detected.

Dima

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
