Hi,

What is the best way to check whether PM (Pacemaker) is still alive?

We tried the following approach: a softdog timer (maximum value 300s,
plus an extra 60s to give PM another chance) is started initially and
checked by a third party. A clone named HA_alive fails in monitor
(except the first time); the monitor interval is 200s. HA_alive:start
should reset that softdog timer. It looks like PM sometimes doesn't
restart the failed resource within those 360s for no apparent reason:
the system is almost idle.
Another approach we used was based on "crmadmin -S this_node": if it
reports any problems, start a timer and compare the output of
"crm resource status" at different times to see whether anything is
happening on the system (i.e. PM is working and the bad result of
crmadmin -S was caused by high load on PM). It doesn't work well
either.
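
Roughly, the second approach amounts to the sketch below. It is a
minimal Python illustration, not our actual script: the function name
pm_alive, the 60s grace period, the 10s poll interval and the
assumption that crmadmin exits non-zero when PM does not answer are
all hypothetical.

```python
import subprocess
import time


def run_cmd(cmd):
    """Run a command and return (exit_code, stdout).

    A command that cannot be started or that hangs is treated as a failure.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return proc.returncode, proc.stdout
    except (OSError, subprocess.TimeoutExpired):
        return 1, ""


def pm_alive(node, runner=run_cmd, sleep=time.sleep, grace=60, poll=10):
    """Return True if PM looks alive on `node`, False otherwise.

    Assumption: a non-zero exit code from "crmadmin -S" means PM did not answer.
    `grace` and `poll` (seconds) are illustrative values.
    """
    rc, _ = runner(["crmadmin", "-S", node])
    if rc == 0:
        return True

    # crmadmin failed: start the timer and remember the current resource status.
    _, baseline = runner(["crm", "resource", "status"])
    waited = 0
    while waited < grace:
        sleep(poll)
        waited += poll

        # PM may simply have been busy; ask again.
        rc, _ = runner(["crmadmin", "-S", node])
        if rc == 0:
            return True

        # A changed status means something is still happening on the system.
        _, current = runner(["crm", "resource", "status"])
        if current != baseline:
            return True

    # No answer and no change during the whole grace period.
    return False
```

Calling pm_alive("this_node") returns False only when crmadmin keeps
failing and the resource status stays unchanged for the whole grace
period.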

-- 
Evgeniy Ivanov

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
