Hi,

On Thu, Dec 09, 2010 at 02:58:33PM +0300, Evgeniy Ivanov wrote:
> On Thu, Dec 9, 2010 at 2:32 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> > On Thu, Dec 9, 2010 at 12:14 PM, Evgeniy Ivanov <lolkaanti...@gmail.com>
> > wrote:
> >> Hi,
> >>
> >> What is a best way to check if PM is still alive?
> >
> > "ps axf | grep crmd" is one approach
>
> It just means that crmd is alive, but doesn't give information about
> its state, e.g. theoretically it can hang in some internal logic
> (something like "endless loop"). So we need something to ask "Hey,
> PM! Are your brains still OK?".

The closest is "crmadmin -D" followed by "crmadmin -S" to check
the status of the DC node. Or crmadmin -S on all nodes.

Thanks,

Dejan

> >> We tried following approach: there is a softdog timer (max value is
> >> 300s + extra 60s to give PM another chance) initially started and
> >> checked by third party. Clone named HA_alive fails in monitor (except
> >> first time), monitor interval is 200s. HA_alive:start should reset
> >> that softdog timer. It looks like sometimes PM doesn't restart failed
> >> resource for that 360s with no reason: system is almost IDLE.
> >
> > Strange. Should work. Details?
>
> It's dual-node cluster based on openais-0.80.3-26.1 and
> pacemaker-1.0.3-4.1. Solution I've described worked fine on my
> cluster, but regularly failed without a reason on some another
> clusters. The logs (/var/log/messages) say, that PM noticed a failure
> in monitor, but later it didn't restart (no stop and no start) the
> HA_alive resource, thus in 360s system died. I didn't notice anything
> else in logs...
> I will be able to share some /var/log/messages, if I get access to
> failed clusters.
>
>
> --
> Evgeniy Ivanov
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
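
PS: for anyone who wants to script the crmadmin check suggested in this reply, a rough sketch in Python follows. It is only an illustration, not a tested tool. The output formats in the two regular expressions ("Designated Controller is: NODE" and "Status of crmd@NODE: STATE (HEALTH)") are assumptions about what crmadmin prints and should be verified against your Pacemaker version. The function names and the 30s timeout are made up for the example. The timeout matters because a hung crmd may never answer, which is exactly the case being tested for.

```python
import re
import subprocess

# ASSUMED output formats; check them against your crmadmin version.
DC_RE = re.compile(r"Designated Controller is:\s*(\S+)")
STATUS_RE = re.compile(r"Status of \S+@(\S+):\s*(\S+)\s*\((\w+)\)")


def parse_dc(output):
    """Return the DC node name found in `crmadmin -D` output, or None."""
    match = DC_RE.search(output)
    return match.group(1) if match else None


def parse_status(output):
    """Return (node, state, health) found in `crmadmin -S` output, or None."""
    match = STATUS_RE.search(output)
    return match.groups() if match else None


def run(args, timeout=30):
    """Run a command and return stdout+stderr as text, or None if it
    could not be started or did not finish within `timeout` seconds."""
    try:
        proc = subprocess.run(
            args,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # capture both streams together
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout


def dc_is_healthy(timeout=30):
    """True only if a DC is reported and it answers with health 'ok'."""
    dc = parse_dc(run(["crmadmin", "-D"], timeout) or "")
    if dc is None:
        return False
    status = parse_status(run(["crmadmin", "-S", dc], timeout) or "")
    return status is not None and status[2] == "ok"
```

A watchdog helper could call dc_is_healthy() periodically and reset the softdog timer only when it returns True. To cover "crmadmin -S on all nodes", loop over the node names and apply the same run/parse_status pair to each.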