Hi

I have noticed this happening a few times on several of my clusters.
The monitor operation for some resources stops running, and thus
resource failures are not detected.  If I edit the CIB and change
something about the resource (generally the monitor interval), the
resource starts monitoring again, detects the failure, and restarts
correctly.

I am using pacemaker 1.0.9 live, and 1.0.10 in test.

This has happened with both clone and non-clone resources.

I have attached a log which shows the behaviour.  I have a resource
(megaswitch) running cloned over 6 nodes.

Until 06:48:22, the monitor is running correctly (the app logs the
"Deleting context for MONTEST-" line each time the monitor is run).
After that, the monitor is not run again on this node.
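
For anyone who wants to find the same point in their own logs, here is a
rough Python sketch that looks for long silences between the
"Deleting context for MONTEST-" lines. It is not part of Pacemaker. The
function names, the HH:MM:SS timestamp format, and the assumption that the
log does not cross midnight are all illustrative guesses and will need
adjusting to the real log layout.

```python
import re
from datetime import datetime, timedelta

# Line the application logs each time the monitor operation runs.
MARKER = "Deleting context for MONTEST-"
# Assumption: every log line carries an HH:MM:SS timestamp somewhere.
TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def monitor_times(lines):
    """Return the time of day of every log line that contains MARKER."""
    times = []
    for line in lines:
        if MARKER not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return times


def find_gaps(times, end, max_gap):
    """Return (start, stop) pairs during which no monitor ran for longer
    than max_gap. `end` is the time of the last line in the log, so a
    monitor that stops for good is reported as a gap up to `end`."""
    points = list(times) + [end]
    gaps = []
    for prev, cur in zip(points, points[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps
```

To use it, read the log lines on one node, pass them to monitor_times(),
and call find_gaps() with the time of the last log line and a max_gap a
little larger than the configured monitor interval. A gap that runs to the
end of the log marks the point where the monitor stopped.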

I have the logs for the other nodes, if they are needed to try and debug
this.

-- 
Chris Picton

Executive Manager - Systems
ECN Telecommunications (Pty) Ltd
t:   010 590 0031 m: 079 721 8521
f:   087 941 0813
e:  ch...@ecntelecoms.com

"Lowering the cost of doing business"

Attachment: log.txt.gz
Description: GNU Zip compressed data

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker