Hi,

I have noticed this happening a few times on several of my clusters. The monitor operation for some resources stops running, so resource failures are not detected. If I edit the CIB and change something about the resource (generally the monitor interval), the resource starts being monitored again, detects the failure and restarts correctly.
I am using Pacemaker 1.0.9 in production and 1.0.10 in test. This has happened with both clone and non-clone resources.

I have attached a log which shows the behaviour. I have a resource (megaswitch) running cloned over 6 nodes. Until 06:48:22, the monitor runs correctly (the app logs the "Deleting context for MONTEST-" line each time the monitor is run). After that, the monitor is not run again on this node.

I have the logs for the other nodes, if they are needed to help debug this.

--
Chris Picton
Executive Manager - Systems
ECN Telecommunications (Pty) Ltd
t: 010 590 0031 m: 079 721 8521 f: 087 941 0813 e: ch...@ecntelecoms.com
"Lowering the cost of doing business"
log.txt.gz
Description: GNU Zip compressed data
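For anyone going through the attached log, a minimal sketch of how the point where the monitor stops could be located. It assumes every relevant log line carries an HH:MM:SS timestamp and that all lines fall on the same day; the function names (monitor_times, find_gaps, last_monitor, scan_gzip) are hypothetical helpers, not part of Pacemaker or of the application.

```python
import gzip
import re
from datetime import datetime, timedelta

# Marker the application logs each time the monitor runs (quoted from the report).
MARKER = "Deleting context for MONTEST-"
# Assumption: each log line contains a timestamp of the form HH:MM:SS.
TIME_RE = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


def monitor_times(lines, marker=MARKER):
    """Return the time of day of every line that contains the monitor marker."""
    times = []
    for line in lines:
        if marker not in line:
            continue
        match = TIME_RE.search(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%H:%M:%S"))
    return times


def find_gaps(times, max_gap):
    """Return (previous, next) pairs whose spacing exceeds max_gap."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]


def last_monitor(times):
    """Return the last observed monitor time as HH:MM:SS, or None if there is none."""
    return times[-1].strftime("%H:%M:%S") if times else None


def scan_gzip(path, max_gap=timedelta(minutes=1)):
    """Scan a gzip-compressed log and report the last monitor run and any gaps."""
    with gzip.open(path, "rt", errors="replace") as handle:
        times = monitor_times(handle)
    return last_monitor(times), find_gaps(times, max_gap)
```

Calling scan_gzip("log.txt.gz") would return the time of the last monitor run seen in the log, plus any earlier stretches where monitor runs were further apart than max_gap. The one-minute default is only a placeholder and should be set a little above the configured monitor interval.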
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker