Ron Kerry <rke...@...> writes:

> I am seeing the following sequence of messages with every monitor interval for my stonith resource.
>
> Sep 28 10:44:01 genesis stonith-ng: [9493]: ERROR: run_stonith_agent: No timeout set for stonith operation monitor with device fence_legacy
> Sep 28 10:44:01 genesis stonith: l2network device OK.
>
> It is unclear to me what this ERROR means as the resource itself says everything is fine. There is a monitor timeout set in the resource definition.
>
> Distribution is SLES11SP1 (SLE11SP1-HAE).
> cluster-glue 1.0.6-0.3.7

I'm seeing the same problem ever since the latest update rollup from Novell (the "sleshasp1-ha-update-201009" patch). Example:

Sep 29 16:28:35 imsxen3 stonith-ng: [5182]: ERROR: run_stonith_agent: No timeout set for stonith operation monitor with device fence_legacy
Sep 29 16:28:36 imsxen3 stonith: external/ipmi device OK.

I downgraded the cluster-glue package (and a couple others, so RPM dependencies were still satisfied) on one machine and the messages went away on that machine, while they're still there on the others.

To clarify -- the "no timeout set" error is logged on the machine the stonith resource is currently running on, each time the monitor operation fires. On the machine I downgraded cluster-glue on, there are no such errors for any stonith resource running on that server.

My stonith definitions (in "crm configure" format) are like this:

primitive stonith-imsxen1 stonith:external/ipmi \
    meta target-role="Started" \
    operations $id="stonith-imsxen2-operations" \
    op monitor interval="300" timeout="15" start-delay="15" \
    params hostname="imsxen1" ipaddr="10.95.12.51" userid="stonith" passwd="XXXX" interface="lanplus"

and similarly for stonith-imsxen2 and stonith-imsxen3. (Node names are imsxen[123].)

STONITH works properly, aside from the annoying messages with the latest version.

Here is the RPM version comparison:

v | SLE11-HAE-SP1-Updates | cluster-glue   | 1.0.5-0.5.1  | 1.0.6-0.3.7  | x86_64
v | SLE11-HAE-SP1-Updates | libglue2       | 1.0.5-0.5.1  | 1.0.6-0.3.7  | x86_64
v | SLE11-HAE-SP1-Updates | libpacemaker3  | 1.1.2-0.2.1  | 1.1.2-0.6.1  | x86_64
v | SLE11-HAE-SP1-Updates | pacemaker      | 1.1.2-0.2.1  | 1.1.2-0.6.1  | x86_64
v | SLE11-HAE-SP1-Updates | pacemaker-mgmt | 2.0.0-0.2.19 | 2.0.0-0.3.10 | x86_64

I intentionally rolled back the cluster-glue package, and the others were rolled back to satisfy dependencies.

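In case anyone wants to confirm which nodes are affected before and after a downgrade, here is a rough sketch that tallies the "No timeout set" errors per host from syslog lines. The function name count_timeout_errors and the regular expression are my own and are not part of any cluster tool; the pattern assumes the traditional syslog line layout shown in the examples above.

```python
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host stonith-ng: [pid]: ERROR: ..."
# (the format of the log lines quoted in this thread).
PATTERN = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]{8}\s+(?P<host>\S+)\s+stonith-ng:.*"
    r"ERROR: run_stonith_agent: No timeout set"
)


def count_timeout_errors(lines):
    """Return a Counter mapping host name to the number of matching errors."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts


# Sample input: the two log lines quoted in this message.
sample = [
    "Sep 29 16:28:35 imsxen3 stonith-ng: [5182]: ERROR: run_stonith_agent: "
    "No timeout set for stonith operation monitor with device fence_legacy",
    "Sep 29 16:28:36 imsxen3 stonith: external/ipmi device OK.",
]

for host, n in sorted(count_timeout_errors(sample).items()):
    print(f"{host}: {n}")
```

To run it against a real log, pass an open file object (for example the system log on each node) to count_timeout_errors() instead of the sample list. A node running the downgraded cluster-glue should then not appear in the output at all.
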
According to the RPM changelog, the "good" version of cluster-glue (1.0.5-0.5.1) is from upstream version cs: 6cf2e36df9f4, while the newer one is from cs: a146a145a3e. While it's possible this is a problem with Novell's builds, I don't think that is likely, since there are no local patches in the RPM spec file.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker