On Wed, Sep 29, 2010 at 11:57 PM, Andrew Daugherity <adaugher...@tamu.edu> wrote:
> Ron Kerry <rke...@...> writes:
>> I am seeing the following sequence of messages with every monitor interval
>> for my stonith resource.
>>
>> Sep 28 10:44:01 genesis stonith-ng: [9493]: ERROR: run_stonith_agent: No timeout set for stonith operation monitor with device fence_legacy
>> Sep 28 10:44:01 genesis stonith: l2network device OK.
>>
>> It is unclear to me what this ERROR means as the resource itself says
>> everything is fine. There is a monitor timeout set in the resource definition.
>>
>> Distribution is SLES11SP1 (SLE11SP1-HAE).
>> cluster-glue 1.0.6-0.3.7
>
> I'm seeing the same problem ever since the latest update rollup from Novell
> (the "sleshasp1-ha-update-201009" patch). Example:
> Sep 29 16:28:35 imsxen3 stonith-ng: [5182]: ERROR: run_stonith_agent: No timeout set for stonith operation monitor with device fence_legacy
> Sep 29 16:28:36 imsxen3 stonith: external/ipmi device OK.

I believe it's been fixed upstream; I guess Novell needs to apply the other half of the patch.

> I downgraded the cluster-glue package (and a couple others, so RPM dependencies
> were still satisfied) on one machine and the messages went away on that machine,
> while they're still there on the others.
>
> To clarify -- the "no timeout set" error is logged on the machine the stonith
> resource is currently running on, each time the monitor operation fires. On the
> machine I downgraded cluster-glue on, there are no such errors for any stonith
> resource running on that server.
>
> My stonith definitions (in "crm configure" format) are like this:
> primitive stonith-imsxen1 stonith:external/ipmi \
>         meta target-role="Started" \
>         operations $id="stonith-imsxen2-operations" \
>         op monitor interval="300" timeout="15" start-delay="15" \
>         params hostname="imsxen1" ipaddr="10.95.12.51" userid="stonith" passwd="XXXX" interface="lanplus"
> and similarly for stonith-imsxen2 and stonith-imsxen3. (Node names are imsxen[123].)
>
> STONITH works properly, aside from the annoying messages with the latest version.
>
> Here is the RPM version comparison:
> v | SLE11-HAE-SP1-Updates | cluster-glue   | 1.0.5-0.5.1  | 1.0.6-0.3.7  | x86_64
> v | SLE11-HAE-SP1-Updates | libglue2       | 1.0.5-0.5.1  | 1.0.6-0.3.7  | x86_64
> v | SLE11-HAE-SP1-Updates | libpacemaker3  | 1.1.2-0.2.1  | 1.1.2-0.6.1  | x86_64
> v | SLE11-HAE-SP1-Updates | pacemaker      | 1.1.2-0.2.1  | 1.1.2-0.6.1  | x86_64
> v | SLE11-HAE-SP1-Updates | pacemaker-mgmt | 2.0.0-0.2.19 | 2.0.0-0.3.10 | x86_64
>
> I intentionally rolled back the cluster-glue package, and the others were rolled
> back to satisfy dependencies. According to the RPM changelog, the "good" version
> of cluster-glue (1.0.5-0.5.1) is from Upstream version cs: 6cf2e36df9f4, while
> the newer one is from cs: a146a145a3e.
>
> While it's possible this is a problem with Novell's builds, I don't think that
> to be likely, since there are no local patches in the RPM spec file.
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>
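The error is logged only on the node currently running the stonith resource, once per monitor interval. To confirm which nodes are affected (for example, before and after downgrading cluster-glue), you can tally the message per host from syslog. The following is a minimal Python sketch. It assumes the syslog line layout shown in the log excerpts quoted above. The regex, the `count_no_timeout_errors` helper and the script name are hypothetical and are not part of cluster-glue or Pacemaker.

```python
import re
import sys
from collections import Counter

# Assumed syslog layout, based on the log excerpts in this thread:
#   "<Mon> <day> <hh:mm:ss> <host> stonith-ng: [<pid>]: ERROR: run_stonith_agent:
#    No timeout set for stonith operation <op> with device <device>"
NO_TIMEOUT_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+stonith-ng:\s+\[\d+\]:\s+"
    r"ERROR:\s+run_stonith_agent:\s+No timeout set for stonith operation\s+(?P<op>\S+)"
)


def count_no_timeout_errors(lines):
    """Tally 'No timeout set' errors per (host, operation) pair.

    Lines that do not match the assumed format (e.g. the
    'stonith: ... device OK.' lines) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = NO_TIMEOUT_RE.match(line)
        if match:
            counts[(match.group("host"), match.group("op"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_no_timeout.py /var/log/messages [more logs...]
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            totals.update(count_no_timeout_errors(log))
    for (host, op), n in sorted(totals.items()):
        print(f"{host}\t{op}\t{n}")
```

The filter matches on the `stonith-ng` tag and the `run_stonith_agent` message rather than on the device name. This keeps it independent of the stonith plugin in use (external/ipmi, l2network, and so on).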