Which Pacemaker version are you running? This looks familiar, but it depends on the version number.
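While comparing notes, it may help to read the quoted cib_process_diff warnings as version triples. A CIB version is written as admin_epoch.epoch.num_updates, and each warning names the diff's source version, its target version, and the version the local CIB already holds. The following is only a rough sketch for triaging such lines: the names DIFF_RE, parse_diff_warning and is_stale are made up for illustration and are not part of Pacemaker, and the regular expression assumes the exact log format shown in the quoted message.

```python
import re

# Assumed log format, taken from the warnings in the quoted corosync.log:
#   Diff A.B.C -> D.E.F not applied to X.Y.Z: ...
DIFF_RE = re.compile(
    r'Diff (\d+)\.(\d+)\.(\d+)\s*->\s*(\d+)\.(\d+)\.(\d+) '
    r'not applied to (\d+)\.(\d+)\.(\d+)'
)


def parse_diff_warning(line):
    """Return (source, target, current) version triples, or None if no match."""
    m = DIFF_RE.search(line)
    if m is None:
        return None
    nums = tuple(int(g) for g in m.groups())
    return nums[0:3], nums[3:6], nums[6:9]


def is_stale(line):
    """True if the diff starts from a version older than the local CIB."""
    parsed = parse_diff_warning(line)
    if parsed is None:
        return False
    source, _target, current = parsed
    # Tuples compare element by element:
    # admin_epoch first, then epoch, then num_updates.
    return source < current


sample = ('Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: '
          'Diff 0.233.1346 -> 0.233.1347 not applied to 0.233.1354: '
          'current "num_updates" is greater than required')

print(parse_diff_warning(sample))
print(is_stale(sample))
```

For the sample line this reports that the diff starts at 0.233.1346 while the local CIB is already at 0.233.1354. That only shows the node was ahead of the diff it received; it does not say why, which is where the version number comes in.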

> On 29 Dec 2014, at 10:24 pm, Sergey Arlashin <sergeyarl.maill...@gmail.com> wrote:
>
> Hi!
> Recently I've noticed that one of my nodes had OFFLINE status in 'crm status'
> output. But it actually was not. I could ssh on this node. I could get 'crm
> status' from that node's console. After some time it became online. It
> happened several times without any obvious reason with other nodes.
>
> Still no error of fatal messages in logs. The only warning messages I could
> get from corosync.log were the following:
>
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1346 -> 0.233.1347 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1347 -> 0.233.1348 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1348 -> 0.233.1349 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1349 -> 0.233.1350 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1350 -> 0.233.1351 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1351 -> 0.233.1352 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1352 -> 0.233.1353 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1353 -> 0.233.1354 not applied to 0.233.1354: current "num_updates" is greater than required
> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 491 for last-failure-Cachier=1419729443 failed: Application of an update diff failed
> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 494 for fail-count-Cachier=1 failed: Application of an update diff failed
> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 497 for probe_complete=true failed: Application of an update diff failed
> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 500 for last-failure-Cachier=1419729443 failed: Application of an update diff failed
> Dec 29 10:56:34 lb-node2 attrd: [2240]: WARN: attrd_cib_callback: Update 503 for fail-count-Cachier=1 failed: Application of an update diff failed
> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1338 -> 0.233.1339 not applied to 0.233.1382: current "num_updates" is greater than required
> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1339 -> 0.233.1340 not applied to 0.233.1382: current "num_updates" is greater than required
> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1340 -> 0.233.1341 not applied to 0.233.1382: current "num_updates" is greater than required
> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1341 -> 0.233.1342 not applied to 0.233.1382: current "num_updates" is greater than required
> Dec 29 10:56:37 lb-node2 cib: [2238]: WARN: cib_process_diff: Diff 0.233.1342 -> 0.233.1343 not applied to 0.233.1382: current "num_updates" is greater than required
>
> After exploring corosync processes with ps I found out that on all my nodes
> there are zombie corosync procs like:
>
> root 13892 0.0 0.0 0 0 ? Z Dec26 0:04 [corosync] <defunct>
> root 21793 0.0 0.0 0 0 ? Z Dec26 0:00 [corosync] <defunct>
> root 27009 1.3 1.0 714292 10784 ? Ssl Dec18 223:38 /usr/sbin/corosync
>
> Is it ok to have zombie corosync procs on nodes? Or does it suggest that
> something wrong is going on ?
>
> Thanks in advance
>
> --
> Best regards,
> Sergey Arlashin
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org