> -----Original Message-----
> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
> Sent: Thursday, March 13, 2014 1:45 PM
> To: The Pacemaker cluster resource manager; Andrew Beekhof
> Subject: Re: [Pacemaker] Pacemaker/corosync freeze
>
> Hello,
>
> > -----Original Message-----
> > From: Jan Friesse [mailto:jfrie...@redhat.com]
> > Sent: Thursday, March 13, 2014 10:03 AM
> > To: The Pacemaker cluster resource manager
> > Subject: Re: [Pacemaker] Pacemaker/corosync freeze
> >
> > ...
> >
> > >>>>
> > >>>> Also can you please try to set debug: on in corosync.conf and
> > >>>> paste full corosync.log then?
> > >>>
> > >>> I set debug to on, and did a few restarts but could not reproduce
> > >>> the issue yet - will post the logs as soon as I manage to
> > >>> reproduce.
> > >>>
> > >>
> > >> Perfect.
> > >>
> > >> Another option you can try to set is netmtu (1200 is usually safe).
> > >
> > > Finally I was able to reproduce the issue.
> > > I restarted node ctsip2 at 21:10:14, and CPU went to 100%
> > > immediately (not when the node was up again).
> > >
> > > The corosync log with debug on is available at:
> > > http://pastebin.com/kTpDqqtm
> > >
> > >
> > > To be honest, I had to wait much longer for this reproduction than
> > > before, even though there was no change in the corosync
> > > configuration - just potentially some system updates. But anyway,
> > > the issue is unfortunately still there.
> > > Previously, when this issue occurred, CPU was at 100% on all nodes -
> > > this time only on ctmgr, which was the DC...
> > >
> > > I hope you can find some useful details in the log.
> > >
> > Attila,
> > what seems to be interesting is
> >
> > Configuration ERRORs found during PE processing. Please run
> > "crm_verify -L" to identify issues.
> >
> > I'm unsure how much of a problem this is, but I'm really not a
> > Pacemaker expert. Perhaps Andrew could comment on that. Any idea?
> >
> >
> > Anyway, I have a theory about what may be happening, and it looks
> > related to IPC (and probably not related to the network). But to make
> > sure we do not try to fix an already fixed bug, can you please build:
> > - New libqb (0.17.0). There are plenty of fixes in IPC
> > - Corosync 2.3.3 (already plenty of IPC fixes)
> > - And maybe also a newer pacemaker
> >
>
> I already use Corosync 2.3.3, built from source, and libqb-dev 0.16 from
> the Ubuntu package.
> I am currently building libqb 0.17.0 and will update you on the results.
>
> In the meantime we had another freeze, which did not seem to be related
> to any restarts, but brought all corosync processes to 100%.
> Please check out the corosync.log, perhaps it is a different cause:
> http://pastebin.com/WMwzv0Rr
>
>
> In the meantime I will install the new libqb and send logs if we have
> further issues.
>
> Thank you very much for your help!
>
> Regards,
> Attila
>
One more question: if I install libqb 0.17.0 from source, do I need to
rebuild corosync as well, or will a corosync that was built against
libqb 0.16.0 be fine?

BTW, in the meantime I installed the new libqb on 3 of the 7 hosts, so I
can see whether it makes a difference. If I see crashes on the outdated
ones but not on the new ones, we are fine. :)

Thanks,
Attila

> >
> > I know you were not very happy using hand-compiled sources, but please
> > give them at least a try.
> >
> > Thanks,
> > Honza
> >
> > > Thanks,
> > > Attila
> > >
> > >
> > >
> > >>
> > >> Regards,
> > >> Honza
> > >>
> > >>>
> > >>> There are also a few things that might or might not be related:
> > >>>
> > >>> 1) Whenever I want to edit the configuration with "crm configure
> > >>> edit",
> >
> > ...
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://bugs.clusterlabs.org