Re: [ClusterLabs] Memory leak in crm_mon ?
> On 16 Aug 2015, at 9:41 pm, Attila Megyeri wrote:
>
> Hi Andrew,
>
> I managed to isolate / reproduce the issue. You might want to take a look, as
> it might be present in 1.1.12 as well.
>
> I monitor my cluster from putty, mainly this way:
> - I have a putty (Windows client) session, that connects via SSH to the box,
>   authenticates using public key as a non-root user.
> - It immediately sends a "sudo crm_mon -Af" command, so with a single click I
>   have a nice view of what the cluster is doing.

Perhaps add -1 to the option list.
The root cause seems to be that closing the putty window doesn’t actually kill the process running inside it.

> Whenever I close this putty window (terminate the app), crm_mon process gets
> to 100% cpu usage, starts to leak, in a few hours consumes all memory and
> then destroys the whole cluster.
> This does not happen if I leave crm_mon with Ctrl-C.
>
> I can reproduce this 100% with crm_mon 1.1.10, with the mainstream ubuntu
> trusty packages.
> This might be related to how sudo executes crm_mon, and what it signals to
> crm_mon when it gets terminated.
>
> Now I know what I need to pay attention to in order to avoid this problem,
> but you might want to check whether this issue is still present.
>
> Thanks,
> Attila
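The "-1" suggestion above could be sketched as a small polling wrapper. This is only an illustration and was not posted in the thread: it assumes crm_mon is on the PATH and that -A, -f and -1 (print the status once and exit) behave as documented for the 1.1.x series. The function names, the interval and the injectable runner/out parameters are made up for the example. Because every crm_mon call exits by itself, no long-running crm_mon should be left behind when the terminal goes away.

```python
import subprocess
import sys
import time


def crm_mon_command(one_shot=True):
    """Build the crm_mon argument list from the thread, optionally with -1."""
    cmd = ["crm_mon", "-A", "-f"]
    if one_shot:
        cmd.append("-1")  # print the cluster status once and exit
    return cmd


def poll(interval=5.0, max_iterations=None, runner=subprocess.run, out=sys.stdout):
    """Run one-shot crm_mon repeatedly and return the number of refreshes shown.

    `runner` and `out` are hypothetical hooks so the loop can be exercised
    without a cluster; by default the real crm_mon is executed.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        result = runner(crm_mon_command(), stdout=subprocess.PIPE,
                        universal_newlines=True, timeout=30)
        try:
            out.write(result.stdout)
            out.flush()
        except OSError:
            # Output can no longer be written (e.g. the SSH client was closed).
            return done
        done += 1
        time.sleep(interval)
    return done
```

Calling poll() with no arguments would refresh every five seconds until interrupted or until writing to the terminal fails.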
Re: [ClusterLabs] Memory leak in crm_mon ?
Hi Andrew,

I managed to isolate / reproduce the issue. You might want to take a look, as it might be present in 1.1.12 as well.

I monitor my cluster from putty, mainly this way:
- I have a putty (Windows client) session that connects via SSH to the box and authenticates using a public key as a non-root user.
- It immediately sends a "sudo crm_mon -Af" command, so with a single click I have a nice view of what the cluster is doing.

Whenever I close this putty window (terminate the app), the crm_mon process goes to 100% CPU usage, starts to leak, consumes all memory within a few hours and then destroys the whole cluster.
This does not happen if I leave crm_mon with Ctrl-C.

I can reproduce this 100% of the time with crm_mon 1.1.10, with the mainstream Ubuntu trusty packages.
This might be related to how sudo executes crm_mon, and what it signals to crm_mon when it gets terminated.

Now I know what I need to pay attention to in order to avoid this problem, but you might want to check whether this issue is still present.

Thanks,
Attila

-----Original Message-----
From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
Sent: Friday, August 14, 2015 12:40 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Memory leak in crm_mon ?

Thanks Andrew,

I tried to upgrade to 1.1.12 using the packages available at https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a single node, to see how it works out, but I ended up with errors like

Could not establish cib_rw connection: Connection refused (111)

I have disabled the firewall, no changes. The node appears to be running but does not see any of the other nodes. On the other nodes I see this node as an UNCLEAN one. (I assume corosync is fine, but pacemaker is not.)

I use udpu for the transport.

Am I doing something wrong? I tried to look for some howtos on upgrading, but the only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade

Could you please direct me to some howto/guide on how to perform the upgrade?

Or am I facing some compatibility issue, so I should extract the whole cib, upgrade all nodes and reconfigure the cluster from scratch? (The cluster is meant to go live in 2 days... :) )

Thanks a lot in advance
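Until the cause is fixed, the symptom described above (a crm_mon process stuck at 100% CPU) could be watched for with something like the following. It is a rough sketch, not part of the thread: it assumes a procps-style ps that accepts "-eo pid,pcpu,comm --no-headers", and the function names and the 90% threshold are arbitrary choices for the example. Note that ps reports CPU usage averaged over the lifetime of the process, so a freshly stuck process may take a while to cross the threshold.

```python
import subprocess


def runaway_crm_mon(ps_output, cpu_threshold=90.0, name="crm_mon"):
    """Return (pid, cpu) pairs for processes called `name` at or above the threshold.

    `ps_output` is expected to be the text printed by
    `ps -eo pid,pcpu,comm --no-headers`.
    """
    hits = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, cpu, comm = fields
        if comm.strip() == name and float(cpu) >= cpu_threshold:
            hits.append((int(pid), float(cpu)))
    return hits


def current_runaways():
    """Ask ps for the process list and filter it (assumes procps ps)."""
    out = subprocess.run(["ps", "-eo", "pid,pcpu,comm", "--no-headers"],
                         stdout=subprocess.PIPE, universal_newlines=True,
                         check=True).stdout
    return runaway_crm_mon(out)
```

What to do with the reported PIDs (alert, or kill as described earlier in the thread) is left to the operator.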
Re: [ClusterLabs] Memory leak in crm_mon ?
-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Tuesday, August 11, 2015 2:49 AM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Memory leak in crm_mon ?

> On 10 Aug 2015, at 5:33 pm, Attila Megyeri wrote:
>
> Is this perhaps a known issue?

Possibly, that version is over 2 years old.

> Any hints?

Getting something a little more recent would be the best place to start

Thanks Andrew,

I tried to upgrade to 1.1.12 using the packages available at https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a single node, to see how it works out, but I ended up with errors like

Could not establish cib_rw connection: Connection refused (111)

I have disabled the firewall, no changes. The node appears to be running but does not see any of the other nodes. On the other nodes I see this node as an UNCLEAN one. (I assume corosync is fine, but pacemaker is not.)

I use udpu for the transport.

Am I doing something wrong? I tried to look for some howtos on upgrading, but the only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade

Could you please direct me to some howto/guide on how to perform the upgrade?

Or am I facing some compatibility issue, so I should extract the whole cib, upgrade all nodes and reconfigure the cluster from scratch? (The cluster is meant to go live in 2 days... :) )

Thanks a lot in advance
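On the "extract the whole cib" idea: whatever upgrade path is chosen, keeping a copy of the current configuration first seems prudent. The following is a minimal sketch, not an upgrade procedure and not something from the thread; it assumes cibadmin is available on a node with a working cluster connection and that "cibadmin --query" prints the CIB as XML. The function names and the injectable runner parameter are hypothetical.

```python
import subprocess


def cib_query_command():
    """Command that dumps the whole CIB as XML to stdout."""
    return ["cibadmin", "--query"]


def save_cib(path, runner=subprocess.run):
    """Write the current CIB to `path` and return the number of characters saved.

    `runner` is a hypothetical hook so the function can be tried without a
    cluster; by default the real cibadmin is executed.
    """
    result = runner(cib_query_command(), stdout=subprocess.PIPE,
                    universal_newlines=True, check=True)
    with open(path, "w") as handle:
        handle.write(result.stdout)
    return len(result.stdout)
```

For example, save_cib("cib-backup.xml") on a healthy node before touching any packages.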
Re: [ClusterLabs] Memory leak in crm_mon ?
> On 10 Aug 2015, at 5:33 pm, Attila Megyeri wrote:
>
> Hi!
>
> We are building a new cluster on top of pacemaker/corosync and several times
> during the past days we noticed that „crm_mon -Af” used up all the
> memory+swap and caused high CPU usage. Killing the process solves the issue.
>
> We are using the binary package versions available in the latest ubuntu
> trusty, namely:
>
> crmsh                1.2.5+hg1034-1ubuntu4
> pacemaker            1.1.10+git20130802-1ubuntu2.3
> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
> corosync             2.3.3-1ubuntu1
>
> Kernel is 3.13.0-46-generic
>
> Looking back at some „atop” data, the CPU went to 100% many times during the
> last couple of days, at various times, more often exactly around midnight
> (strange).
>
> 08.05 14:00
> 08.06 21:41
> 08.07 00:00
> 08.07 00:00
> 08.08 00:00
> 08.09 06:27
>
> Checked the corosync log and syslog, but did not find any correlation between
> the entries in the logs around the specific times.
> For most of the time, the node running the crm_mon was the DC as well - not
> running any resources (e.g. a pairless node for quorum).
>
> We have another running system, where everything works perfectly, although it
> is almost the same:
>
> crmsh                1.2.5+hg1034-1ubuntu4
> pacemaker            1.1.10+git20130802-1ubuntu2.1
> pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
> corosync             2.3.3-1ubuntu1
>
> Kernel is 3.13.0-8-generic
>
> Is this perhaps a known issue?

Possibly, that version is over 2 years old.

> Any hints?

Getting something a little more recent would be the best place to start

> Thanks!
[ClusterLabs] Memory leak in crm_mon ?
Hi!

We are building a new cluster on top of pacemaker/corosync, and several times during the past days we noticed that "crm_mon -Af" used up all the memory+swap and caused high CPU usage. Killing the process solves the issue.

We are using the binary package versions available in the latest Ubuntu trusty, namely:

crmsh                1.2.5+hg1034-1ubuntu4
pacemaker            1.1.10+git20130802-1ubuntu2.3
pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
corosync             2.3.3-1ubuntu1

Kernel is 3.13.0-46-generic

Looking back at some "atop" data, the CPU went to 100% many times during the last couple of days, at various times, more often exactly around midnight (strange).

08.05 14:00
08.06 21:41
08.07 00:00
08.07 00:00
08.08 00:00
08.09 06:27

Checked the corosync log and syslog, but did not find any correlation between the entries in the logs around the specific times.
For most of the time, the node running crm_mon was the DC as well - not running any resources (e.g. a pairless node for quorum).

We have another running system, where everything works perfectly, although it is almost the same:

crmsh                1.2.5+hg1034-1ubuntu4
pacemaker            1.1.10+git20130802-1ubuntu2.1
pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
corosync             2.3.3-1ubuntu1

Kernel is 3.13.0-8-generic

Is this perhaps a known issue?
Any hints?

Thanks!

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org