Re: [ClusterLabs] Memory leak in crm_mon ?

2015-08-16 Thread Andrew Beekhof

> On 16 Aug 2015, at 9:41 pm, Attila Megyeri  wrote:
> 
> Hi Andrew,
> 
> I managed to isolate / reproduce the issue. You might want to take a look, as 
> it might be present in 1.1.12 as well.
> 
> I monitor my cluster from putty, mainly this way:
> - I have a putty (Windows client) session, that connects via SSH to the box, 
> authenticates using public key as a non-root user.
> - It immediately sends a "sudo crm_mon -Af" command, so with a single click I 
> have a nice view of what the cluster is doing.

Perhaps add -1 to the option list.
The root cause seems to be that closing the putty window doesn’t actually kill 
the process running inside it.
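
A minimal sketch of the one-shot idea, in Python: rather than leaving an interactive crm_mon attached to the terminal, a small wrapper re-runs "crm_mon -1 -A -f" at intervals and stops as soon as it receives SIGHUP, SIGINT or SIGTERM, or as soon as crm_mon fails. The wrapper, its function names and the 5-second interval are illustrative assumptions and not part of pacemaker; whether closing PuTTY actually delivers SIGHUP through sudo has not been verified in this thread.

```python
import signal
import subprocess
import threading

# Set when the terminal goes away (SIGHUP) or the user asks to stop.
stop = threading.Event()


def request_stop(signum, frame):
    """Signal handler: ask the watch loop to finish."""
    stop.set()


def install_handlers():
    """Stop cleanly on hangup, Ctrl-C and terminate requests."""
    for sig in (signal.SIGHUP, signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, request_stop)


def show_status_once():
    """Run crm_mon in one-shot mode; return False if it fails or is missing."""
    try:
        result = subprocess.run(["crm_mon", "-1", "-A", "-f"])
    except OSError:
        return False
    return result.returncode == 0


def watch(render=show_status_once, interval=5.0, max_rounds=None):
    """Call render() repeatedly until stopped; return the number of successful rounds."""
    rounds = 0
    while not stop.is_set():
        if not render():
            break
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            break
        # Sleeps for `interval` seconds but wakes immediately when stop is set.
        stop.wait(interval)
    return rounds
```

To use it, call install_handlers() and then watch(). Because each crm_mon invocation exits by itself, no long-lived monitor process should be left behind if the session disappears.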

> 
> Whenever I close this putty window (terminate the app), the crm_mon process 
> goes to 100% CPU usage, starts to leak memory, consumes all memory within a 
> few hours, and then brings down the whole cluster.
> This does not happen if I exit crm_mon with Ctrl-C.
> 
> I can reproduce this 100% of the time with crm_mon 1.1.10, using the 
> mainstream Ubuntu trusty packages.
> This might be related to how sudo executes crm_mon, and what it signals to 
> crm_mon when it gets terminated.
> 
> Now I know what I need to pay attention to in order to avoid this problem, 
> but you might want to check whether this issue is still present.
> 
> 
> Thanks,
> Attila 

Re: [ClusterLabs] Memory leak in crm_mon ?

2015-08-16 Thread Attila Megyeri
Hi Andrew,

I managed to isolate / reproduce the issue. You might want to take a look, as 
it might be present in 1.1.12 as well.

I monitor my cluster from putty, mainly this way:
- I have a putty (Windows client) session, that connects via SSH to the box, 
authenticates using public key as a non-root user.
- It immediately sends a "sudo crm_mon -Af" command, so with a single click I 
have a nice view of what the cluster is doing.

Whenever I close this putty window (terminate the app), the crm_mon process 
goes to 100% CPU usage, starts to leak memory, consumes all memory within a 
few hours, and then brings down the whole cluster.
This does not happen if I exit crm_mon with Ctrl-C.

I can reproduce this 100% of the time with crm_mon 1.1.10, using the 
mainstream Ubuntu trusty packages.
This might be related to how sudo executes crm_mon, and what it signals to 
crm_mon when it gets terminated.

Now I know what I need to pay attention to in order to avoid this problem, but 
you might want to check whether this issue is still present.
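
Until the root cause is confirmed, leftover monitors can be spotted from a process listing. The following Python sketch parses "ps" output and flags crm_mon processes that either have no controlling terminal or sit above a CPU threshold. The helper names and the 50% threshold are assumptions made for illustration, and it is not confirmed that an orphaned crm_mon always shows "?" in the TTY column, which is why both criteria are checked.

```python
import subprocess


def list_processes():
    """Return raw `ps` output: pid, tty, %cpu, rss (KiB), command name; no header."""
    cmd = ["ps", "-e", "-o", "pid=", "-o", "tty=", "-o", "pcpu=",
           "-o", "rss=", "-o", "comm="]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_ps(output):
    """Turn the five-column `ps` output into a list of dicts."""
    procs = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        pid, tty, pcpu, rss, comm = fields
        procs.append({
            "pid": int(pid),
            "tty": tty,
            "cpu": float(pcpu),
            "rss_kb": int(rss),
            "comm": comm.strip(),
        })
    return procs


def suspicious_monitors(procs, name="crm_mon", cpu_threshold=50.0):
    """Select processes called `name` that lack a tty or exceed the CPU threshold."""
    return [
        p for p in procs
        if p["comm"] == name and (p["tty"] == "?" or p["cpu"] >= cpu_threshold)
    ]
```

A periodic job could call suspicious_monitors(parse_ps(list_processes())) and report the PIDs it returns; killing them automatically is a separate decision.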


Thanks,
Attila 







Re: [ClusterLabs] Memory leak in crm_mon ?

2015-08-13 Thread Attila Megyeri


-Original Message-
From: Andrew Beekhof [mailto:and...@beekhof.net] 
Sent: Tuesday, August 11, 2015 2:49 AM
To: Cluster Labs - All topics related to open-source clustering welcomed 

Subject: Re: [ClusterLabs] Memory leak in crm_mon ?


> On 10 Aug 2015, at 5:33 pm, Attila Megyeri  wrote:
> 
> Is this perhaps a known issue?

Possibly, that version is over 2 years old.

> Any hints?

Getting something a little more recent would be the best place to start

Thanks Andrew,

I tried to upgrade to 1.1.12 using the packages available at 
https://launchpad.net/~syseleven-platform . In the first attempt I upgraded a 
single node to see how it works out, but I ended up with errors like

Could not establish cib_rw connection: Connection refused (111)

I have disabled the firewall, with no change. The node appears to be running but 
does not see any of the other nodes. On the other nodes I see this node as 
UNCLEAN. (I assume corosync is fine, but pacemaker is not.)
I use udpu for the transport.

Am I doing something wrong? I tried to look for some howtos on upgrading, but the 
only thing I found was the rather outdated http://clusterlabs.org/wiki/Upgrade

Could you please direct me to some howto/guide on how to perform the upgrade?

Or am I facing some compatibility issue, so that I should extract the whole CIB, 
upgrade all nodes and reconfigure the cluster from scratch? (The cluster is 
meant to go live in 2 days... :) )

Thanks a lot in advance
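
Whatever upgrade path is chosen, saving the current configuration first is cheap. The Python sketch below covers only the "extract the whole CIB" step by capturing the output of "cibadmin --query"; it is not an upgrade procedure. The function names, the injectable runner and the backup file name are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def export_cib(runner=subprocess.run):
    """Return the live CIB as XML text by running `cibadmin --query`."""
    result = runner(
        ["cibadmin", "--query"],
        capture_output=True,
        text=True,
        check=True,  # raise if cibadmin cannot reach the CIB
    )
    return result.stdout


def save_cib(xml_text, path="cib-backup.xml"):
    """Write the exported XML to `path` (hypothetical file name); return the path."""
    target = Path(path)
    target.write_text(xml_text)
    return target
```

Running save_cib(export_cib()) on a node whose cluster stack is still healthy, before touching any packages, leaves a copy of the configuration to fall back on.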






___
Users mailing list: Users@clusterlabs.org 
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Memory leak in crm_mon ?

2015-08-10 Thread Andrew Beekhof

> On 10 Aug 2015, at 5:33 pm, Attila Megyeri  wrote:
> 
> Is this perhaps a known issue?

Possibly, that version is over 2 years old.

> Any hints?

Getting something a little more recent would be the best place to start



[ClusterLabs] Memory leak in crm_mon ?

2015-08-10 Thread Attila Megyeri
Hi!

We are building a new cluster on top of pacemaker/corosync and several times 
during the past days we noticed that "crm_mon -Af" used up all the memory+swap 
and caused high CPU usage. Killing the process solves the issue.

We are using the binary package versions available in the latest ubuntu trusty, 
namely:

crmsh                1.2.5+hg1034-1ubuntu4
pacemaker            1.1.10+git20130802-1ubuntu2.3
pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.3
corosync             2.3.3-1ubuntu1

Kernel is 3.13.0-46-generic

Looking back at some "atop" data, the CPU went to 100% many times during the last 
couple of days, at various times, most often at exactly midnight (strange).

08.05 14:00
08.06 21:41
08.07 00:00
08.07 00:00
08.08 00:00
08.09 06:27
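
As a quick check of the "around midnight" observation, the timestamps above can be tallied by time of day. This is only a Python sketch over the six entries listed; the function name is an assumption, and the entries are treated as plain "date time" strings.

```python
from collections import Counter

# The six CPU spikes reported from the atop data, as "MM.DD HH:MM".
SPIKES = [
    "08.05 14:00",
    "08.06 21:41",
    "08.07 00:00",
    "08.07 00:00",
    "08.08 00:00",
    "08.09 06:27",
]


def spikes_per_time(entries):
    """Count how many spikes fall on each time of day."""
    return Counter(entry.split()[1] for entry in entries)


counts = spikes_per_time(SPIKES)
for time_of_day, n in counts.most_common():
    print(time_of_day, n)
```

A tally like this makes it easier to compare the spike times against whatever runs on a fixed schedule on that node.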

Checked the corosync log and syslog, but did not find any correlation between 
the log entries and these specific times.
For most of the time, the node running crm_mon was the DC as well, and was not 
running any resources (e.g. a pairless node for quorum).


We have another running system where everything works perfectly, although it is 
almost identical:

crmsh                1.2.5+hg1034-1ubuntu4
pacemaker            1.1.10+git20130802-1ubuntu2.1
pacemaker-cli-utils  1.1.10+git20130802-1ubuntu2.1
corosync             2.3.3-1ubuntu1

Kernel is 3.13.0-8-generic
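
To make "almost the same" concrete, the two version lists can be compared mechanically. The Python sketch below uses only the versions quoted in this message; the dictionary and function names are assumptions made for illustration.

```python
# Versions on the new cluster that shows the problem.
NEW_CLUSTER = {
    "crmsh": "1.2.5+hg1034-1ubuntu4",
    "pacemaker": "1.1.10+git20130802-1ubuntu2.3",
    "pacemaker-cli-utils": "1.1.10+git20130802-1ubuntu2.3",
    "corosync": "2.3.3-1ubuntu1",
    "kernel": "3.13.0-46-generic",
}

# Versions on the other system that works without problems.
WORKING_CLUSTER = {
    "crmsh": "1.2.5+hg1034-1ubuntu4",
    "pacemaker": "1.1.10+git20130802-1ubuntu2.1",
    "pacemaker-cli-utils": "1.1.10+git20130802-1ubuntu2.1",
    "corosync": "2.3.3-1ubuntu1",
    "kernel": "3.13.0-8-generic",
}


def differing_components(a, b):
    """Map each component whose version differs to its (a, b) version pair."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


for name, (new, working) in differing_components(NEW_CLUSTER, WORKING_CLUSTER).items():
    print(f"{name}: {new} vs {working}")
```

The comparison narrows the differences to the pacemaker package revision (ubuntu2.3 vs ubuntu2.1) and the kernel, while crmsh and corosync match.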


Is this perhaps a known issue? Any hints?

Thanks!