Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-09 Thread Karthikeyan Ramasamy
Hi Ken,
  The script now exits properly with 'exit 0', but it still leaves hanging 
processes behind, as listed below.

root     13405     1  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     13566 13405  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     13623 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     13758 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     13784 13623  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14146 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14167 13623  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14193 13784  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14284 13758  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14381 13784  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14469 14284  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14589 13405  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14837 14381  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14860 13566  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     14977 14589  0 13:42 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     19816 14167  0 13:43 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html
root     19845 19816  0 13:43 ?        00:00:00 /usr/sbin/crm_mon -p /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h /tmp/ClusterMon_SNMP_10.64.109.36.html

From the above, it looks like each crm_mon spawns further crm_mon processes, 
and the number keeps growing.
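
A quick way to tally this from the listing is to count crm_mon children per parent PID. The following is only a sketch: it assumes input in the form produced by `ps -eo pid,ppid,args` (PID, PPID, then the command line), and the function name crm_mon_children is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read "PID PPID COMMAND..." lines on stdin and print
# "<parent-pid> <number of crm_mon children>", sorted by parent PID.
crm_mon_children() {
    awk '$3 ~ /crm_mon$/ { kids[$2]++ }
         END { for (p in kids) print p, kids[p] }' | sort -n
}

# Typical use on a live node:
#   ps -eo pid,ppid,args | crm_mon_children
```

If only one daemon were running, the output would be a single line for parent PID 1; any line whose parent is itself a crm_mon PID points at the nesting seen above.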

Can you please let us know if there is anything else we should check, or 
whether there could still be issues with the script?

Thanks,
Karthik.

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com] 
Sent: 02 November 2015 21:21
To: Karthikeyan Ramasamy; users@clusterlabs.org
Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak

On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process everytime?

It's the other way around, crm_mon spawns the script. So if the script doesn't 
exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes away, 
then it's in the script somewhere. Mostly you want to make sure the script 
completes within your monitoring interval.

> We are on RHEL 6.5.  I am not sure what's the plan for RHEL 6.7 and 7.1.  
> 
> Thanks,
> Karthik.
> 
> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: 02 November 2015 21:04
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: Reply: Re: 

Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-09 Thread Ken Gaillot
On 11/09/2015 07:11 AM, Karthikeyan Ramasamy wrote:
> Hi Ken,
>   The script now exits properly with 'exit 0'.  But it still it creates 
> hanging processes, as listed below.
> 
> root 13405 1  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13566 13405  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13623 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13758 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 13784 13623  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14146 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14167 13623  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14193 13784  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14284 13758  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14381 13784  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14469 14284  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14589 13405  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14837 14381  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14860 13566  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 14977 14589  0 13:42 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 19816 14167  0 13:43 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> root 19845 19816  0 13:43 ?00:00:00 /usr/sbin/crm_mon -p 
> /tmp/ClusterMon_SNMP_10.64.109.36.pid -d -i 15 -E 
> /opt/occ/CXP_902_0588_R13B2370/tools/PCSESA.sh -h 
> /tmp/ClusterMon_SNMP_10.64.109.36.html
> 
> From the above it looks that one crm_mon spawns another crm_mon processes and 
> keeps building.
> 
> Can you please let us know if there is anything else we have to check or 
> still there could be issues with the script?
> 
> Thanks,
> Karthik.

That's odd. The ClusterMon resource should spawn crm_mon only once, when
it starts. Does the cluster report any failures for the ClusterMon resource?

I doubt it is the issue in this case, but a ClusterMon resource should
not be cloned or duplicated, because it monitors the health of the
entire cluster rather than of a single node.
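
To see whether the crm_mon recorded in the pidfile is still the one running, something along these lines may help. It is a rough sketch, not part of the ClusterMon agent: the function name pidfile_is_live is made up, and the pidfile path in the example is simply the one from the listing.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the PID stored in the given pidfile
# belongs to a process that is currently running.
# Note: kill -0 also fails for processes owned by another user unless you
# are root, so run this as root.
pidfile_is_live() {
    [ -r "$1" ] || return 1
    # Strip all whitespace, since the PID may be padded with spaces.
    pid=$(tr -d '[:space:]' < "$1")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null
}

# Example:
#   pidfile_is_live /tmp/ClusterMon_SNMP_10.64.109.36.pid && echo "daemon alive"
#   crm_mon -1    # one-shot cluster status, to look for ClusterMon failures
```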

> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 02 November 2015 21:21
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak
> 
> On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
>> Yes, Ken.  There were multiple instances of the external script also 
>> running.  What do you think could possibly be wrong with the script that 
>> triggers the crm_mon process everytime?
> 
> It's the other way around, crm_mon spawns 

Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-04 Thread Karthikeyan Ramasamy
Hi Ken,
  The CPU issue and this SNMP notification issue were interrelated, because the 
number of retries was unlimited.  As a service was down, every retry also tried 
to send a notification, and that stalled Pacemaker.
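
As an illustration of how a notification script could protect itself against this kind of retry storm, here is a minimal throttle sketch. It is not what our script does today; STATE_DIR, MIN_GAP and should_notify are hypothetical names, and the key must be safe to use as a file name.

```shell
#!/bin/sh
# Hypothetical throttle: allow one notification per key every MIN_GAP seconds.
# State is kept as timestamp files under STATE_DIR (both names are assumptions).
STATE_DIR=${STATE_DIR:-/tmp/notify-throttle}
MIN_GAP=${MIN_GAP:-60}

should_notify() {
    key=$1
    stamp="$STATE_DIR/$key"
    now=$(date +%s)
    mkdir -p "$STATE_DIR" || return 1
    if [ -f "$stamp" ]; then
        last=$(cat "$stamp" 2>/dev/null)
        case "$last" in
            ''|*[!0-9]*) last=0 ;;   # unreadable stamp: treat as very old
        esac
        # Too soon after the previous notification for this key: suppress.
        if [ $((now - last)) -lt "$MIN_GAP" ]; then
            return 1
        fi
    fi
    echo "$now" > "$stamp"
}

# Example inside the external script (the trap command is a placeholder):
#   should_notify "${CRM_notify_rsc}_${CRM_notify_task}" && /path/to/send_trap.sh
```

Here CRM_notify_rsc and CRM_notify_task are among the CRM_notify_* environment variables that crm_mon passes to the program given with -E.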

Thanks,
Karthik.
-----Original Message-----
From: Karthikeyan Ramasamy 
Sent: 02 November 2015 21:22
To: 'kgail...@redhat.com'; users@clusterlabs.org
Subject: RE: Reply: Re: [ClusterLabs] crm_mon memory leak

Many thanks.  Will check and get back.

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: 02 November 2015 21:21
To: Karthikeyan Ramasamy; users@clusterlabs.org
Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak

On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process everytime?

It's the other way around, crm_mon spawns the script. So if the script doesn't 
exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes away, 
then it's in the script somewhere. Mostly you want to make sure the script 
completes within your monitoring interval.

> We are on RHEL 6.5.  I am not sure what's the plan for RHEL 6.7 and 7.1.  
> 
> Thanks,
> Karthik.
> 
> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: 02 November 2015 21:04
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak
> 
> On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
>> Thanks, Mr.Gaillot.
>>
>> Yes, we trigger the snmp notification with an external script.  From your 
>> response, I understand that the issue wouldn't occur with 1.1.14, as it 
>> wouldn't require the crm_mon process.  Is this understanding correct?
> 
> Correct, crm_mon is not spawned with the new method. However if the problem 
> originates in the external script, then it still might not work properly, but 
> with the new method Pacemaker will kill it after a timeout.
> 
>> We have been given 1.1.10 as the supported version from RedHat.  If I raise 
>> a ticket to RedHat, would they be able to provide us a patch for 1.1.10?
> 
> If you're using RHEL 6.7, you should be able to simply "yum update" to get 
> 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
> That would give you more bugfixes, which may or may not help your issue.
> If you're using an older version, there may not be updates any longer.
> 
> If you open a ticket, support can help you isolate where the problem is.
> 
> When you saw many crm_mon processes running, did you also see many copies of 
> the external script running?
> 
>> Many thanks for your response.
>>
>> Thanks,
>> Karthik.
>>
>>
>> Ken Gaillot wrote:
>>
>> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>>> Dear Pacemaker support,
>>> We are using pacemaker1.1.10-14 to implement a service management 
>>> framework, with high availability on the road-map.  This pacemaker 
>>> version was available through redhat for our environments
>>>
>>>   We are running into an issue where pacemaker causes a node to crash.  The 
>>> last feature we integrated was SNMP notification.  While listing out the 
>>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>>> when the node crashed.  When we removed that feature, pacemaker was stable 
>>> again.
>>>
>>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>>> for list of known issues, we found this crm_mon memory leak issue.  
>>> Although not related, we think that there is some problem with the crm_mon 
>>> process.
>>>
>>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>>
>>> Can you please let us know if there are issues with SNMP notification in 
>>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>>> workarounds for this issue if available, would be very helpful for us.  
>>> Please help.
>>>
>>> Thanks,
>>> Karthik.
>>
>> Are you using ClusterMon with Pacemaker's built-in SNMP capability, 
>> or an external script that generates the SNMP trap?
>>
>> If you're using the built-in capability, that has to be explicitly 
>> enabled when Pacemaker is compiled. Many distributions (including
>> RHEL) do not enable it. Run "crm_mon --help"; if it shows a "-S" 
>> option, you have it enabled, otherwise not.
>>
>> If you're using an external script to generate the SNMP trap, please 
>> post it (with any sensitive info taken out of course).
>>
>> The ClusterMon resource will generate a crm_mon at regular intervals, 
>> but it should exit quickly. It sounds like it's not exiting at all, 
>> which is why you see this problem.
>>
>> If you have a RHEL subscription, you can open a support ticket with 
>> Red Hat. Note that stonith must be enabled before Red Hat 

Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Karthikeyan Ramasamy
Yes, Ken.  There were multiple instances of the external script also running.  
What do you think could be wrong with the script that triggers the 
crm_mon process every time?

We are on RHEL 6.5.  I am not sure what's the plan for RHEL 6.7 and 7.1.  

Thanks,
Karthik.

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com] 
Sent: 02 November 2015 21:04
To: Karthikeyan Ramasamy; users@clusterlabs.org
Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak

On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
> Thanks, Mr.Gaillot.
> 
> Yes, we trigger the snmp notification with an external script.  From your 
> response, I understand that the issue wouldn't occur with 1.1.14, as it 
> wouldn't require the crm_mon process.  Is this understanding correct?

Correct, crm_mon is not spawned with the new method. However if the problem 
originates in the external script, then it still might not work properly, but 
with the new method Pacemaker will kill it after a timeout.

> We have been given 1.1.10 as the supported version from RedHat.  If I raise a 
> ticket to RedHat, would they be able to provide us a patch for 1.1.10?

If you're using RHEL 6.7, you should be able to simply "yum update" to get 
1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
That would give you more bugfixes, which may or may not help your issue.
If you're using an older version, there may not be updates any longer.

If you open a ticket, support can help you isolate where the problem is.

When you saw many crm_mon processes running, did you also see many copies of 
the external script running?

> Many thanks for your response.
> 
> Thanks,
> Karthik.
> 
> 
> Ken Gaillot wrote:
> 
> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>> Dear Pacemaker support,
>> We are using pacemaker1.1.10-14 to implement a service management 
>> framework, with high availability on the road-map.  This pacemaker 
>> version was available through redhat for our environments
>>
>>   We are running into an issue where pacemaker causes a node to crash.  The 
>> last feature we integrated was SNMP notification.  While listing out the 
>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>> when the node crashed.  When we removed that feature, pacemaker was stable 
>> again.
>>
>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>> for list of known issues, we found this crm_mon memory leak issue.  Although 
>> not related, we think that there is some problem with the crm_mon process.
>>
>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>
>> Can you please let us know if there are issues with SNMP notification in 
>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>> workarounds for this issue if available, would be very helpful for us.  
>> Please help.
>>
>> Thanks,
>> Karthik.
> 
> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or 
> an external script that generates the SNMP trap?
> 
> If you're using the built-in capability, that has to be explicitly 
> enabled when Pacemaker is compiled. Many distributions (including 
> RHEL) do not enable it. Run "crm_mon --help"; if it shows a "-S" 
> option, you have it enabled, otherwise not.
> 
> If you're using an external script to generate the SNMP trap, please 
> post it (with any sensitive info taken out of course).
> 
> The ClusterMon resource will generate a crm_mon at regular intervals, 
> but it should exit quickly. It sounds like it's not exiting at all, 
> which is why you see this problem.
> 
> If you have a RHEL subscription, you can open a support ticket with 
> Red Hat. Note that stonith must be enabled before Red Hat (and many 
> other vendors) will support a cluster. Also, you should be able to 
> "yum update" to a much newer version of Pacemaker to get bugfixes, 
> if you're using RHEL 6 or 7.
> 
> FYI, the latest upstream Pacemaker has a new feature that will be in 
> 1.1.14, allowing it to call an external notification script without 
> needing a ClusterMon resource.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Karthikeyan Ramasamy
Many thanks.  Will check and get back.

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com] 
Sent: 02 November 2015 21:21
To: Karthikeyan Ramasamy; users@clusterlabs.org
Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak

On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process everytime?

It's the other way around, crm_mon spawns the script. So if the script doesn't 
exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes away, 
then it's in the script somewhere. Mostly you want to make sure the script 
completes within your monitoring interval.

> We are on RHEL 6.5.  I am not sure what's the plan for RHEL 6.7 and 7.1.  
> 
> Thanks,
> Karthik.
> 
> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: 02 November 2015 21:04
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak
> 
> On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
>> Thanks, Mr.Gaillot.
>>
>> Yes, we trigger the snmp notification with an external script.  From your 
>> response, I understand that the issue wouldn't occur with 1.1.14, as it 
>> wouldn't require the crm_mon process.  Is this understanding correct?
> 
> Correct, crm_mon is not spawned with the new method. However if the problem 
> originates in the external script, then it still might not work properly, but 
> with the new method Pacemaker will kill it after a timeout.
> 
>> We have been given 1.1.10 as the supported version from RedHat.  If I raise 
>> a ticket to RedHat, would they be able to provide us a patch for 1.1.10?
> 
> If you're using RHEL 6.7, you should be able to simply "yum update" to get 
> 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
> That would give you more bugfixes, which may or may not help your issue.
> If you're using an older version, there may not be updates any longer.
> 
> If you open a ticket, support can help you isolate where the problem is.
> 
> When you saw many crm_mon processes running, did you also see many copies of 
> the external script running?
> 
>> Many thanks for your response.
>>
>> Thanks,
>> Karthik.
>>
>>
>> Ken Gaillot wrote:
>>
>> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>>> Dear Pacemaker support,
>>> We are using pacemaker1.1.10-14 to implement a service management 
>>> framework, with high availability on the road-map.  This pacemaker 
>>> version was available through redhat for our environments
>>>
>>>   We are running into an issue where pacemaker causes a node to crash.  The 
>>> last feature we integrated was SNMP notification.  While listing out the 
>>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>>> when the node crashed.  When we removed that feature, pacemaker was stable 
>>> again.
>>>
>>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>>> for list of known issues, we found this crm_mon memory leak issue.  
>>> Although not related, we think that there is some problem with the crm_mon 
>>> process.
>>>
>>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>>
>>> Can you please let us know if there are issues with SNMP notification in 
>>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>>> workarounds for this issue if available, would be very helpful for us.  
>>> Please help.
>>>
>>> Thanks,
>>> Karthik.
>>
>> Are you using ClusterMon with Pacemaker's built-in SNMP capability, 
>> or an external script that generates the SNMP trap?
>>
>> If you're using the built-in capability, that has to be explicitly 
>> enabled when Pacemaker is compiled. Many distributions (including
>> RHEL) do not enable it. Run "crm_mon --help"; if it shows a "-S" 
>> option, you have it enabled, otherwise not.
>>
>> If you're using an external script to generate the SNMP trap, please 
>> post it (with any sensitive info taken out of course).
>>
>> The ClusterMon resource will generate a crm_mon at regular intervals, 
>> but it should exit quickly. It sounds like it's not exiting at all, 
>> which is why you see this problem.
>>
>> If you have a RHEL subscription, you can open a support ticket with 
>> Red Hat. Note that stonith must be enabled before Red Hat (and many 
>> other vendors) will support a cluster. Also, you should be able to 
>> "yum update" to a much newer version of Pacemaker to get bugfixes, 
>> if you're using RHEL 6 or 7.
>>
>> FYI, the latest upstream Pacemaker has a new feature that will be in 
>> 1.1.14, allowing it to call an external notification script without 
>> needing a ClusterMon resource.
> 


Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Ken Gaillot
On 11/02/2015 09:39 AM, Karthikeyan Ramasamy wrote:
> Yes, Ken.  There were multiple instances of the external script also running. 
>  What do you think could possibly be wrong with the script that triggers the 
> crm_mon process everytime?

It's the other way around, crm_mon spawns the script. So if the script
doesn't exit properly, neither will crm_mon.

Easy test: put an "exit" at the top of your script. If the problem goes
away, then it's in the script somewhere. Mostly you want to make sure
the script completes within your monitoring interval.
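
One way to enforce that upper bound from inside the script is to wrap the slow step in timeout(1). This is only a sketch: run_bounded is a made-up name, the command in the example is a placeholder, and it assumes GNU coreutils' timeout is installed.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command, but give up after LIMIT seconds.
# Relies on timeout(1) from GNU coreutils, which exits with status 124
# when the time limit is hit.
run_bounded() {
    limit=$1
    shift
    timeout "$limit" "$@"
}

# In the external script, the slow step would be wrapped, for example:
#   run_bounded 10 /path/to/send_trap.sh   # placeholder command
#   exit 0
```

Pick a limit shorter than your monitoring interval, so a stuck notification cannot outlive the next run.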

> We are on RHEL 6.5.  I am not sure what's the plan for RHEL 6.7 and 7.1.  
> 
> Thanks,
> Karthik.
> 
> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 02 November 2015 21:04
> To: Karthikeyan Ramasamy; users@clusterlabs.org
> Subject: Re: Reply: Re: [ClusterLabs] crm_mon memory leak
> 
> On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
>> Thanks, Mr.Gaillot.
>>
>> Yes, we trigger the snmp notification with an external script.  From your 
>> response, I understand that the issue wouldn't occur with 1.1.14, as it 
>> wouldn't require the crm_mon process.  Is this understanding correct?
> 
> Correct, crm_mon is not spawned with the new method. However if the problem 
> originates in the external script, then it still might not work properly, but 
> with the new method Pacemaker will kill it after a timeout.
> 
>> We have been given 1.1.10 as the supported version from RedHat.  If I raise 
>> a ticket to RedHat, would they be able to provide us a patch for 1.1.10?
> 
> If you're using RHEL 6.7, you should be able to simply "yum update" to get 
> 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
> That would give you more bugfixes, which may or may not help your issue.
> If you're using an older version, there may not be updates any longer.
> 
> If you open a ticket, support can help you isolate where the problem is.
> 
> When you saw many crm_mon processes running, did you also see many copies of 
> the external script running?
> 
>> Many thanks for your response.
>>
>> Thanks,
>> Karthik.
>>
>>
>> Ken Gaillot wrote:
>>
>> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>>> Dear Pacemaker support,
>>> We are using pacemaker1.1.10-14 to implement a service management 
>>> framework, with high availability on the road-map.  This pacemaker 
>>> version was available through redhat for our environments
>>>
>>>   We are running into an issue where pacemaker causes a node to crash.  The 
>>> last feature we integrated was SNMP notification.  While listing out the 
>>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>>> when the node crashed.  When we removed that feature, pacemaker was stable 
>>> again.
>>>
>>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>>> for list of known issues, we found this crm_mon memory leak issue.  
>>> Although not related, we think that there is some problem with the crm_mon 
>>> process.
>>>
>>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>>
>>> Can you please let us know if there are issues with SNMP notification in 
>>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>>> workarounds for this issue if available, would be very helpful for us.  
>>> Please help.
>>>
>>> Thanks,
>>> Karthik.
>>
>> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or 
>> an external script that generates the SNMP trap?
>>
>> If you're using the built-in capability, that has to be explicitly 
>> enabled when Pacemaker is compiled. Many distributions (including 
>> RHEL) do not enable it. Run "crm_mon --help"; if it shows a "-S" 
>> option, you have it enabled, otherwise not.
>>
>> If you're using an external script to generate the SNMP trap, please 
>> post it (with any sensitive info taken out of course).
>>
>> The ClusterMon resource will generate a crm_mon at regular intervals, 
>> but it should exit quickly. It sounds like it's not exiting at all, 
>> which is why you see this problem.
>>
>> If you have a RHEL subscription, you can open a support ticket with 
>> Red Hat. Note that stonith must be enabled before Red Hat (and many 
>> other vendors) will support a cluster. Also, you should be able to 
>> "yum update" to a much newer version of Pacemaker to get bugfixes, 
>> if you're using RHEL 6 or 7.
>>
>> FYI, the latest upstream Pacemaker has a new feature that will be in 
>> 1.1.14, allowing it to call an external notification script without 
>> needing a ClusterMon resource.
> 




Re: [ClusterLabs] Reply: Re: crm_mon memory leak

2015-11-02 Thread Ken Gaillot
On 10/31/2015 12:38 AM, Karthikeyan Ramasamy wrote:
> Thanks, Mr.Gaillot.
> 
> Yes, we trigger the snmp notification with an external script.  From your 
> response, I understand that the issue wouldn't occur with 1.1.14, as it 
> wouldn't require the crm_mon process.  Is this understanding correct?

Correct, crm_mon is not spawned with the new method. However if the
problem originates in the external script, then it still might not work
properly, but with the new method Pacemaker will kill it after a timeout.

> We have been given 1.1.10 as the supported version from RedHat.  If I raise a 
> ticket to RedHat, would they be able to provide us a patch for 1.1.10?

If you're using RHEL 6.7, you should be able to simply "yum update" to
get 1.1.12. If you're using RHEL 7.1, you should be able to get 1.1.13.
That would give you more bugfixes, which may or may not help your issue.
If you're using an older version, there may not be updates any longer.
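
To check where an installed package stands relative to those versions, a small comparison helper can be used. This is a sketch under assumptions: version_at_least is a made-up name, and it relies on the -V (version sort) option of GNU sort.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Uses GNU sort's -V (version sort) option: if $2 sorts first (or the two
# are equal), then $1 is at least $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example on an RPM-based system:
#   installed=$(rpm -q --queryformat '%{VERSION}' pacemaker)
#   version_at_least "$installed" 1.1.12 && echo "1.1.12 or newer"
```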

If you open a ticket, support can help you isolate where the problem is.

When you saw many crm_mon processes running, did you also see many
copies of the external script running?

> Many thanks for your response.
> 
> Thanks,
> Karthik.
> 
> 
> Ken Gaillot wrote:
> 
> On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
>> Dear Pacemaker support,
>> We are using pacemaker1.1.10-14 to implement a service management framework, 
>> with high availability on the road-map.  This pacemaker version was 
>> available through redhat for our environments
>>
>>   We are running into an issue where pacemaker causes a node to crash.  The 
>> last feature we integrated was SNMP notification.  While listing out the 
>> processes we found that crm_mon processes occupying 58GB of available 64GB, 
>> when the node crashed.  When we removed that feature, pacemaker was stable 
>> again.
>>
>> Section 7.1 of the pacemaker document details that SNMP notification agent 
>> triggers a crm_mon process at regular intervals.  On checking clusterlabs 
>> for list of known issues, we found this crm_mon memory leak issue.  Although 
>> not related, we think that there is some problem with the crm_mon process.
>>
>> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>>
>> Can you please let us know if there are issues with SNMP notification in 
>> Pacemaker or if there is anything that we could be wrong.  Also, any 
>> workarounds for this issue if available, would be very helpful for us.  
>> Please help.
>>
>> Thanks,
>> Karthik.
> 
> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or
> an external script that generates the SNMP trap?
> 
> If you're using the built-in capability, that has to be explicitly
> enabled when Pacemaker is compiled. Many distributions (including RHEL)
> do not enable it. Run "crm_mon --help"; if it shows a "-S" option, you
> have it enabled, otherwise not.
> 
> If you're using an external script to generate the SNMP trap, please
> post it (with any sensitive info taken out, of course).
> 
> The ClusterMon resource will generate a crm_mon at regular intervals,
> but it should exit quickly. It sounds like it's not exiting at all,
> which is why you see this problem.
> 
> If you have a RHEL subscription, you can open a support ticket with Red
> Hat. Note that stonith must be enabled before Red Hat (and many other
> vendors) will support a cluster. Also, you should be able to "yum
> update" to a much newer version of Pacemaker to get bugfixes, if you're
> using RHEL 6 or 7.
> 
> FYI, the latest upstream Pacemaker has a new feature that will be in
> 1.1.14, allowing it to call an external notification script without
> needing a ClusterMon resource.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Reply: Re: crm_mon memory leak

2015-10-30 Thread Karthikeyan Ramasamy
Thanks, Mr. Gaillot.

Yes, we trigger the SNMP notification with an external script.  From your 
response, I understand that the issue wouldn't occur with 1.1.14, as it 
wouldn't require the crm_mon process.  Is this understanding correct?

We have been given 1.1.10 as the supported version from Red Hat.  If I raise a 
ticket with Red Hat, would they be able to provide us a patch for 1.1.10?

Many thanks for your response.

Thanks,
Karthik.


 Ken Gaillot wrote:

On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
> Dear Pacemaker support,
> We are using Pacemaker 1.1.10-14 to implement a service management framework, 
> with high availability on the road-map.  This Pacemaker version was available 
> through Red Hat for our environments.
>
>   We are running into an issue where Pacemaker causes a node to crash.  The 
> last feature we integrated was SNMP notification.  While listing the 
> processes, we found crm_mon processes occupying 58 GB of the available 64 GB 
> when the node crashed.  When we removed that feature, Pacemaker was stable 
> again.
>
> Section 7.1 of the Pacemaker documentation explains that the SNMP 
> notification agent triggers a crm_mon process at regular intervals.  On 
> checking ClusterLabs for a list of known issues, we found this crm_mon memory 
> leak issue.  Although it may not be related, we suspect there is some problem 
> with the crm_mon process.
>
> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>
> Can you please let us know if there are known issues with SNMP notification 
> in Pacemaker, or if there is anything we could be doing wrong?  Also, any 
> workarounds for this issue, if available, would be very helpful for us.  
> Please help.
>
> Thanks,
> Karthik.

Are you using ClusterMon with Pacemaker's built-in SNMP capability, or
an external script that generates the SNMP trap?

If you're using the built-in capability, that has to be explicitly
enabled when Pacemaker is compiled. Many distributions (including RHEL)
do not enable it. Run "crm_mon --help"; if it shows a "-S" option, you
have it enabled, otherwise not.
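
That check can also be scripted. Below is a small Python sketch that relies
only on what is stated above: a build with SNMP support lists a "-S" option
in its help text. The function names are hypothetical, and the exact layout
of the help output is not assumed, only that the option appears as its own
token.

```python
import re
import subprocess


def has_short_option(help_text, letter):
    """Return True if `-<letter>` appears as a standalone option token."""
    # Reject matches glued to other characters, such as "--Some-option".
    pattern = r"(?<![\w-])-" + re.escape(letter) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def crm_mon_help():
    """Capture whatever `crm_mon --help` prints (stdout and stderr)."""
    result = subprocess.run(
        ["crm_mon", "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

On a cluster node, has_short_option(crm_mon_help(), "S") returning False
would suggest the built-in SNMP capability was not compiled in.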

If you're using an external script to generate the SNMP trap, please
post it (with any sensitive info taken out, of course).

The ClusterMon resource will generate a crm_mon at regular intervals,
but it should exit quickly. It sounds like it's not exiting at all,
which is why you see this problem.
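
For comparison, here is a hypothetical external agent that cannot run
indefinitely. It is a sketch in Python, not the poster's script: it formats
the CRM_notify_* environment variables that crm_mon passes to external
agents (verify the exact set against the documentation for your version)
and runs the trap sender under a timeout. The sender command and the
10-second limit are assumptions chosen for illustration.

```python
import os
import subprocess

# Suffixes of the CRM_notify_* variables this sketch looks for (assumed set).
FIELDS = ("node", "rsc", "task", "desc", "rc", "target_rc", "status")


def build_message(env):
    """Format the CRM_notify_* variables found in `env` as key=value pairs."""
    parts = []
    for field in FIELDS:
        value = env.get("CRM_notify_" + field)
        if value is not None:
            parts.append("{}={}".format(field, value))
    return " ".join(parts)


def run_with_timeout(cmd, seconds=10):
    """Run `cmd`; if it exceeds `seconds`, it is killed and 124 is returned."""
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return 124


def main(sender_cmd):
    """`sender_cmd` is a hypothetical argv list for the real trap sender."""
    message = build_message(os.environ)
    return run_with_timeout(list(sender_cmd) + [message])
```

The agent's entry point would call main() with the site's own sender
command and pass the result to sys.exit(), so the agent always terminates
with a definite status.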

If you have a RHEL subscription, you can open a support ticket with Red
Hat. Note that stonith must be enabled before Red Hat (and many other
vendors) will support a cluster. Also, you should be able to "yum
update" to a much newer version of Pacemaker to get bugfixes, if you're
using RHEL 6 or 7.

FYI, the latest upstream Pacemaker has a new feature that will be in
1.1.14, allowing it to call an external notification script without
needing a ClusterMon resource.
