Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
Hi Hideo-san,

On Fri, Oct 30, 2015 at 11:41:26AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Dejan,
> Hi All,
>
> How about the patch which I contributed in an earlier email?
> I would like an opinion.

It somehow slipped. I suppose that you tested the patch well and nobody
objected so far, so let's apply it.

Many thanks! And sorry about the delay.

Cheers,

Dejan

> Best Regards,
> Hideo Yamauchi.
>
> ----- Original Message -----
> > From: "renayama19661...@ybb.ne.jp"
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > Date: 2015/10/14, Wed 09:38
> > Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >
> > Hi Dejan,
> > Hi All,
> >
> > We reconsidered the patch.
> >
> > In Pacemaker 1.1, node names used for STONITH are always lower case.
> > When a user uses capital letters in a host name, STONITH via libvirt fails.
> >
> > This patch lets STONITH via libvirt succeed with the following settings:
> >
> > * host name (upper case), hostlist (upper case), domain_id on libvirt (upper case)
> > * host name (upper case), hostlist (lower case), domain_id on libvirt (lower case)
> > * host name (lower case), hostlist (upper case), domain_id on libvirt (upper case)
> > * host name (lower case), hostlist (lower case), domain_id on libvirt (lower case)
> >
> > However, with the following settings, STONITH via libvirt still fails.
> > In these cases the user must make the case of the host name managed by
> > libvirt match the host name given in hostlist:
> >
> > * host name (upper case), hostlist (lower case), domain_id on libvirt (upper case)
> > * host name (upper case), hostlist (upper case), domain_id on libvirt (lower case)
> > * host name (lower case), hostlist (lower case), domain_id on libvirt (upper case)
> > * host name (lower case), hostlist (upper case), domain_id on libvirt (lower case)
> >
> > In short, this patch lets STONITH via libvirt succeed when the host name
> > is set in capital letters.
> >
> > Best Regards,
> > Hideo Yamauchi.
> >
> > ----- Original Message -----
> >> From: "renayama19661...@ybb.ne.jp"
> >> To: Cluster Labs - All topics related to open-source clustering welcomed
> >> Date: 2015/9/15, Tue 03:28
> >> Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >>
> >> Hi Dejan,
> >>
> >>> I suppose that you'll send another one? I can vaguely recall
> >>> a problem with non-lower-case node names, but not the specifics.
> >>> Is that supposed to be handled within a stonith agent?
> >>
> >> Yes.
> >> We are making a different patch now.
> >> With that patch, I handle non-lower-case node names within the
> >> stonith agent.
> >> # But the patch cannot cover all patterns.
> >>
> >> Please wait a little longer.
> >> I will send the patch again.
> >> Please tell me your opinion of the new patch.
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >>
> >> ----- Original Message -----
> >>> From: Dejan Muhamedagic
> >>> To: ClusterLabs-ML
> >>> Date: 2015/9/14, Mon 22:20
> >>> Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >>>
> >>> Hi Hideo-san,
> >>>
> >>> On Tue, Sep 08, 2015 at 05:28:05PM +0900, renayama19661...@ybb.ne.jp wrote:
> >>>> Hi All,
> >>>>
> >>>> We intend to change some patches.
> >>>> We withdraw this patch.
> >>>
> >>> I suppose that you'll send another one? I can vaguely recall
> >>> a problem with non-lower-case node names, but not the specifics.
> >>> Is that supposed to be handled within a stonith agent?
> >>>
> >>> Cheers,
> >>>
> >>> Dejan
> >>>
> >>>> Best Regards,
> >>>> Hideo Yamauchi.
> >>>>
> >>>> ----- Original Message -----
> >>>> > From: "renayama19661...@ybb.ne.jp"
> >>>> > To: ClusterLabs-ML
> >>>> > Date: 2015/9/7, Mon 09:06
> >>>> > Subject: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >>>> >
> >>>> > Hi All,
> >>>> >
> >>>> > When a cluster carries out STONITH, Pacemaker handles the host name
> >>>> > in lower case.
> >>>> > When a user sets the host name of the OS and the host name in the
> >>>> > hostlist of external/libvirt in capital letters, STONITH is not
> >>>> > carried out.
> >>>> >
> >>>> > The
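The patch itself is not reproduced in this thread, but the normalization it describes — comparing the cluster-provided node name against hostlist entries case-insensitively — can be sketched in shell. This is an illustrative sketch only, not the actual external/libvirt agent code; the variable names are hypothetical.

```shell
# Hypothetical sketch of case-insensitive hostlist matching: normalize
# both the target node name and each hostlist entry to lower case with
# tr before comparing.
to_lower() {
    echo "$1" | tr '[:upper:]' '[:lower:]'
}

target="NODE-A"              # node name as passed in by stonithd (illustrative)
hostlist="node-a node-b"     # agent's hostlist parameter (illustrative)

found=0
for h in $hostlist; do
    if [ "$(to_lower "$h")" = "$(to_lower "$target")" ]; then
        found=1
        break
    fi
done
echo "found=$found"
```

Note that, as the thread points out, lower-casing both sides only helps when the name libvirt itself knows (domain_id) also matches after normalization; the agent cannot fix a case mismatch on the libvirt side.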
Re: [ClusterLabs] Pacemaker process 10-15% CPU
On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
> Hello,
> We are using Pacemaker to manage the services that run on a node, as part
> of a service management framework, and to manage the nodes running the
> services as a cluster. One service will be running as 1+1 and the other
> services will be N+1.
>
> During our testing, we see that the Pacemaker processes are taking about
> 10-15% of the CPU. We would like to know if this is normal and whether the
> CPU utilization could be minimised.

It's definitely not normal to stay that high for very long. If you can
attach your configuration and a sample of your logs, we can look for
anything that stands out.

> Sample output of the most CPU-hungry processes on an active manager:
>
> USER   PID   %CPU %MEM VSZ    RSS   TTY STAT START TIME  COMMAND
> 189    15766 30.4 0.0  94616  12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
> 189    15770 28.9 0.0  118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
> root   15768  2.6 0.0  76196  3420  ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
> root   15767 15.5 0.0  95380  5764  ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd
>
> USER   PID   %CPU %MEM VSZ    RSS   TTY STAT START TIME  COMMAND
> 189    15766 30.5 0.0  94616  12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
> 189    15770 29.0 0.0  122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
> root   15768  2.6 0.0  76196  3420  ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
> root   15767 15.5 0.0  95380  5764  ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd
>
> We also observed that the processes are not distributed equally across all
> the available cores, and saw Red Hat acknowledging that RHEL doesn't
> distribute them to the available cores efficiently. We are trying to use
> irqbalance to spread the processes across the available cores equally.

Pacemaker is single-threaded, so each process runs on only one core. It's
up to the OS to distribute them, and any modern Linux (including RHEL)
will do a good job of that. irqbalance is useful for balancing IRQ
requests across cores, but it doesn't do anything about processes (and
doesn't need to).

> Please let us know if there is any way we could minimise the CPU
> utilisation. We don't require the stonith feature, but to our knowledge
> there is no way to stop that daemon from running. If that is also
> possible, please let us know.
>
> Thanks,
> Karthik.

The logs will help figure out what's going wrong.

A lot of people would disagree that you don't require stonith :) Stonith
is necessary to recover from many possible failure scenarios, and without
it, you may wind up with data corruption or other problems. Setting
stonith-enabled=false will keep Pacemaker from using stonith, but
stonithd will still run. It shouldn't take up significant resources. The
load you're seeing is an indication of a problem somewhere.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
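When quantifying a report like the ps samples above, it helps to sum the %CPU column across the Pacemaker daemons rather than eyeballing individual rows. A small awk program does this; below it is run against the first sample from the post so the numbers are reproducible (on a live system, you would feed it `ps aux | grep '[p]acemaker'` instead).

```shell
# Sum the %CPU column (field 3) of "ps aux"-style output for the
# Pacemaker daemons. The here-document reproduces the first sample
# from the post above.
total=$(awk '{ total += $3 } END { print total }' <<'EOF'
189   15766 30.4 0.0  94616 12300 ? Ss 18:01 48:15 /usr/libexec/pacemaker/cib
189   15770 28.9 0.0 118320 20276 ? Ss 18:01 45:53 /usr/libexec/pacemaker/pengine
root  15768  2.6 0.0  76196  3420 ? Ss 18:01  4:12 /usr/libexec/pacemaker/lrmd
root  15767 15.5 0.0  95380  5764 ? Ss 18:01 24:33 /usr/libexec/pacemaker/stonithd
EOF
)
echo "total=$total"   # 30.4 + 28.9 + 2.6 + 15.5
```

For the sample shown, the daemons together account for roughly three quarters of one core — well above the idle baseline Ken describes as normal.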
Re: [ClusterLabs] crm_mon memory leak
On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
> Dear Pacemaker support,
> We are using pacemaker 1.1.10-14 to implement a service management
> framework, with high availability on the road-map. This Pacemaker version
> was available through Red Hat for our environments.
>
> We are running into an issue where Pacemaker causes a node to crash. The
> last feature we integrated was SNMP notification. While listing the
> processes, we found crm_mon processes occupying 58 GB of the available
> 64 GB when the node crashed. When we removed that feature, Pacemaker was
> stable again.
>
> Section 7.1 of the Pacemaker documentation details that the SNMP
> notification agent triggers a crm_mon process at regular intervals. On
> checking ClusterLabs for the list of known issues, we found this crm_mon
> memory leak issue. Although not directly related, we think that there is
> some problem with the crm_mon process.
>
> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>
> Can you please let us know if there are issues with SNMP notification in
> Pacemaker, or if there is anything that we could be doing wrong? Also,
> any workarounds for this issue, if available, would be very helpful for
> us. Please help.
>
> Thanks,
> Karthik.

Are you using ClusterMon with Pacemaker's built-in SNMP capability, or an
external script that generates the SNMP trap?

If you're using the built-in capability, it has to be explicitly enabled
when Pacemaker is compiled. Many distributions (including RHEL) do not
enable it. Run "crm_mon --help"; if it shows a "-S" option, you have it
enabled, otherwise not.

If you're using an external script to generate the SNMP trap, please post
it (with any sensitive info taken out, of course).

The ClusterMon resource will spawn a crm_mon at regular intervals, but it
should exit quickly. It sounds like it's not exiting at all, which is why
you see this problem.

If you have a RHEL subscription, you can open a support ticket with Red
Hat. Note that stonith must be enabled before Red Hat (and many other
vendors) will support a cluster. Also, you should be able to "yum update"
to a much newer version of Pacemaker to get bugfixes, if you're using
RHEL 6 or 7.

FYI, the latest upstream Pacemaker has a new feature that will be in
1.1.14, allowing it to call an external notification script without
needing a ClusterMon resource.
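Since crm_mon re-runs the notification script every interval, a script that hangs will accumulate processes exactly as described in the report (crm_mon instances eventually consuming 58 GB). One defensive pattern is to bound the script's runtime with coreutils `timeout`; the script path in the comment below is a hypothetical placeholder, and `sleep` stands in for a hung trap sender in the demonstration.

```shell
# Defensive pattern for a ClusterMon external script: bound the runtime of
# the trap sender so a hang cannot pile up crm_mon children, e.g.:
#   timeout 10 /usr/local/bin/send_snmp_trap.sh \
#       || logger -t clustermon "trap sender failed or timed out"
# (send_snmp_trap.sh is a placeholder for whatever actually sends the trap.)
#
# Demonstration: "sleep 3" stands in for a hung sender; timeout kills it
# after 1 second and reports exit status 124.
timeout 1 sleep 3
rc=$?
echo "rc=$rc"
```

With this guard in place, a misbehaving sender is killed and logged instead of blocking each successive crm_mon invocation.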
Re: [ClusterLabs] Resources not starting some times after node reboot
On 10/29/2015 12:42 PM, Pritam Kharat wrote:
> Hi All,
>
> I have a single node with 5 resources running on it. When I rebooted the
> node, I sometimes saw resources in the stopped state even though the node
> came online.
>
> Looking into the logs, one difference between the success and failure
> cases is that when "Election Trigger (I_DC_TIMEOUT) just popped (2ms)"
> occurred, the LRM did not start the resources; instead it jumped straight
> to the monitor action, and from then on it never started the resources
> at all.
>
> In the success case this election timeout did not occur: the first action
> taken by the LRM was to start each resource and then monitor it, so all
> the resources started properly.
>
> I have attached both the success and failure logs. Could someone please
> explain the reason for this issue and how to solve it?
>
> My CRM configuration is:
>
> root@sc-node-2:~# crm configure show
> node $id="2" sc-node-2
> primitive oc-fw-agent upstart:oc-fw-agent \
>     meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
>     op monitor interval="15s" timeout="60s"
> primitive oc-lb-agent upstart:oc-lb-agent \
>     meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
>     op monitor interval="15s" timeout="60s"
> primitive oc-service-manager upstart:oc-service-manager \
>     meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
>     op monitor interval="15s" timeout="60s"
> primitive oc-vpn-agent upstart:oc-vpn-agent \
>     meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
>     op monitor interval="15s" timeout="60s"
> primitive sc_vip ocf:heartbeat:IPaddr2 \
>     params ip="200.10.10.188" cidr_netmask="24" nic="eth1" \
>     op monitor interval="15s"
> group sc-resources sc_vip oc-service-manager oc-fw-agent oc-lb-agent oc-vpn-agent
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.10-42f2063" \
>     cluster-infrastructure="corosync" \
>     stonith-enabled="false" \
>     cluster-recheck-interval="3min" \
>     default-action-timeout="180s"

The attached logs don't go far enough to be sure what happened; all they
show at that point is that in both cases, the cluster correctly probed all
the resources to be sure they weren't already running. The behavior
shouldn't be different depending on the election trigger, but it's hard
to say for sure from this info.

With a single-node cluster, you should also set no-quorum-policy=ignore.
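The no-quorum-policy advice above would be applied like this with the crm shell used in the posted configuration (a config fragment; adjust to your tooling, e.g. pcs on RHEL):

```
# Single-node cluster: ignore loss of quorum so resources can still start.
crm configure property no-quorum-policy=ignore

# The property then appears alongside the existing cib-bootstrap-options:
#   property $id="cib-bootstrap-options" \
#       ...
#       no-quorum-policy="ignore"
```

Without this, a lone node in a corosync cluster may consider itself inquorate and refuse to start resources, which matches the "probed but never started" symptom described.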
[ClusterLabs] Pacemaker process 10-15% CPU
Hello,

We are using Pacemaker to manage the services that run on a node, as part
of a service management framework, and to manage the nodes running the
services as a cluster. One service will be running as 1+1 and the other
services will be N+1.

During our testing, we see that the Pacemaker processes are taking about
10-15% of the CPU. We would like to know if this is normal and whether the
CPU utilization could be minimised.

Sample output of the most CPU-hungry processes on an active manager:

USER   PID   %CPU %MEM VSZ    RSS   TTY STAT START TIME  COMMAND
189    15766 30.4 0.0  94616  12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
189    15770 28.9 0.0  118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
root   15768  2.6 0.0  76196  3420  ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
root   15767 15.5 0.0  95380  5764  ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd

USER   PID   %CPU %MEM VSZ    RSS   TTY STAT START TIME  COMMAND
189    15766 30.5 0.0  94616  12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
189    15770 29.0 0.0  122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
root   15768  2.6 0.0  76196  3420  ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
root   15767 15.5 0.0  95380  5764  ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd

We also observed that the processes are not distributed equally across all
the available cores, and saw Red Hat acknowledging that RHEL doesn't
distribute them to the available cores efficiently. We are trying to use
irqbalance to spread the processes across the available cores equally.

Please let us know if there is any way we could minimise the CPU
utilisation. We don't require the stonith feature, but to our knowledge
there is no way to stop that daemon from running. If that is also
possible, please let us know.

Thanks,
Karthik.
Re: [ClusterLabs] crm_mon memory leak
Thanks, Mr. Gaillot.

Yes, we trigger the SNMP notification with an external script. From your
response, I understand that the issue wouldn't occur with 1.1.14, as it
wouldn't require the crm_mon process. Is this understanding correct?

We have been given 1.1.10 as the supported version from Red Hat. If I
raise a ticket with Red Hat, would they be able to provide us a patch for
1.1.10?

Many thanks for your response.

Thanks,
Karthik.

Ken Gaillot wrote:
> Are you using ClusterMon with Pacemaker's built-in SNMP capability, or an
> external script that generates the SNMP trap?
>
> If you're using the built-in capability, it has to be explicitly enabled
> when Pacemaker is compiled. Many distributions (including RHEL) do not
> enable it. Run "crm_mon --help"; if it shows a "-S" option, you have it
> enabled, otherwise not.
>
> If you're using an external script to generate the SNMP trap, please
> post it (with any sensitive info taken out, of course).
>
> The ClusterMon resource will spawn a crm_mon at regular intervals, but
> it should exit quickly. It sounds like it's not exiting at all, which is
> why you see this problem.
>
> If you have a RHEL subscription, you can open a support ticket with Red
> Hat. Note that stonith must be enabled before Red Hat (and many other
> vendors) will support a cluster. Also, you should be able to "yum
> update" to a much newer version of Pacemaker to get bugfixes, if you're
> using RHEL 6 or 7.
>
> FYI, the latest upstream Pacemaker has a new feature that will be in
> 1.1.14, allowing it to call an external notification script without
> needing a ClusterMon resource.
>
> [remaining quoted text is identical to Ken Gaillot's reply earlier in
> this thread]
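The "check crm_mon --help for a -S option" test from the quoted reply can be scripted. On a cluster node you would run `crm_mon --help 2>&1 | grep -q -- '-S'`; the demonstration below applies the same grep to a captured help excerpt, since the exact option text can vary between builds (the line shown is illustrative).

```shell
# Scripted form of the check described above: does the crm_mon help text
# advertise a "-S" (built-in SNMP) option? On a real node:
#   crm_mon --help 2>&1 | grep -q -- '-S'
# Demonstration against an illustrative captured help line:
help_text='-S, --snmp-traps=value  Send SNMP traps to this station'
if printf '%s\n' "$help_text" | grep -q -- '-S'; then
    echo "built-in SNMP support: yes"
else
    echo "built-in SNMP support: no"
fi
```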
Re: [ClusterLabs] VIP monitoring failing with Timed Out error
On 29/10/15 14:10 +0100, Jan Pokorný wrote:
> On 29/10/15 15:27 +0530, Pritam Kharat wrote:
>> When I ran ocf-tester to test the IPaddr2 agent:
>>
>>   ocf-tester -n sc_vip -o ip=192.168.20.188 -o cidr_netmask=24 -o nic=eth0 \
>>       /usr/lib/ocf/resource.d/heartbeat/IPaddr2
>>
>> I got this error: "ERROR: Setup problem: couldn't find command: ip"
>> in test_command monitor. I verified the ip command is there, but the
>> error persists. What might be the reason for this error? Is this okay?
>>
>> + ip_validate
>> + check_binary ip
>> + have_binary ip
>> + '[' 1 = 1 ']'
>> + false
>
> It may be the case that you have the environment tainted with
> a variable that should only be set in a special testing mode,
> injecting an error that a particular helper binary is missing.
>
> Can you please try "unset OCF_TESTER_FAIL_HAVE_BINARY" to sanitize
> your environment first? Indeed, if you are sure you don't have this
> variable set in the context of IPaddr2 agent invocations, the problem
> is elsewhere.

Btw. it might be worth considering whether pacemaker should restrict the
environment variables for invocation of the resources, just as systemd
does [1], so as to prevent accidental changes in their behavior like with
OCF_TESTER_FAIL_HAVE_BINARY vs. IPaddr2.

[1] http://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment%20variables%20in%20spawned%20processes

-- 
Jan (Poki)
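The failure mode Jan describes — a `have_binary`-style check that is forced to report "missing" whenever the testing variable is set — can be mimicked to see why `unset OCF_TESTER_FAIL_HAVE_BINARY` fixes the ocf-tester run. The function below mirrors the idea, not the actual ocf-shellfuncs implementation.

```shell
# Hypothetical sketch of the behavior described above (not the real
# ocf-shellfuncs code): a have_binary-style helper that pretends the
# binary is missing whenever OCF_TESTER_FAIL_HAVE_BINARY is set.
have_binary() {
    if [ -n "$OCF_TESTER_FAIL_HAVE_BINARY" ]; then
        return 1   # testing mode: inject "binary missing"
    fi
    command -v "$1" >/dev/null 2>&1
}

OCF_TESTER_FAIL_HAVE_BINARY=1
have_binary sh && echo "sh: found" || echo "sh: missing"

unset OCF_TESTER_FAIL_HAVE_BINARY   # sanitize, as recommended in the thread
have_binary sh && echo "sh: found" || echo "sh: missing"
```

With the variable set, the check fails even though the binary exists — exactly the "couldn't find command: ip" symptom reported; unsetting it restores the normal lookup.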
Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
Hi Dejan,

Thank you for the reply.

> It somehow slipped.
>
> I suppose that you tested the patch well and nobody objected so
> far, so let's apply it.
>
> Many thanks! And sorry about the delay.

I confirmed the merge of the patch.

* http://hg.linux-ha.org/glue/rev/56f40ec5d37e

Many thanks!

Hideo Yamauchi.

----- Original Message -----
> From: Dejan Muhamedagic
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Date: 2015/10/30, Fri 16:58
> Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
>
> Hi Hideo-san,
>
> On Fri, Oct 30, 2015 at 11:41:26AM +0900, renayama19661...@ybb.ne.jp wrote:
>> Hi Dejan,
>> Hi All,
>>
>> How about the patch which I contributed in an earlier email?
>> I would like an opinion.
>
> It somehow slipped.
>
> I suppose that you tested the patch well and nobody objected so
> far, so let's apply it.
>
> Many thanks! And sorry about the delay.
>
> Cheers,
>
> Dejan
>
> [remaining quoted history is identical to the earlier messages in this
> thread]
[ClusterLabs] crm_mon memory leak
Dear Pacemaker support,

We are using pacemaker 1.1.10-14 to implement a service management
framework, with high availability on the road-map. This Pacemaker version
was available through Red Hat for our environments.

We are running into an issue where Pacemaker causes a node to crash. The
last feature we integrated was SNMP notification. While listing the
processes, we found crm_mon processes occupying 58 GB of the available
64 GB when the node crashed. When we removed that feature, Pacemaker was
stable again.

Section 7.1 of the Pacemaker documentation details that the SNMP
notification agent triggers a crm_mon process at regular intervals. On
checking ClusterLabs for the list of known issues, we found this crm_mon
memory leak issue. Although not directly related, we think that there is
some problem with the crm_mon process.

http://clusterlabs.org/pipermail/users/2015-August/001084.html

Can you please let us know if there are issues with SNMP notification in
Pacemaker, or if there is anything that we could be doing wrong? Also,
any workarounds for this issue, if available, would be very helpful for
us. Please help.

Thanks,
Karthik.