Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.

2015-10-30 Thread Dejan Muhamedagic
Hi Hideo-san,

On Fri, Oct 30, 2015 at 11:41:26AM +0900, renayama19661...@ybb.ne.jp wrote:
> Hi Dejan,
> Hi All,
> 
> How about the patch which I contributed by a former email?
> I would like an opinion.

It somehow slipped.

I suppose that you tested the patch well and nobody objected so
far, so let's apply it.

Many thanks! And sorry about the delay.

Cheers,

Dejan



> 
> Best Regards,
> Hideo Yamauchi.
> 
> - Original Message -
> > From: "renayama19661...@ybb.ne.jp"
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > Cc:
> > Date: 2015/10/14, Wed 09:38
> > Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> > 
> > Hi Dejan,
> > Hi All,
> > 
> > We reconsidered a patch.
> > 
> > 
> > 
> > In Pacemaker 1.1, node names in STONITH are always lowercase.
> > When a user uses a capital letter in a host name, STONITH via libvirt fails.
> > 
> > This patch lets STONITH via libvirt succeed with the following settings:
> > 
> >  * host name (upper case), hostlist (upper case), domain_id on libvirt (upper case)
> >  * host name (upper case), hostlist (lower case), domain_id on libvirt (lower case)
> >  * host name (lower case), hostlist (upper case), domain_id on libvirt (upper case)
> >  * host name (lower case), hostlist (lower case), domain_id on libvirt (lower case)
> > 
> > However, with the following settings, STONITH via libvirt still causes an
> > error. In these cases the user must make the case of the host name managed
> > by libvirt match the host name given in hostlist:
> > 
> >  * host name (upper case), hostlist (lower case), domain_id on libvirt (upper case)
> >  * host name (upper case), hostlist (upper case), domain_id on libvirt (lower case)
> >  * host name (lower case), hostlist (lower case), domain_id on libvirt (upper case)
> >  * host name (lower case), hostlist (upper case), domain_id on libvirt (lower case)
> > 
> > This patch is effective for letting STONITH via libvirt succeed when the
> > host name is set in capital letters.
> > 
> > Best Regards,
> > Hideo Yamauchi.
> > 
> > 
> > 
> > 
> > - Original Message -
> >> From: "renayama19661...@ybb.ne.jp"
> >> To: Cluster Labs - All topics related to open-source clustering welcomed
> >> Cc:
> >> Date: 2015/9/15, Tue 03:28
> >> Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >>
> >> Hi Dejan,
> >>
> >>> I suppose that you'll send another one? I can vaguely recall
> >>> a problem with non-lower case node names, but not the specifics.
> >>> Is that supposed to be handled within a stonith agent?
> >>
> >> Yes.
> >> We are making a different patch now.
> >> With that patch, I handle non-lowercase node names within the
> >> stonith agent.
> >> # But the patch cannot cover all patterns.
> >>
> >> Please wait a little longer.
> >> I will send a patch again.
> >> Please tell me your opinion of the new patch.
> >>
> >> Best Regards,
> >> Hideo Yamauchi.
> >>
> >> - Original Message -
> >>> From: Dejan Muhamedagic
> >>> To: ClusterLabs-ML
> >>> Cc:
> >>> Date: 2015/9/14, Mon 22:20
> >>> Subject: Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >>>
> >>> Hi Hideo-san,
> >>>
> >>> On Tue, Sep 08, 2015 at 05:28:05PM +0900, renayama19661...@ybb.ne.jp wrote:
> >>>> Hi All,
> >>>>
> >>>> We intend to change some patches.
> >>>> We withdraw this patch.
> >>>
> >>> I suppose that you'll send another one? I can vaguely recall
> >>> a problem with non-lower case node names, but not the specifics.
> >>> Is that supposed to be handled within a stonith agent?
> >>>
> >>> Cheers,
> >>>
> >>> Dejan
> >>>
> >>>> Best Regards,
> >>>> Hideo Yamauchi.
> >>>>
> >>>> - Original Message -
> >>>>> From: "renayama19661...@ybb.ne.jp"
> >>>>> To: ClusterLabs-ML
> >>>>> Cc:
> >>>>> Date: 2015/9/7, Mon 09:06
> >>>>> Subject: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.
> >>>>>
> >>>>> Hi All,
> >>>>>
> >>>>> When a cluster carries out stonith, Pacemaker handles the host name
> >>>>> as lowercase.
> >>>>> When a user sets the host name of the OS and the host name in the
> >>>>> hostlist of external/libvirt in capital letters, stonith is not
> >>>>> carried out.
> >>>>>
> >>>>> The
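In effect, the patch normalizes the host-name comparison to lowercase before matching. A minimal sketch of that idea in Python (a hypothetical illustration of the technique, not the actual external/libvirt shell patch; note it only covers the hostlist-vs-node-name side, which is why the domain_id mismatch cases above still fail):

```python
def hosts_match(hostlist_name: str, node_name: str) -> bool:
    """Compare a hostlist entry against the node name Pacemaker passes
    to the stonith agent, ignoring case (Pacemaker lowercases node names)."""
    return hostlist_name.lower() == node_name.lower()

# Mixed-case configurations now match:
print(hosts_match("Node-A", "node-a"))  # True
```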
Re: [ClusterLabs] Pacemaker process 10-15% CPU

2015-10-30 Thread Ken Gaillot
On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
> Hello,
>   We are using Pacemaker to manage the services that run on a node, as part
> of a service management framework, and to manage the nodes running the
> services as a cluster.  One service runs as 1+1 and the other services run
> as N+1.
> 
>   During our testing, we see that the pacemaker processes are taking about
> 10-15% of the CPU.  We would like to know if this is normal and whether the
> CPU utilization could be minimised.

It's definitely not normal to stay that high for very long. If you can
attach your configuration and a sample of your logs, we can look for
anything that stands out.

> Sample output of the most CPU-intensive processes on an Active Manager:
> 
> USER    PID   %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
> 189     15766 30.4  0.0  94616 12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
> 189     15770 28.9  0.0 118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
> root    15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
> root    15767 15.5  0.0  95380  5764 ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd
> 
> USER    PID   %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
> 189     15766 30.5  0.0  94616 12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
> 189     15770 29.0  0.0 122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
> root    15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
> root    15767 15.5  0.0  95380  5764 ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd
> 
> 
> We also observed that the processes are not distributed equally across all
> the available cores, and we saw Red Hat acknowledge that RHEL doesn't
> distribute processes across the available cores efficiently.  We are trying
> to use IRQbalance to spread the processes equally across the available cores.

Pacemaker is single-threaded, so each process runs on only one core.
It's up to the OS to distribute them, and any modern Linux (including
RHEL) will do a good job of that.

IRQBalance is useful for balancing IRQ requests across cores, but it
doesn't do anything about processes (and doesn't need to).

> Please let us know if there is any way we could minimise the CPU utilisation.
> We don't require the stonith feature, but to our knowledge there is no way to
> stop that daemon from running.  If that is also possible, please let us know.
> 
> Thanks,
> Karthik.

The logs will help figure out what's going wrong.

A lot of people would disagree that you don't require stonith :) Stonith
is necessary to recover from many possible failure scenarios, and
without it, you may wind up with data corruption or other problems.

Setting stonith-enabled=false will keep pacemaker from using stonith,
but stonithd will still run. It shouldn't take up significant resources.
The load you're seeing is an indication of a problem somewhere.

___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] crm_mon memory leak

2015-10-30 Thread Ken Gaillot
On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
> Dear Pacemaker support,
> We are using Pacemaker 1.1.10-14 to implement a service management framework,
> with high availability on the road-map.  This Pacemaker version was available
> through Red Hat for our environments.
> 
>   We are running into an issue where Pacemaker causes a node to crash.  The
> last feature we integrated was SNMP notification.  While listing the
> processes, we found crm_mon processes occupying 58GB of the available 64GB
> when the node crashed.  When we removed that feature, Pacemaker was stable
> again.
> 
> Section 7.1 of the Pacemaker documentation explains that the SNMP
> notification agent triggers a crm_mon process at regular intervals.  On
> checking clusterlabs.org for the list of known issues, we found this crm_mon
> memory leak issue.  Although not directly related, we think that there is
> some problem with the crm_mon process.
> 
> http://clusterlabs.org/pipermail/users/2015-August/001084.html
> 
> Can you please let us know if there are issues with SNMP notification in
> Pacemaker, or if there is anything that we could be doing wrong?  Also, any
> workarounds for this issue, if available, would be very helpful for us.
> Please help.
> 
> Thanks,
> Karthik.

Are you using ClusterMon with Pacemaker's built-in SNMP capability, or
an external script that generates the SNMP trap?

If you're using the built-in capability, that has to be explicitly
enabled when Pacemaker is compiled. Many distributions (including RHEL)
do not enable it. Run "crm_mon --help"; if it shows a "-S" option, you
have it enabled, otherwise not.

If you're using an external script to generate the SNMP trap, please
post it (with any sensitive info taken out of course).

The ClusterMon resource will generate a crm_mon at regular intervals,
but it should exit quickly. It sounds like it's not exiting at all,
which is why you see this problem.
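As a point of comparison, a well-behaved external notification handler does its work and returns immediately. A hypothetical sketch, written as a shell function for illustration (the CRM_notify_* variables are the ones ClusterMon exports for each event; the log line format is made up):

```shell
# Hypothetical ClusterMon external notification handler: record the
# event cheaply and return at once, so the processes spawned per
# monitor interval never accumulate.
notify_event() {
    echo "clustermon: node=${CRM_notify_node:-unknown} task=${CRM_notify_task:-unknown} rc=${CRM_notify_rc:-unknown}"
    return 0    # no polling, no waiting, nothing left running
}
```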

If you have a RHEL subscription, you can open a support ticket with Red
Hat. Note that stonith must be enabled before Red Hat (and many other
vendors) will support a cluster. Also, you should be able to "yum
update" to a much newer version of Pacemaker to get bugfixes, if you're
using RHEL 6 or 7.

FYI, the latest upstream Pacemaker has a new feature that will be in
1.1.14, allowing it to call an external notification script without
needing a ClusterMon resource.



Re: [ClusterLabs] Resources not starting some times after node reboot

2015-10-30 Thread Ken Gaillot
On 10/29/2015 12:42 PM, Pritam Kharat wrote:
> Hi All,
> 
> I have a single node with 5 resources running on it. When I rebooted the
> node, I sometimes saw resources in the stopped state even though the node
> came online.
> 
> Looking into the logs, one difference between the success and failure cases
> is that when "Election Trigger (I_DC_TIMEOUT) just popped (2ms)" occurred,
> the LRM did not start the resources; instead it jumped to the monitor
> action, and from then on it did not start the resources at all.
> 
> In the success case this election timeout did not appear, and the first
> action taken by the LRM was to start the resource and then start monitoring
> it, so all the resources started properly.
> 
> I have attached both the success and failure logs. Could someone please
> explain the reason for this issue and how to solve it?
> 
> 
> My CRM configuration is -
> 
> root@sc-node-2:~# crm configure show
> node $id="2" sc-node-2
> primitive oc-fw-agent upstart:oc-fw-agent \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive oc-lb-agent upstart:oc-lb-agent \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive oc-service-manager upstart:oc-service-manager \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive oc-vpn-agent upstart:oc-vpn-agent \
> meta allow-migrate="true" migration-threshold="5" failure-timeout="120s" \
> op monitor interval="15s" timeout="60s"
> primitive sc_vip ocf:heartbeat:IPaddr2 \
> params ip="200.10.10.188" cidr_netmask="24" nic="eth1" \
> op monitor interval="15s"
> group sc-resources sc_vip oc-service-manager oc-fw-agent oc-lb-agent
> oc-vpn-agent
> property $id="cib-bootstrap-options" \
> dc-version="1.1.10-42f2063" \
> cluster-infrastructure="corosync" \
> stonith-enabled="false" \
> cluster-recheck-interval="3min" \
> default-action-timeout="180s"

The attached logs don't go far enough to be sure what happened; all they
show at that point is that in both cases, the cluster correctly probed
all the resources to be sure they weren't already running.

The behavior shouldn't be different depending on the election trigger,
but it's hard to say for sure from this info.

With a single-node cluster, you should also set no-quorum-policy=ignore.
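That property can be set in the same crmsh style as the configuration shown above; a sketch (pcs users would use `pcs property set no-quorum-policy=ignore` instead):

```shell
# Single-node cluster: don't stop resources on loss of quorum;
# with only one vote, quorum handling just gets in the way.
crm configure property no-quorum-policy=ignore
```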



[ClusterLabs] Pacemaker process 10-15% CPU

2015-10-30 Thread Karthikeyan Ramasamy
Hello,
  We are using Pacemaker to manage the services that run on a node, as part
of a service management framework, and to manage the nodes running the
services as a cluster.  One service runs as 1+1 and the other services run
as N+1.

  During our testing, we see that the pacemaker processes are taking about
10-15% of the CPU.  We would like to know if this is normal and whether the
CPU utilization could be minimised.

Sample output of the most CPU-intensive processes on an Active Manager:

USER    PID   %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
189     15766 30.4  0.0  94616 12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
189     15770 28.9  0.0 118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
root    15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
root    15767 15.5  0.0  95380  5764 ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd

USER    PID   %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
189     15766 30.5  0.0  94616 12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
189     15770 29.0  0.0 122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
root    15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
root    15767 15.5  0.0  95380  5764 ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd


We also observed that the processes are not distributed equally across all
the available cores, and we saw Red Hat acknowledge that RHEL doesn't
distribute processes across the available cores efficiently.  We are trying
to use IRQbalance to spread the processes equally across the available cores.

Please let us know if there is any way we could minimise the CPU utilisation.
We don't require the stonith feature, but to our knowledge there is no way to
stop that daemon from running.  If that is also possible, please let us know.

Thanks,
Karthik.


[ClusterLabs] Reply: Re: crm_mon memory leak

2015-10-30 Thread Karthikeyan Ramasamy
Thanks, Mr. Gaillot.

Yes, we trigger the SNMP notification with an external script.  From your
response, I understand that the issue wouldn't occur with 1.1.14, as it
wouldn't require the crm_mon process.  Is this understanding correct?

We have been given 1.1.10 as the supported version from Red Hat.  If I raise
a ticket with Red Hat, would they be able to provide us a patch for 1.1.10?

Many thanks for your response.

Thanks,
Karthik.


 Ken Gaillot wrote: 

On 10/30/2015 05:29 AM, Karthikeyan Ramasamy wrote:
> Dear Pacemaker support,
> We are using Pacemaker 1.1.10-14 to implement a service management framework,
> with high availability on the road-map.  This Pacemaker version was available
> through Red Hat for our environments.
>
>   We are running into an issue where Pacemaker causes a node to crash.  The
> last feature we integrated was SNMP notification.  While listing the
> processes, we found crm_mon processes occupying 58GB of the available 64GB
> when the node crashed.  When we removed that feature, Pacemaker was stable
> again.
>
> Section 7.1 of the Pacemaker documentation explains that the SNMP
> notification agent triggers a crm_mon process at regular intervals.  On
> checking clusterlabs.org for the list of known issues, we found this crm_mon
> memory leak issue.  Although not directly related, we think that there is
> some problem with the crm_mon process.
>
> http://clusterlabs.org/pipermail/users/2015-August/001084.html
>
> Can you please let us know if there are issues with SNMP notification in
> Pacemaker, or if there is anything that we could be doing wrong?  Also, any
> workarounds for this issue, if available, would be very helpful for us.
> Please help.
>
> Thanks,
> Karthik.

Are you using ClusterMon with Pacemaker's built-in SNMP capability, or
an external script that generates the SNMP trap?

If you're using the built-in capability, that has to be explicitly
enabled when Pacemaker is compiled. Many distributions (including RHEL)
do not enable it. Run "crm_mon --help"; if it shows a "-S" option, you
have it enabled, otherwise not.

If you're using an external script to generate the SNMP trap, please
post it (with any sensitive info taken out of course).

The ClusterMon resource will generate a crm_mon at regular intervals,
but it should exit quickly. It sounds like it's not exiting at all,
which is why you see this problem.

If you have a RHEL subscription, you can open a support ticket with Red
Hat. Note that stonith must be enabled before Red Hat (and many other
vendors) will support a cluster. Also, you should be able to "yum
update" to a much newer version of Pacemaker to get bugfixes, if you're
using RHEL 6 or 7.

FYI, the latest upstream Pacemaker has a new feature that will be in
1.1.14, allowing it to call an external notification script without
needing a ClusterMon resource.



Re: [ClusterLabs] VIP monitoring failing with Timed Out error

2015-10-30 Thread Jan Pokorný
On 29/10/15 14:10 +0100, Jan Pokorný wrote:
> On 29/10/15 15:27 +0530, Pritam Kharat wrote:
>> When I ran ocf-tester to test IPaddr2 agent
>> 
>> ocf-tester -n sc_vip -o ip=192.168.20.188 -o cidr_netmask=24 -o nic=eth0
>> /usr/lib/ocf/resource.d/heartbeat/IPaddr2
>> 
>> I got this error: "ERROR: Setup problem: couldn't find command: ip
>> in test_command monitor".  I verified the ip command is there, but I still
>> get this error.  What might be the reason for it?  Is this okay?
>> 
>> + ip_validate
>> + check_binary ip
>> + have_binary ip
>> + '[' 1 = 1 ']'
>> + false
> 
> It may be the case that your environment is tainted with a
> variable that should only be set in a special testing mode, where it
> injects an error that a particular helper binary is missing.
> 
> Can you please try "unset OCF_TESTER_FAIL_HAVE_BINARY" to sanitize
> your environment first?  Indeed, if you are sure this variable is not
> set in the context of IPaddr2 agent invocations, the problem
> is elsewhere.

Btw., it might be worth considering whether pacemaker should restrict
the environment variables passed when invoking resources, just as
systemd does [1], so as to prevent accidental changes in their behavior,
as happened here with OCF_TESTER_FAIL_HAVE_BINARY vs. IPaddr2.

[1] 
http://www.freedesktop.org/software/systemd/man/systemd.exec.html#Environment%20variables%20in%20spawned%20processes
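The suggested cleanup, as a runnable sketch (the tainted value is a made-up example):

```shell
# Simulate the tainted environment, then sanitize it as suggested:
OCF_TESTER_FAIL_HAVE_BINARY=ip    # hypothetical leftover test value
export OCF_TESTER_FAIL_HAVE_BINARY
unset OCF_TESTER_FAIL_HAVE_BINARY
# The variable must be gone before re-running ocf-tester:
[ -z "${OCF_TESTER_FAIL_HAVE_BINARY+x}" ] && echo "environment clean"
```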

-- 
Jan (Poki)




Re: [ClusterLabs] [Patch][glue][external/libvirt] Conversion to a lower case of hostlist.

2015-10-30 Thread renayama19661014
Hi Dejan,

Thank you for a reply.

> It somehow slipped.

> 
> I suppose that you tested the patch well and nobody objected so
> far, so let's apply it.
> 
> Many thanks! And sorry about the delay.


I confirmed the merge of the patch.
 * http://hg.linux-ha.org/glue/rev/56f40ec5d37e

Many Thanks!
Hideo Yamauchi.




[ClusterLabs] crm_mon memory leak

2015-10-30 Thread Karthikeyan Ramasamy
Dear Pacemaker support,
We are using Pacemaker 1.1.10-14 to implement a service management framework,
with high availability on the road-map.  This Pacemaker version was available
through Red Hat for our environments.

  We are running into an issue where Pacemaker causes a node to crash.  The
last feature we integrated was SNMP notification.  While listing the
processes, we found crm_mon processes occupying 58GB of the available 64GB
when the node crashed.  When we removed that feature, Pacemaker was stable
again.

Section 7.1 of the Pacemaker documentation explains that the SNMP
notification agent triggers a crm_mon process at regular intervals.  On
checking clusterlabs.org for the list of known issues, we found this crm_mon
memory leak issue.  Although not directly related, we think that there is
some problem with the crm_mon process.

http://clusterlabs.org/pipermail/users/2015-August/001084.html

Can you please let us know if there are issues with SNMP notification in
Pacemaker, or if there is anything that we could be doing wrong?  Also, any
workarounds for this issue, if available, would be very helpful for us.
Please help.

Thanks,
Karthik.

