Re: [ClusterLabs] Pacemaker process 10-15% CPU

2015-11-02 Thread Ken Gaillot
On 11/01/2015 03:43 AM, Karthikeyan Ramasamy wrote:
> Thanks, Ken.
> 
> I understand about stonith.  We are introducing Pacemaker into an existing 
> product, not a new one.  Currently, the client side is responsible for 
> load balancing.  
> 
> High availability for our product is the next step.  For now, we are 
> introducing Pacemaker to manage the services and to provide a single point 
> of control for them.  Once the customers get used to this, we will 
> introduce high availability.
> 
> About the logs, can you please let me know the symptoms that I need to look 
> for?

I'd look for anything "unusual", but that's hard to describe and nearly
impossible if you're not familiar with what's "usual". I'd look for
something repeating over and over in a short time (1 or 2 seconds).
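
If it helps, here is a rough Python sketch of that kind of check. It is
not part of Pacemaker. It assumes a syslog-style "Mon DD HH:MM:SS" prefix
on each line (adjust TS_RE if your log format differs), and the name
find_bursts and both thresholds are made up for illustration.

```python
import re
from datetime import datetime

# Assumption: each log line starts with a syslog-style timestamp,
# e.g. "Oct 30 18:01:02 ". Change this pattern for other log formats.
TS_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(.*)$")


def find_bursts(lines, window_seconds=2, min_repeats=10):
    """Map each noisy message to its largest count within any time window."""
    seen = {}  # normalized message -> list of timestamps
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        stamp = " ".join(m.group(1).split())
        try:
            # Syslog timestamps carry no year; a fixed leap year keeps
            # "Feb 29" parseable.
            ts = datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        # Replace digit runs so PIDs and counters don't hide repeats.
        msg = re.sub(r"\d+", "N", m.group(2))
        seen.setdefault(msg, []).append(ts)

    bursts = {}
    for msg, stamps in seen.items():
        stamps.sort()
        start = 0
        best = 0
        # Sliding window over the sorted timestamps of this message.
        for end, ts in enumerate(stamps):
            while (ts - stamps[start]).total_seconds() > window_seconds:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_repeats:
            bursts[msg] = best
    return bursts
```

Feed it the lines of your log file; anything it returns is a message that
showed up at least min_repeats times within window_seconds.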

Can you give a general idea of the cluster environment? How many
resources, what cluster options are set, whether configuration changes
are being made frequently, whether failures are common, whether the
network is reliable with low latency, etc.

You might try attaching to one of the busy processes with strace and see
if it's stuck in some sort of loop.
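
For example, you could capture a few seconds of output with
"strace -p <PID> -o trace.txt" and then tally the system calls. The Python
below is only a sketch: the function name count_syscalls is invented, and
the pattern assumes strace's default line format (optionally prefixed with
a PID, as in -f output).

```python
import re
from collections import Counter

# Matches the syscall name at the start of a strace line, allowing an
# optional "[pid 1234] " or "1234 " prefix. Signal and exit lines
# ("--- SIG... ---", "+++ exited +++") do not match and are skipped.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+)?(?:\d+\s+)?(\w+)\(")


def count_syscalls(trace_lines):
    """Count how often each system call appears in strace output lines."""
    counts = Counter()
    for line in trace_lines:
        m = SYSCALL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Something like count_syscalls(open("trace.txt")).most_common(5) lists the
most frequent calls. One call dominating a short capture is a hint that
the process is spinning on it.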

> Thanks,
> Karthik.
> -Original Message-
> From: Ken Gaillot [mailto:kgail...@redhat.com] 
> Sent: 31 October 2015 03:33
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Pacemaker process 10-15% CPU
> 
> On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
>> Hello,
>>   We are using Pacemaker to manage the services that run on a node, as part 
>> of a service management framework, and to manage the nodes running the 
>> services as a cluster.  One service will run as 1+1 and the other services 
>> will be N+1.
>>
>>   During our testing, we see that the Pacemaker processes are taking about 
>> 10-15% of the CPU.  We would like to know whether this is normal and whether 
>> the CPU utilization can be reduced.
> 
> It's definitely not normal to stay that high for very long. If you can attach 
> your configuration and a sample of your logs, we can look for anything that 
> stands out.
> 
>> Sample output for the processes using the most CPU on an Active Manager:
>>
>> USER   PID %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
>> 189  15766 30.4  0.0  94616 12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
>> 189  15770 28.9  0.0 118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
>> root 15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
>> root 15767 15.5  0.0  95380  5764 ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd
>>
>> USER   PID %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
>> 189  15766 30.5  0.0  94616 12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
>> 189  15770 29.0  0.0 122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
>> root 15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
>> root 15767 15.5  0.0  95380  5764 ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd
>>
>>
>> We also observed that the processes are not distributed equally across the 
>> available cores, and saw that Red Hat acknowledges that RHEL doesn't 
>> distribute processes across the available cores efficiently.  We are trying 
>> to use IRQbalance to spread the processes equally across the available cores.
> 
> Each Pacemaker daemon is single-threaded, so each process runs on only
> one core at a time. It's up to the OS to distribute them, and any modern
> Linux (including RHEL) will do a good job of that.
> 
> IRQBalance is useful for balancing IRQs across cores, but it doesn't do 
> anything about processes (and doesn't need to).
> 
>> Please let us know if there is any way we could minimise the CPU 
>> utilisation.  We don't require the stonith feature, but to our knowledge 
>> there is no way to stop that daemon from running.  If that is possible, 
>> please let us know.
>>
>> Thanks,
>> Karthik.
> 
> The logs will help figure out what's going wrong.
> 
> A lot of people would disagree that you don't require stonith :) Stonith is 
> necessary to recover from many possible failure scenarios, and without it, 
> you may wind up with data corruption or other problems.
> 
> Setting stonith-enabled=false will keep pacemaker from using stonith, but 
> stonithd will still run. It shouldn't take up significant resources.
> The load you're seeing is an indication of a problem somewhere.


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Pacemaker process 10-15% CPU

2015-10-30 Thread Ken Gaillot
On 10/30/2015 05:14 AM, Karthikeyan Ramasamy wrote:
> Hello,
>   We are using Pacemaker to manage the services that run on a node, as part 
> of a service management framework, and to manage the nodes running the 
> services as a cluster.  One service will run as 1+1 and the other services 
> will be N+1.
> 
>   During our testing, we see that the Pacemaker processes are taking about 
> 10-15% of the CPU.  We would like to know whether this is normal and whether 
> the CPU utilization can be reduced.

It's definitely not normal to stay that high for very long. If you can
attach your configuration and a sample of your logs, we can look for
anything that stands out.

> Sample output for the processes using the most CPU on an Active Manager:
> 
> USER   PID %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
> 189  15766 30.4  0.0  94616 12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
> 189  15770 28.9  0.0 118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
> root 15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
> root 15767 15.5  0.0  95380  5764 ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd
> 
> USER   PID %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
> 189  15766 30.5  0.0  94616 12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
> 189  15770 29.0  0.0 122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
> root 15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
> root 15767 15.5  0.0  95380  5764 ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd
> 
> 
> We also observed that the processes are not distributed equally across the 
> available cores, and saw that Red Hat acknowledges that RHEL doesn't 
> distribute processes across the available cores efficiently.  We are trying 
> to use IRQbalance to spread the processes equally across the available cores.

Each Pacemaker daemon is single-threaded, so each process runs on only
one core at a time. It's up to the OS to distribute them, and any modern
Linux (including RHEL) will do a good job of that.

IRQBalance is useful for balancing IRQs across cores, but it doesn't do
anything about processes (and doesn't need to).
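
If you want to confirm where the daemons actually run, Linux records the
CPU each process last ran on as field 39 ("processor") of
/proc/<pid>/stat, as described in proc(5). A minimal Python sketch follows;
the helper names are made up for illustration, and it is Linux-only.

```python
def last_cpu_from_stat(stat_text):
    """Return the CPU number a process last ran on, from /proc/<pid>/stat text."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so split after the last ')'.
    _, _, rest = stat_text.rpartition(")")
    fields = rest.split()
    # fields[0] is field 3 (state), so field 39 (processor) is fields[36].
    return int(fields[36])


def last_cpu(pid):
    """Read /proc/<pid>/stat (Linux only) and return the last-used CPU."""
    with open(f"/proc/{pid}/stat") as f:
        return last_cpu_from_stat(f.read())
```

Calling last_cpu() a few times for each of the four PIDs in your ps output
should show the scheduler moving them around as needed, with no help from
IRQBalance.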

> Please let us know if there is any way we could minimise the CPU 
> utilisation.  We don't require the stonith feature, but to our knowledge 
> there is no way to stop that daemon from running.  If that is possible, 
> please let us know.
> 
> Thanks,
> Karthik.

The logs will help figure out what's going wrong.

A lot of people would disagree that you don't require stonith :) Stonith
is necessary to recover from many possible failure scenarios, and
without it, you may wind up with data corruption or other problems.

Setting stonith-enabled=false will keep pacemaker from using stonith,
but stonithd will still run. It shouldn't take up significant resources.
The load you're seeing is an indication of a problem somewhere.



[ClusterLabs] Pacemaker process 10-15% CPU

2015-10-30 Thread Karthikeyan Ramasamy
Hello,
  We are using Pacemaker to manage the services that run on a node, as part of 
a service management framework, and to manage the nodes running the services as 
a cluster.  One service will run as 1+1 and the other services will be N+1.

  During our testing, we see that the Pacemaker processes are taking about 
10-15% of the CPU.  We would like to know whether this is normal and whether 
the CPU utilization can be reduced.

Sample output for the processes using the most CPU on an Active Manager:

USER   PID %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
189  15766 30.4  0.0  94616 12300 ?   Ss   18:01 48:15 /usr/libexec/pacemaker/cib
189  15770 28.9  0.0 118320 20276 ?   Ss   18:01 45:53 /usr/libexec/pacemaker/pengine
root 15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:12 /usr/libexec/pacemaker/lrmd
root 15767 15.5  0.0  95380  5764 ?   Ss   18:01 24:33 /usr/libexec/pacemaker/stonithd

USER   PID %CPU %MEM    VSZ   RSS TTY STAT START  TIME COMMAND
189  15766 30.5  0.0  94616 12300 ?   Ss   18:01 49:58 /usr/libexec/pacemaker/cib
189  15770 29.0  0.0 122484 20724 ?   Rs   18:01 47:29 /usr/libexec/pacemaker/pengine
root 15768  2.6  0.0  76196  3420 ?   Ss   18:01  4:21 /usr/libexec/pacemaker/lrmd
root 15767 15.5  0.0  95380  5764 ?   Ss   18:01 25:25 /usr/libexec/pacemaker/stonithd


We also observed that the processes are not distributed equally across the 
available cores, and saw that Red Hat acknowledges that RHEL doesn't distribute 
processes across the available cores efficiently.  We are trying to use 
IRQbalance to spread the processes equally across the available cores.

Please let us know if there is any way we could minimise the CPU utilisation.  
We don't require the stonith feature, but to our knowledge there is no way to 
stop that daemon from running.  If that is possible, please let us know.

Thanks,
Karthik.