Re: [ClusterLabs] Corosync CPU load slowly increasing if one node present

2017-04-27 Thread Jan Friesse

Stefan,


Hello everyone!

I am using Pacemaker (1.1.12), Corosync (2.3.0) and libqb (0.16.0) in 2-node 
clusters (virtualized in VMware infrastructure, OS: RHEL 6.7).
I noticed that if only one node is present, the CPU usage of Corosync (as seen 
with top) is slowly but steadily increasing (over days; in my setting about 1% 
per day). The node is basically idle, some Pacemaker managed resources are 
running but they are not contacted by any clients.
I upgraded a test stand-alone node to Corosync (2.4.2) and libqb (1.0.1) (which 
at least made the memleak go away), but the CPU usage is still increasing on 
the node.
When I add a second node to the cluster, the CPU load drops back down to a 
normal (low) CPU usage.
I haven't witnessed the increasing CPU load yet if two nodes were present in a 
cluster.

Even though running Pacemaker/Corosync as a massive-overkill Monit replacement is 
questionable, the observed CPU load is not what I expect to happen.

What could be the reason for this CPU-load increase? Is there a rationale behind 
this?
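One way to make the slow drift concrete is to log the daemon's CPU share at fixed intervals and compare readings over days. A minimal POSIX-shell sketch (the function name and the interval values are illustrative, not from the thread; it only assumes `ps` and, for the usage line, `pidof`):

```shell
# sample_cpu PID COUNT INTERVAL
# Prints "<epoch-seconds> <cpu%>" once per sample, so a slow upward
# drift shows up when the log is plotted or diffed later.
sample_cpu() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        # %cpu from ps is the lifetime average; pidstat (from sysstat)
        # could be substituted for instantaneous readings.
        cpu=$(ps -o %cpu= -p "$pid") || break
        printf '%s %s\n' "$(date +%s)" "$cpu"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Example: one reading per minute for an hour
# sample_cpu "$(pidof corosync)" 60 60 >> corosync-cpu.log
```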


This is a really interesting observation. I can only speak for corosync, and I 
must say no, there is no rationale behind it. It simply shouldn't be 
happening. I also don't see any reason why connecting other node(s) 
would make the CPU load go away.



Is this a config thing or something in the binaries?


For sure not in corosync. Also, your config file looks just fine.

Could you test with a single ring only, and with udpu, to see if the behavior stays the same?
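For reference, a single-ring udpu variant of the totem section might look like the sketch below; with udpu, corosync 2.x takes member addresses from a nodelist section instead of multicast (the node addresses and nodeids here are placeholders, not taken from the thread):

```
totem {
    version: 2
    secauth: on
    token: 1000
    transport: udpu        # unicast UDP instead of multicast
    interface {
        ringnumber: 0
        bindnetaddr: 10.20.30.0
        mcastport: 5510    # still used as the UDP port with udpu
    }
}

nodelist {
    node {
        ring0_addr: 10.20.30.1
        nodeid: 1
    }
    node {
        ring0_addr: 10.20.30.2
        nodeid: 2
    }
}
```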

Regards,
  Honza



BR, Stefan

My corosync.conf:

# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
 user: root
 group: root
}

totem {
 version: 2

 # Security configuration
 secauth: on
 threads: 0

 # Timeout for token
 token: 1000
 token_retransmits_before_loss_const: 4

 # Number of messages that may be sent by one processor on receipt of the token
 max_messages: 20

 # How long to wait for join messages in the membership protocol (ms)
 join: 50
 consensus: 1200

 # Turn off the virtual synchrony filter
 vsftype: none

 # Stagger sending the node join messages by 1..send_join ms
 send_join: 50

 # Limit generated nodeids to 31-bits (positive signed integers)
 clear_node_high_bit: yes

 # Interface configuration
 rrp_mode: passive
 interface {
 ringnumber: 0
 bindnetaddr: 10.20.30.0
 mcastaddr: 226.95.30.100
 mcastport: 5510
 }
 interface {
 ringnumber: 1
 bindnetaddr: 10.20.31.0
 mcastaddr: 226.95.31.100
 mcastport: 5510
 }
}

logging {
 fileline: off
 to_stderr: no
 to_logfile: no
 to_syslog: yes
 syslog_facility: local3
 debug: off
}

amf {
 mode: disabled
}

quorum {
 provider: corosync_votequorum
 expected_votes: 1
}
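As a side note on the quorum section: with corosync_votequorum, a 2-node cluster is usually configured with the two_node flag rather than by forcing expected_votes down to 1. A sketch of that alternative (see votequorum(5)):

```
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1    # quorum with a single node up; implies wait_for_all
}
```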

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org



