Stefan,

Hello everyone!

I am using Pacemaker (1.1.12), Corosync (2.3.0) and libqb (0.16.0) in 2-node 
clusters (virtualized on VMware infrastructure, OS: RHEL 6.7).
I noticed that if only one node is present, the CPU usage of Corosync (as seen 
with top) increases slowly but steadily (over days; in my setup about 1% per 
day). The node is basically idle: some Pacemaker-managed resources are 
running, but they are not contacted by any clients.
I upgraded a test stand-alone node to Corosync (2.4.2) and libqb (1.0.1), 
which at least made the memory leak go away, but the CPU usage is still 
increasing on that node.
When I add a second node to the cluster, the CPU load drops back to a normal 
(low) level.
I have not yet witnessed the increasing CPU load when two nodes are present 
in a cluster.
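
One way to quantify the growth rather than just eyeballing top is to sample 
corosync's cumulative CPU time from /proc, roughly like this (a minimal 
sketch, not a polished tool; pass the corosync PID as the only argument):

#!/usr/bin/env python
# Sketch: sample a process's cumulative CPU time from /proc/<pid>/stat
# and print the average %CPU per interval, so a slow upward drift shows
# up clearly over hours or days.
import os
import sys
import time

CLK_TCK = os.sysconf('SC_CLK_TCK')  # clock ticks per second, usually 100

def cpu_ticks(pid):
    # utime + stime are fields 14 and 15 of /proc/<pid>/stat, see proc(5);
    # safe to split on whitespace here since corosync's comm has no spaces
    with open('/proc/%d/stat' % pid) as f:
        fields = f.read().split()
    return int(fields[13]) + int(fields[14])

def main():
    pid = int(sys.argv[1])  # corosync PID, e.g. from `pidof corosync`
    interval = 600          # seconds between samples
    prev = cpu_ticks(pid)
    while True:
        time.sleep(interval)
        cur = cpu_ticks(pid)
        pct = 100.0 * (cur - prev) / (CLK_TCK * interval)
        print('%s  %.2f%% CPU over last %ds' % (time.ctime(), pct, interval))
        prev = cur

if __name__ == '__main__':
    main()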

Even if running Pacemaker/Corosync as a massive-overkill Monit replacement is 
questionable, the observed CPU load is not what I expect to happen.

What could be the reason for this CPU load increase? Is there a rationale 
behind this?

This is a really interesting observation. I can speak for corosync, and I must say no, there is no rationale behind it. It simply shouldn't be happening. I also don't see any reason why connecting other node(s) would make the CPU load go away.

Is this a config thing or something in the binaries?

Definitely not in corosync. Also, your config file looks fine.

Could you test with a single ring only, and with udpu, to see whether the behavior stays the same?
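
Something like this in corosync.conf, for example (just a sketch; the 
nodelist addresses below are placeholders for your real node IPs):

totem {
         version: 2
         secauth: on

         # Unicast UDP instead of multicast, single ring
         transport: udpu

         interface {
                 ringnumber: 0
                 bindnetaddr: 10.20.30.0
                 mcastport: 5510
         }
}

nodelist {
         node {
                 ring0_addr: 10.20.30.1
                 nodeid: 1
         }
         node {
                 ring0_addr: 10.20.30.2
                 nodeid: 2
         }
}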

Regards,
  Honza


BR, Stefan

My corosync.conf:

# Please read the corosync.conf.5 manual page
compatibility: whitetank

aisexec {
         user: root
         group: root
}

totem {
         version: 2

         # Security configuration
         secauth: on
         threads: 0

         # Timeout for token
         token: 1000
         token_retransmits_before_loss_const: 4

         # Number of messages that may be sent by one processor on receipt of the token
         max_messages: 20

         # How long to wait for join messages in the membership protocol (ms)
         join: 50
         consensus: 1200

         # Turn off the virtual synchrony filter
         vsftype: none

         # Stagger sending the node join messages by 1..send_join ms
         send_join: 50

         # Limit generated nodeids to 31-bits (positive signed integers)
         clear_node_high_bit: yes

         # Interface configuration
         rrp_mode: passive
         interface {
                 ringnumber: 0
                 bindnetaddr: 10.20.30.0
                 mcastaddr: 226.95.30.100
                 mcastport: 5510
         }
         interface {
                 ringnumber: 1
                 bindnetaddr: 10.20.31.0
                 mcastaddr: 226.95.31.100
                 mcastport: 5510
         }
}

logging {
         fileline: off
         to_stderr: no
         to_logfile: no
         to_syslog: yes
         syslog_facility: local3
         debug: off
}

amf {
         mode: disabled
}

quorum {
         provider: corosync_votequorum
         expected_votes: 1
}

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


