Hi, I've built a recent cluster stack from sources on Debian Jessie and I can't get rid of CPU spikes. Corosync stalls the entire system for seconds on every simple transition, and even stalls itself:
    drbdtest1 corosync[4734]: [MAIN ] Corosync main process was not scheduled for 2590.4512 ms (threshold is 2400.0000 ms). Consider token timeout increase.

and even DRBD complains:

    drbdtest1 kernel: drbd p1: PingAck did not arrive in time.

My previous build (corosync 1.4.6, libqb 0.17.0, pacemaker 1.1.12) works fine on these nodes with the same corosync/pacemaker setup. What should I try? It's a test environment, and the issue is 100% reproducible within seconds. Network traffic is minimal the whole time and there is no I/O load.

*Pacemaker config:*

node 167969573: drbdtest1
node 167969574: drbdtest2
primitive drbd_p1 ocf:linbit:drbd \
        params drbd_resource=p1 \
        op monitor interval=30
primitive drbd_p2 ocf:linbit:drbd \
        params drbd_resource=p2 \
        op monitor interval=30
primitive dummy_test ocf:pacemaker:Dummy \
        meta allow-migrate=true \
        params state="/var/run/activenode"
primitive fence_libvirt stonith:external/libvirt \
        params hostlist="drbdtest1,drbdtest2" hypervisor_uri="qemu+ssh://libvirt-fencing@mgx4/system" \
        op monitor interval=30
primitive fs_boot Filesystem \
        params device="/dev/null" directory="/boot" fstype="*" \
        meta is-managed=false \
        op monitor interval=20 timeout=40 on-fail=block OCF_CHECK_LEVEL=20
primitive fs_f1 Filesystem \
        params device="/dev/drbd/by-res/p1" directory="/mnt/p1" fstype=ext4 options="commit=60,barrier=0,data=writeback" \
        op monitor interval=20 timeout=40 \
        op start timeout=300 interval=0 \
        op stop timeout=180 interval=0
primitive ip_10.3.3.138 IPaddr2 \
        params ip=10.3.3.138 cidr_netmask=32 \
        op monitor interval=10s timeout=20s
primitive sysinfo ocf:pacemaker:SysInfo \
        op start timeout=20s interval=0 \
        op stop timeout=20s interval=0 \
        op monitor interval=60s
group dummy-group dummy_test
ms ms_drbd_p1 drbd_p1 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
ms ms_drbd_p2 drbd_p2 \
        meta master-max=2 master-node-max=1 clone-max=2 notify=true
clone fencing_by_libvirt fence_libvirt \
        meta globally-unique=false
clone fs_boot_clone fs_boot
clone sysinfos sysinfo \
        meta globally-unique=false
location fs1_on_high_load fs_f1 \
        rule -inf: cpu_load gte 4
colocation dummy_coloc inf: dummy-group ms_drbd_p2:Master
colocation f1a-coloc inf: fs_f1 ms_drbd_p1:Master
colocation f1b-coloc inf: fs_f1 fs_boot_clone:Started
order dummy_order inf: ms_drbd_p2:promote dummy-group:start
order orderA inf: ms_drbd_p1:promote fs_f1:start
property cib-bootstrap-options: \
        dc-version=1.1.13-6052cd1 \
        cluster-infrastructure=corosync \
        expected-quorum-votes=2 \
        no-quorum-policy=ignore \
        symmetric-cluster=true \
        placement-strategy=default \
        last-lrm-refresh=1438735742 \
        have-watchdog=false
property cib-bootstrap-options-stonith: \
        stonith-enabled=true \
        stonith-action=reboot
rsc_defaults rsc-options: \
        resource-stickiness=100

*corosync.conf:*

totem {
        version: 2
        token: 3000
        token_retransmits_before_loss_const: 10
        clear_node_high_bit: yes
        crypto_cipher: none
        crypto_hash: none

        interface {
                ringnumber: 0
                bindnetaddr: 10.3.3.37
                mcastaddr: 225.0.0.37
                mcastport: 5403
                ttl: 1
        }
}

logging {
        fileline: off
        to_stderr: no
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        to_syslog: yes
        syslog_facility: daemon
        debug: off
        timestamp: on

        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        provider: corosync_votequorum
        expected_votes: 2
}
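
I understand the warning threshold (2400 ms) is 80% of my configured token timeout (3000 ms), so I could follow the log's advice and raise the token timeout in the totem section, e.g. (the value below is just an example, not something I've settled on):

    totem {
            version: 2
            token: 10000    # warning threshold would then become 8000 ms
            ...
    }

But since the 1.4.6 build runs fine on the same nodes with a 3000 ms token, I suspect that would only hide the symptom rather than fix whatever is keeping the corosync 2.x process from being scheduled for over 2 seconds.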
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org