Hi,
I want to test the failover capabilities of my cluster. I ran pkill -9 corosync on the second node, and on the first node I saw that it wants to STONITH node 2, but it ends with "Too many failures to fence zarafa02 (11), giving up".

From the command line, fencing works without any problems:

fence_virsh -a host2 -l root -x -k /root/.ssh/id_rsa -o reboot -v -n zarafa02

Setup:
2x KVM guest (zarafa01 = node1 / zarafa02 = node2)
2x KVM host
RHEL 6.4
pacemaker, corosync, drbd

Hopefully somebody can help me with this issue. There is also a second issue: after running fence_virsh from the command line, the pacemaker service isn't up on the second node.

node1 /var/log/messages:

Oct 23 09:35:28 zarafa01 pengine[2866]: warning: stage6: Scheduling Node zarafa02 for STONITH
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop drbd_mysql:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop drbd_zarafa:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop apache:1#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: notice: LogActions: Stop stonith-zarafa01#011(zarafa02)
Oct 23 09:35:28 zarafa01 pengine[2866]: warning: process_pe_message: Calculated Transition 183: (null)
Oct 23 09:35:28 zarafa01 crmd[29263]: notice: te_fence_node: Executing reboot fencing operation (124) on zarafa02 (timeout=60000)
Oct 23 09:35:28 zarafa01 stonith-ng[2863]: notice: handle_request: Client crmd.29263.8f8f06d0 wants to fence (reboot) 'zarafa02' with device '(any)'
Oct 23 09:35:28 zarafa01 stonith-ng[2863]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for zarafa02: 88604a94-8e2e-4ce4-9d08-85559e339f8e (0)
Oct 23 09:35:28 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_mysql_notify_0 (call=710, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:35:28 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_zarafa_notify_0 (call=712, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:36:40 zarafa01 stonith-ng[2863]: error: remote_op_done: Operation reboot of zarafa02 by zarafa01 for crmd.29263@zarafa01.88604a94: Timer expired
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 5/124:183:0:cf74ef64-3995-414e-8ebd-ebacc89ace85: Timer expired (-62)
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 5 for zarafa02 failed (Timer expired): aborting transition.
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: tengine_stonith_notify: Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for zarafa01: Timer expired (ref=88604a94-8e2e-4ce4-9d08-85559e339f8e) by client crmd.29263
Oct 23 09:36:40 zarafa01 crmd[29263]: notice: run_graph: Transition 183 (Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown): Stopped
Oct 23 09:36:40 zarafa01 pengine[2866]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 23 09:36:40 zarafa01 pengine[2866]: warning: pe_fence_node: Node zarafa02 will be fenced because the node is no longer part of the cluster
Oct 23 09:36:40 zarafa01 pengine[2866]: warning: determine_online_status: Node zarafa02 is unclean
Oct 23 09:37:52 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 6 for zarafa02 failed (Timer expired): aborting transition.
Oct 23 09:37:52 zarafa01 crmd[29263]: notice: tengine_stonith_notify: Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for zarafa01: Timer expired (ref=b13b2562-4124-4e6c-acca-e1114f7d9b98) by client crmd.29263
Oct 23 09:37:52 zarafa01 crmd[29263]: notice: run_graph: Transition 184 (Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown): Stopped
Oct 23 09:37:52 zarafa01 pengine[2866]: notice: unpack_config: On loss of CCM Quorum: Ignore
Oct 23 09:37:52 zarafa01 pengine[2866]: warning: pe_fence_node: Node zarafa02 will be fenced because the node is no longer part of the cluster
Oct 23 09:37:52 zarafa01 pengine[2866]: warning: determine_online_status: Node zarafa02 is unclean
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: determine_online_status: Node zarafa02 is unclean
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_mysql:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action drbd_zarafa:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action apache:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action apache:1_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action stonith-zarafa01_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:39:04 zarafa01 pengine[2866]: warning: custom_action: Action stonith-zarafa01_stop_0 on zarafa02 is unrunnable (offline)
Oct 23 09:43:52 zarafa01 pengine[2866]: notice: LogActions: Stop apache:1#011(zarafa02)
Oct 23 09:43:52 zarafa01 pengine[2866]: notice: LogActions: Stop stonith-zarafa01#011(zarafa02)
Oct 23 09:43:52 zarafa01 crmd[29263]: notice: te_fence_node: Executing reboot fencing operation (124) on zarafa02 (timeout=60000)
Oct 23 09:43:52 zarafa01 pengine[2866]: warning: process_pe_message: Calculated Transition 190: (null)
Oct 23 09:43:52 zarafa01 stonith-ng[2863]: notice: handle_request: Client crmd.29263.8f8f06d0 wants to fence (reboot) 'zarafa02' with device '(any)'
Oct 23 09:43:52 zarafa01 stonith-ng[2863]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for zarafa02: de24f595-81e3-49f5-8886-07c8c1b22ec7 (0)
Oct 23 09:43:52 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_mysql_notify_0 (call=752, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:43:52 zarafa01 crmd[29263]: notice: process_lrm_event: LRM operation drbd_zarafa_notify_0 (call=754, rc=0, cib-update=0, confirmed=true) ok
Oct 23 09:44:04 zarafa01 rsyslogd-2177: imuxsock lost 92458 messages from pid 1927 due to rate-limiting
Oct 23 09:44:04 zarafa01 rsyslogd-2177: imuxsock begins to drop messages from pid 1927 due to rate-limiting
Oct 23 09:45:02 zarafa01 rsyslogd-2177: imuxsock lost 13836 messages from pid 1927 due to rate-limiting
Oct 23 09:45:03 zarafa01 rsyslogd-2177: imuxsock begins to drop messages from pid 1927 due to rate-limiting
Oct 23 09:45:04 zarafa01 stonith-ng[2863]: error: remote_op_done: Operation reboot of zarafa02 by zarafa01 for crmd.29263@zarafa01.de24f595: Timer expired
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 12/124:190:0:cf74ef64-3995-414e-8ebd-ebacc89ace85: Timer expired (-62)
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: tengine_stonith_callback: Stonith operation 12 for zarafa02 failed (Timer expired): aborting transition.
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: tengine_stonith_notify: Peer zarafa02 was not terminated (st_notify_fence) by zarafa01 for zarafa01: Timer expired (ref=de24f595-81e3-49f5-8886-07c8c1b22ec7) by client crmd.29263
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: run_graph: Transition 190 (Complete=9, Pending=0, Fired=0, Skipped=9, Incomplete=11, Source=unknown): Stopped
Oct 23 09:45:04 zarafa01 crmd[29263]: notice: too_many_st_failures: Too many failures to fence zarafa02 (11), giving up
Oct 23 09:45:08 zarafa01 rsyslogd-2177: imuxsock lost 178501 messages from pid 1927 due to rate-limiting

node zarafa01 \
        attributes standby="off"
node zarafa02 \
        attributes standby="off"
primitive apache ocf:heartbeat:apache \
        params configfile="/etc/httpd/conf/httpd.conf" \
        op monitor interval="60s" \
        op start interval="0" timeout="40s" \
        op stop interval="0" timeout="60s"
primitive drbd_mysql ocf:linbit:drbd \
        params drbd_resource="mysql" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
primitive drbd_zarafa ocf:linbit:drbd \
        params drbd_resource="zarafa" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="240" \
        op monitor interval="59s" role="Master" timeout="30s" \
        op monitor interval="60s" role="Slave" timeout="30s"
primitive mysql_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/data/mysql" fstype="ext4" options="noatime" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30s" timeout="40s"
primitive mysql_ip ocf:heartbeat:IPaddr2 \
        params ip="0.0.0.0" iflabel="MYSQL" cidr_netmask="20" nic="eth0" \
        op monitor interval="30s"
primitive mysqld lsb:mysqld \
        op monitor interval="10" timeout="30" \
        op start interval="0" timeout="500" \
        op stop interval="0" timeout="500"
primitive stonith-zarafa01 stonith:fence_virsh \
        params pcmk_host_list="zarafa01" pcmk_host_check="static-list" action="reboot" ipaddr="host01" secure="true" login="root" identity_file="/root/.ssh/id_rsa" \
        op monitor interval="300s" \
        op start interval="0" timeout="60s" \
        meta failure-timeout="180s"
primitive stonith-zarafa02 stonith:fence_virsh \
        params pcmk_host_list="zarafa02" pcmk_host_check="static-list" action="reboot" ipaddr="host02" secure="true" delay="5" login="root" identity_file="/root/.ssh/id_rsa" \
        op monitor interval="300s" \
        op start interval="0" timeout="60s" \
        meta failure-timeout="180s"
primitive zarafa-dagent lsb:zarafa-dagent \
        op monitor interval="30" timeout="30" \
        meta target-role="Started"
primitive zarafa-gateway lsb:zarafa-gateway \
        op monitor interval="30" timeout="30"
primitive zarafa-ical lsb:zarafa-ical \
        op monitor interval="30" timeout="30"
primitive zarafa-indexer lsb:zarafa-indexer \
        op monitor interval="60" timeout="60" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
primitive zarafa-licensed lsb:zarafa-licensed \
        op monitor interval="30" timeout="30"
primitive zarafa-monitor lsb:zarafa-monitor \
        op monitor interval="30" timeout="30"
primitive zarafa-server lsb:zarafa-server \
        op monitor interval="30" timeout="90" \
        meta target-role="Started"
primitive zarafa-spooler lsb:zarafa-spooler \
        op monitor interval="30" timeout="30"
primitive zarafa_fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/data/zarafa" fstype="ext4" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="30s" timeout="40s" \
        meta target-role="Started"
primitive zarafa_ip ocf:heartbeat:IPaddr2 \
        params ip="0.0.0.1" iflabel="ZARAFA" cidr_netmask="20" nic="eth0" \
        op monitor interval="30s" \
        meta target-role="Started"
group mysql mysql_fs mysql_ip mysqld \
        meta target-role="Started"
group zarafa zarafa_fs zarafa_ip zarafa-server zarafa-spooler zarafa-dagent zarafa-licensed zarafa-monitor zarafa-gateway zarafa-ical zarafa-indexer \
        meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
ms ms_drbd_zarafa drbd_zarafa \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
clone apache_clone apache
location cli-prefer-mysql mysql \
        rule $id="cli-prefer-rule-mysql" inf: #uname eq zarafa01
location drbd-fence-by-handler-mysql-ms_drbd_mysql ms_drbd_mysql \
        rule $id="drbd-fence-by-handler-mysql-rule-ms_drbd_mysql" $role="Master" -inf: #uname ne zarafa01
location drbd-fence-by-handler-zarafa-ms_drbd_zarafa ms_drbd_zarafa \
        rule $id="drbd-fence-by-handler-zarafa-rule-ms_drbd_zarafa" $role="Master" -inf: #uname ne zarafa01
location preferred_on_mysql mysql 100: zarafa01
location preferred_on_zarafa zarafa 100: zarafa01
location stonith-by-zarafa01 stonith-zarafa02 -inf: zarafa02
location stonith-by-zarafa02 stonith-zarafa01 -inf: zarafa01
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
colocation zarafa_on_drbd inf: zarafa ms_drbd_zarafa:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
order zarafa_after_drbd inf: ms_drbd_zarafa:promote zarafa:start
order zarafa_after_mysql inf: mysql:start zarafa:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.8-7.el6-394e906" \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        cluster-recheck-interval="5min" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1382443560" \
        maintenance-mode="off"
rsc_defaults $id="rsc-options" \
        resource-stickiness="200" \
        failure-timeout="10min" \
        migration-threshold="3"

crm status
Last updated: Wed Oct 23 10:51:51 2013
Last change: Wed Oct 23 10:12:17 2013 via cibadmin on zarafa01
Stack: classic openais (with plugin)
Current DC: zarafa01 - partition with quorum
Version: 1.1.8-7.el6-394e906
2 Nodes configured, 2 expected votes
21 Resources configured.

Online: [ zarafa01 zarafa02 ]

 Resource Group: mysql
     mysql_fs (ocf::heartbeat:Filesystem): Started zarafa01
     mysql_ip (ocf::heartbeat:IPaddr2): Started zarafa01
     mysqld (lsb:mysqld): Started zarafa01
 Master/Slave Set: ms_drbd_mysql [drbd_mysql]
     Masters: [ zarafa01 ]
     Stopped: [ drbd_mysql:1 ]
 Resource Group: zarafa
     zarafa_fs (ocf::heartbeat:Filesystem): Started zarafa01
     zarafa_ip (ocf::heartbeat:IPaddr2): Started zarafa01
     zarafa-server (lsb:zarafa-server): Started zarafa01
     zarafa-spooler (lsb:zarafa-spooler): Started zarafa01
     zarafa-dagent (lsb:zarafa-dagent): Started zarafa01
     zarafa-licensed (lsb:zarafa-licensed): Started zarafa01
     zarafa-monitor (lsb:zarafa-monitor): Started zarafa01
     zarafa-gateway (lsb:zarafa-gateway): Started zarafa01
     zarafa-ical (lsb:zarafa-ical): Started zarafa01
     zarafa-indexer (lsb:zarafa-indexer): Started zarafa01
 Master/Slave Set: ms_drbd_zarafa [drbd_zarafa]
     Masters: [ zarafa01 ]
     Stopped: [ drbd_zarafa:1 ]
 Clone Set: apache_clone [apache]
     Started: [ zarafa01 ]
     Stopped: [ apache:1 ]
 stonith-zarafa02 (stonith:fence_virsh): Started zarafa01

Thanks,
beo
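
P.S. Because rsyslog is dropping messages and the excerpt is long, here is a rough sketch of how the failed fencing attempts can be pulled out of /var/log/messages. The function name fence_failures is made up for this mail, and the grep pattern is only assumed from the crmd "Stonith operation N for NODE failed" lines quoted above; other Pacemaker versions may word that message differently.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pacemaker): print the timestamp of every
# failed fencing attempt that crmd logged for one node.
# Usage: fence_failures LOGFILE NODE
fence_failures() {
    logfile=$1
    node=$2
    # Assumed message format, taken from the log excerpt in this mail:
    #   "... tengine_stonith_callback: Stonith operation 5 for zarafa02 failed (Timer expired) ..."
    grep -E "tengine_stonith_callback: Stonith operation [0-9]+ for ${node} failed" "$logfile" |
        awk '{ print $1, $2, $3 }'   # month, day, time from the syslog prefix
}
```

For example, fence_failures /var/log/messages zarafa02 prints one "Oct 23 hh:mm:ss" line per failed attempt, which makes the spacing between the "Timer expired" results easy to see.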
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org