Hi, I am having an issue where everything appears to be working correctly, but when I simulate a failure, failover does not work. The migrate command works fine and I can transfer the service; the error I get when a node is put into standby or a server goes down is shown below.
Any help would be greatly appreciated.

Brian Cavanagh

PS: Disregard if this double-posted.

Working fine:

============
Last updated: Fri Jan 28 12:17:24 2011
Stack: Heartbeat
Current DC: mdb4 (050fc65c-29ad-4333-93c4-34d98405b952) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 1 expected votes
2 Resources configured.
============

Online: [ mdb4 mdb3 ]

 Master/Slave Set: ms_drbd_mysql
     Masters: [ mdb4 ]
     Slaves: [ mdb3 ]
 Resource Group: mysql
     ip1        (ocf::heartbeat:IPaddr2):     Started mdb4
     ip1arp     (ocf::heartbeat:SendArp):     Started mdb4
     ip2        (ocf::heartbeat:IPaddr2):     Started mdb4
     ip2arp     (ocf::heartbeat:SendArp):     Started mdb4
     fs_mysql   (ocf::heartbeat:Filesystem):  Started mdb4
     mysqld     (ocf::heartbeat:mysql):       Started mdb4

crm resource migrate mysql

============
Last updated: Fri Jan 28 12:18:58 2011
Stack: Heartbeat
Current DC: mdb4 (050fc65c-29ad-4333-93c4-34d98405b952) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 1 expected votes
2 Resources configured.
============

Online: [ mdb4 mdb3 ]

 Master/Slave Set: ms_drbd_mysql
     Masters: [ mdb3 ]
     Slaves: [ mdb4 ]
 Resource Group: mysql
     ip1        (ocf::heartbeat:IPaddr2):     Started mdb3
     ip1arp     (ocf::heartbeat:SendArp):     Started mdb3
     ip2        (ocf::heartbeat:IPaddr2):     Started mdb3
     ip2arp     (ocf::heartbeat:SendArp):     Started mdb3
     fs_mysql   (ocf::heartbeat:Filesystem):  Started mdb3
     mysqld     (ocf::heartbeat:mysql):       Started mdb3

crm resource unmove mysql
crm node standby mdb3

============
Last updated: Fri Jan 28 12:20:40 2011
Stack: Heartbeat
Current DC: mdb4 (050fc65c-29ad-4333-93c4-34d98405b952) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 1 expected votes
2 Resources configured.
============

Node mdb3 (5f4014cd-472e-4ab3-95e3-759152f16f52): standby
Online: [ mdb4 ]

 Master/Slave Set: ms_drbd_mysql
     drbd_mysql:0   (ocf::linbit:drbd):  Slave mdb4 (unmanaged) FAILED
     drbd_mysql:1   (ocf::linbit:drbd):  Slave mdb3 (unmanaged) FAILED

Failed actions:
    drbd_mysql:0_stop_0 (node=mdb4, call=67, rc=6, status=complete): not configured
    drbd_mysql:1_stop_0 (node=mdb3, call=65, rc=6, status=complete): not configured

The error logs don't say much:

tail -n 30 /var/log/messages

Jan 28 12:20:31 mdb3 IPaddr2[9506]: INFO: ip -f inet addr delete 192.168.162.12/17 dev eth0
Jan 28 12:20:31 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation ip1_stop_0 (call=61, rc=0, cib-update=69, confirmed=true) ok
Jan 28 12:20:32 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing key=13:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_demote_0 )
Jan 28 12:20:32 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:62: demote
Jan 28 12:20:32 mdb3 kernel: block drbd0: role( Primary -> Secondary )
Jan 28 12:20:32 mdb3 lrmd: [2778]: info: RA output: (drbd_mysql:1:demote:stdout)
Jan 28 12:20:32 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation drbd_mysql:1_demote_0 (call=62, rc=0, cib-update=70, confirmed=true) ok
Jan 28 12:20:34 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing key=69:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_notify_0 )
Jan 28 12:20:34 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:63: notify
Jan 28 12:20:34 mdb3 lrmd: [2778]: info: RA output: (drbd_mysql:1:notify:stdout)
Jan 28 12:20:34 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation drbd_mysql:1_notify_0 (call=63, rc=0, cib-update=71, confirmed=true) ok
Jan 28 12:20:36 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing key=63:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_notify_0 )
Jan 28 12:20:36 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:64: notify
Jan 28 12:20:36 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation drbd_mysql:1_notify_0 (call=64, rc=0, cib-update=72, confirmed=true) ok
Jan 28 12:20:37 mdb3 crmd: [2781]: info: do_lrm_rsc_op: Performing key=14:8:0:dc2c6518-0d45-4ecc-ac70-c7044d59c1c8 op=drbd_mysql:1_stop_0 )
Jan 28 12:20:37 mdb3 lrmd: [2778]: info: rsc:drbd_mysql:1:65: stop
Jan 28 12:20:37 mdb3 drbd[9631]: ERROR: you really should enable notify when using this RA
Jan 28 12:20:37 mdb3 crmd: [2781]: info: process_lrm_event: LRM operation drbd_mysql:1_stop_0 (call=65, rc=6, cib-update=73, confirmed=true) not configured
Jan 28 12:20:39 mdb3 attrd: [2780]: info: attrd_ha_callback: Update relayed from mdb4
Jan 28 12:20:39 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash entry for fail-count-drbd_mysql:1
Jan 28 12:20:39 mdb3 attrd: [2780]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-drbd_mysql:1 (INFINITY)
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_perform_update: Sent update 21: fail-count-drbd_mysql:1=INFINITY
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_ha_callback: Update relayed from mdb4
Jan 28 12:20:40 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash entry for last-failure-drbd_mysql:1
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-drbd_mysql:1 (1296235239)
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_perform_update: Sent update 24: last-failure-drbd_mysql:1=1296235239
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_ha_callback: flush message from mdb4
Jan 28 12:20:40 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash entry for fail-count-drbd_mysql:0
Jan 28 12:20:40 mdb3 attrd: [2780]: info: attrd_ha_callback: flush message from mdb4
Jan 28 12:20:40 mdb3 attrd: [2780]: info: find_hash_entry: Creating hash entry for last-failure-drbd_mysql:0

/* configurations */

crm configure show

node $id="050fc65c-29ad-4333-93c4-34d98405b952" mdb4 \
        attributes standby="off"
node $id="5f4014cd-472e-4ab3-95e3-759152f16f52" mdb3 \
        attributes standby="on"
primitive drbd_mysql ocf:linbit:drbd \
        params
        drbd_resource="r0" \
        op monitor interval="15s"
primitive fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" fstype="ext3" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="120"
primitive ip1 ocf:heartbeat:IPaddr2 \
        params ip="192.168.162.12" nic="eth0:0" cidr_netmask="17" \
        op monitor interval="5s"
primitive ip1arp ocf:heartbeat:SendArp \
        params ip="192.168.162.12" nic="eth0:0"
primitive ip2 ocf:heartbeat:IPaddr2 \
        params ip="97.107.136.62" nic="eth0:2" cidr_netmask="24" \
        op monitor interval="5s"
primitive ip2arp ocf:heartbeat:SendArp \
        params ip="97.107.136.62" nic="eth0:2"
primitive mysqld ocf:heartbeat:mysql \
        params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" user="mysql" group="mysql" log="/var/log/mysql_safe.log" pid="/var/lib/mysql/mysqld.pid" datadir="/var/lib/mysql" \
        op monitor interval="30s" timeout="30s" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
group mysql ip1 ip1arp ip2 ip2arp fs_mysql mysqld \
        meta target-role="Started"
ms ms_drbd_mysql drbd_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started"
location cli-standby-mysql mysql \
        rule $id="cli-standby-rule-mysql" -inf: #uname eq mdb4
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="Heartbeat" \
        expected-quorum-votes="1" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"

/etc/drbd.conf

global {
        usage-count yes;
        # minor-count dialog-refresh disable-ip-verification
}
common {
        protocol C;
        handlers {
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb
                "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }
        disk {
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
        }
        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
        }
        syncer {
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        }
}
resource r0 {
        protocol C;
        syncer {
                rate 4M;
        }
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "[snip]";
        }
        on mdb3 {
                device /dev/drbd0;
                disk /dev/xvdc;
                address 192.168.156.171:7788;
                meta-disk internal;
        }
        on mdb4 {
                device /dev/drbd0;
                disk /dev/xvdc;
                address 192.168.140.133:7788;
                meta-disk internal;
        }
}

/etc/ha.d/ha.cf on mdb3:

logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 15
warntime 5
initdead 120
udpport 694
ucast eth0 173.255.238.128
auto_failback on
node mdb3
node mdb4
use_logd no
crm respawn

/etc/ha.d/ha.cf on mdb4:

logfile /var/log/heartbeat.log
logfacility local0
keepalive 2
deadtime 15
warntime 5
initdead 120
udpport 694
ucast eth0 173.255.238.191
auto_failback on
node mdb3
node mdb4
use_logd no
crm respawn
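For what it's worth, the rc=6 in the "Failed actions" output is the standard OCF exit code OCF_ERR_CONFIGURED, which is why Pacemaker reports "not configured" and sets the fail-count to INFINITY. A small sketch of that mapping (the `ocf_rc_name` helper is mine, not a real tool; codes are from the OCF resource agent spec):

```shell
#!/bin/sh
# Translate the numeric rc values from "Failed actions" into the
# standard OCF exit-code names (helper name is illustrative only).
ocf_rc_name() {
    case "$1" in
        0) echo "OCF_SUCCESS" ;;
        1) echo "OCF_ERR_GENERIC" ;;
        2) echo "OCF_ERR_ARGS" ;;
        3) echo "OCF_ERR_UNIMPLEMENTED" ;;
        4) echo "OCF_ERR_PERM" ;;
        5) echo "OCF_ERR_INSTALLED" ;;
        6) echo "OCF_ERR_CONFIGURED" ;;   # <- the rc=6 from drbd_mysql:*_stop_0
        7) echo "OCF_NOT_RUNNING" ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

ocf_rc_name 6    # prints OCF_ERR_CONFIGURED
```

An "error" exit from the RA, not a timeout, so raising operation timeouts won't help; the drbd RA itself is refusing the stop.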
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker