Here it is, attached. I also see the following two errors in the node 2 logs, which I assume mean the real problem is that node 1 is not getting demoted, and I'm not sure why:
Error 1:

Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Called drbdadm -c /etc/drbd.conf primary mysqld
Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Exit code 11
Sep 28 19:53:20 staging2 drbd[8587]: ERROR: mysqld: Command output:
Sep 28 19:53:20 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stdout)
Sep 28 19:53:22 staging2 lrmd: [1442]: info: RA output: (drbd_mysql:1:promote:stderr) 0: State change failed: (-1) Multiple primaries not allowed by config

Error 2:

Sep 28 19:53:27 staging2 kernel: d-con mysqld: Requested state change failed by peer: Refusing to be Primary while peer is not outdated (-7)
Sep 28 19:53:27 staging2 kernel: d-con mysqld: peer( Primary -> Unknown ) conn( Connected -> Disconnecting ) disk( UpToDate -> Outdated ) pdsk( UpToDate -> DUnknown )
Sep 28 19:53:27 staging2 kernel: d-con mysqld: meta connection shut down by peer.

Also, failover works fine if I reboot either machine. The outdated machine comes back up as secondary. The scenario where I get the errors above is when I pull the network cable from the primary. Is that something a STONITH device should protect against, potentially by rebooting the primary?

Feels like I'm getting so close to getting this working!

Thanks!
Charles

On Thu, Sep 29, 2011 at 4:15 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> Could you attach /var/lib/pengine/pe-input-3802.bz2 from staging1?
> That would tell us why.
>
> On Mon, Sep 26, 2011 at 10:28 PM, Charles Richard
> <chachi.rich...@gmail.com> wrote:
> > Hi,
> >
> > I'm making some headway finally with my pacemaker install but now that
> > crm_mon doesn't return errors any more and crm_verify is clear, I'm having a
> > problem where my master won't get promoted. Not sure what to do with this
> > one, any suggestions? Here's the log snippet and config files:
> >
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke: Query 106: Requesting the current CIB: S_POLICY_ENGINE
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_pe_invoke_callback: Invoking the PE: query=106, ref=pe_calc-dc-1317020772-95, seq=2564, quorate=1
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Startup probes: enabled
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: unpack_domains: Unpacking domains
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging1.dev.applepeak.com is online
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: determine_online_status: Node staging2.dev.applepeak.com is online
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: group_print: Resource Group: mysql
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: fs_mysql#011(ocf::heartbeat:Filesystem):#011Stopped
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: ip_mysql#011(ocf::heartbeat:IPaddr2):#011Stopped
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: native_print: mysqld#011(lsb:mysqld):#011Stopped
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: short_print: Stopped: [ drbd_mysql:0 drbd_mysql:1 ]
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: fs_mysql: Rolling back scores from ip_mysql
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: native_merge_weights: ip_mysql: Rolling back scores from mysqld
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource fs_mysql#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource ip_mysql#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource mysqld#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:0#011(Stopped)
> > Sep 26 04:06:12 staging1 pengine: [1685]: notice: LogActions: Leave resource drbd_mysql:1#011(Stopped)
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: unpack_graph: Unpacked transition 72: 0 actions in 0 synapses
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_te_invoke: Processing graph 72 (ref=pe_calc-dc-1317020772-95) derived from /var/lib/pengine/pe-input-3802.bz2
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: run_graph: ====================================================
> > Sep 26 04:06:12 staging1 crmd: [1686]: notice: run_graph: Transition 72 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-3802.bz2): Complete
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: te_graph_trigger: Transition 72 is now complete
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: notify_crmd: Transition 72 status: done - <null>
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Sep 26 04:06:12 staging1 crmd: [1686]: info: do_state_transition: Starting PEngine Recheck Timer
> > Sep 26 04:06:12 staging1 pengine: [1685]: info: process_pe_message: Transition 72: PEngine Input stored in: /var/lib/pengine/pe-input-3802.bz2
> > Sep 26 04:15:09 staging1 cib: [1682]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min
> >
> > My drbd config file:
> >
> > resource mysqld {
> >   protocol C;
> >   startup { wfc-timeout 0; degr-wfc-timeout 120; }
> >   disk { on-io-error detach; }
> >
> >   on staging1 {
> >     device /dev/drbd0;
> >     disk /dev/vg_staging1/lv_data;
> >     meta-disk internal;
> >     address 10.10.20.1:7788;
> >   }
> >
> >   on staging2 {
> >     device /dev/drbd0;
> >     disk /dev/vg_staging2/lv_data;
> >     meta-disk internal;
> >     address 10.10.20.2:7788;
> >   }
> > }
> >
> > corosync.conf:
> >
> > compatibility: whitetank
> >
> > aisexec {
> >   user: root
> >   group: root
> > }
> >
> > totem {
> >   version: 2
> >   secauth: off
> >   threads: 0
> >   interface {
> >     ringnumber: 0
> >     bindnetaddr: 10.10.10.0
> >     mcastaddr: 226.94.1.1
> >     mcastport: 5405
> >   }
> > }
> >
> > logging {
> >   fileline: off
> >   to_stderr: no
> >   to_logfile: no
> >   to_syslog: yes
> >   logfile: /var/log/cluster/corosync.log
> >   debug: off
> >   timestamp: on
> >   logger_subsys {
> >     subsys: AMF
> >     debug: off
> >   }
> > }
> >
> > amf {
> >   mode: disabled
> > }
> >
> > service {
> >   #Load Pacemaker
> >   name: pacemaker
> >   ver: 0
> >   use_mgmtd: yes
> > }
> >
> > And my crm config:
> >
> > node staging1.dev.applepeak.com
> > node staging2.dev.applepeak.com
> > primitive drbd_mysql ocf:linbit:drbd \
> >   params drbd_resource="mysqld" \
> >   op monitor interval="15s" \
> >   op start interval="0" timeout="240s" \
> >   op stop interval="0" timeout="100s"
> > primitive fs_mysql ocf:heartbeat:Filesystem \
> >   params device="/dev/drbd0" directory="/opt/data/mysql/data/mysql" fstype="ext4" \
> >   op start interval="0" timeout="60s" \
> >   op stop interval="0" timeout="60s"
> > primitive ip_mysql ocf:heartbeat:IPaddr2 \
> >   params ip="10.10.10.31" nic="eth0"
> > primitive mysqld lsb:mysqld
> > group mysql fs_mysql ip_mysql mysqld
> > ms ms_drbd_mysql drbd_mysql \
> >   meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> > colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> > order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> > property $id="cib-bootstrap-options" \
> >   dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
> >   cluster-infrastructure="openais" \
> >   expected-quorum-votes="2" \
> >   stonith-enabled="false" \
> >   last-lrm-refresh="1316961847" \
> >   stop-all-resources="true" \
> >   no-quorum-policy="ignore"
> > rsc_defaults $id="rsc-options" \
> >   resource-stickiness="100"
> >
> > Thanks,
> > Charles
> >
> > _______________________________________________
> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >
> > Project Home: http://www.clusterlabs.org
> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> > Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker

Attachment: pe-warn-3802.bz2
Description: BZip2 compressed data