On 12/07/2013, at 10:49 PM, "Howley, Tom" <tom.how...@hp.com> wrote:
> Hi,
>
> pacemaker:1.1.6-2ubuntu3,

ouch

> corosync:1.4.2-2, drbd8-utils 2:8.3.11-0ubuntu1
>
> I have a three-node setup, with two nodes running DRBD, resource-level
> fencing enabled (‘resource-and-stonith’) and obviously stonith configured
> for each node. In my current test case, I bring down the network interface
> on the DRBD primary/master node (using ifdown eth0, for example), which
> sometimes leads to split-brain when the isolated node rejoins the cluster.
> The serious problem is that upon rejoining, the isolated node is promoted
> to DRBD primary (despite the original fencing constraint), which opens us
> up to data loss for updates that occurred while that node was down.
>
> The exact problem scenario is as follows:
> - Alice: DRBD Primary/Master, Bob: Secondary/Slave, Jim: quorum node, Epoch=100
> - ifdown eth0 on Alice
> - Alice detects loss of the network interface, sets itself up as DC, and
>   carries out some CIB updates (see log snippet below) that raise the
>   epoch level, say Epoch=102

The epoch is bumped after an election and a configuration change, but NOT
after a status change. So it shouldn't be making it to 102.

> - Alice is shot via stonith.
> - Bob adds a fencing rule to the CIB to prevent promotion of DRBD on any
>   other node, Epoch=101
> - When Alice comes back and rejoins the cluster, the DC decides to sync to
>   Alice's CIB, thereby removing the fencing rule prematurely (i.e. before
>   the DRBD devices have resynced).
> - In some cases: Alice is promoted to Primary/Master and fences the
>   resource to prevent promotion on any other node.
> - We now have split-brain and potential loss of data.
>
> So some questions on the above:
> 1. My initial feeling was that the isolated node, Alice (which has no
>    quorum), should not be updating a CIB that could potentially override
>    the sane part of the cluster. Is that a fair comment?

Not as currently designed. Although there may be some improvements we can
make in that area.

> 2. Is this issue just particular to my use of ‘ifdown ethX’ to disable the
>    network? This is hinted at here:
>    https://github.com/corosync/corosync/wiki/Corosync-and-ifdown-on-active-network-interface
>    Has this issue been addressed, or will it be in the future?
> 3. If ‘ifdown ethX’ is not valid, what is the best alternative that mimics
>    what might happen in the real world? I have tried blocking connections
>    using iptables rules, dropping all incoming and outgoing packets;
>    initial testing appears to show different corosync behaviour that would
>    hopefully not lead to my problem scenario, but I'm still in the process
>    of confirming. I have also carried out some cable pulls and not run into
>    issues yet, but this problem can be intermittent, so it really needs an
>    automated way to test many times.
> 4. The log snippet below from the isolated node shows that it updates the
>    CIB twice sometime after detecting loss of the network interface. Why
>    does this happen? I believe that ultimately it is these CIB updates that
>    increment the epoch, which leads to this CIB overriding the cluster
>    later.
>
> I have also tried a no-quorum-policy of ‘suicide’ in an attempt to prevent
> CIB updates by Alice, but it didn't make a difference.

Why isn't your normal fencing device working?

> Note that to facilitate log collection and analysis, I have added a delay
> to the stonith reset operation, but I have also set the timeout on the
> crm-fence-peer script to ensure that it is greater than this ‘deadtime’.
>
> Any advice on this would be greatly appreciated.
>
> Thanks,
>
> Tom
>
> Log snippet showing the isolated node updating the CIB, which results in
> the epoch being incremented two times:
>
> Jul 10 13:42:54 stratus18 corosync[1268]: [TOTEM ] A processor failed, forming new configuration.
> Jul 10 13:42:54 stratus18 corosync[1268]: [TOTEM ] The network interface is down.
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: TOMTEST-DEBUG: modified version
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20758]: invoked for tomtest
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: TOMTEST-DEBUG: modified version
> Jul 10 13:42:54 stratus18 crm-fence-peer.sh[20761]: invoked for tomtest
> Jul 10 13:42:55 stratus18 stonith-ng: [1276]: info: stonith_command: Processed st_execute from lrmd: rc=-1
> Jul 10 13:42:55 stratus18 external/ipmi[20806]: [20816]: ERROR: error executing ipmitool: Connect failed: Network is unreachable#015 Unable to get Chassis Power Status#015
> Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20758]: Call cib_query failed (-41): Remote node did not respond
> Jul 10 13:42:55 stratus18 crm-fence-peer.sh[20761]: Call cib_query failed (-41): Remote node did not respond
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #7 eth0, 192.168.185.150#123, interface stats: received=0, sent=0, dropped=0, active_time=912 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #4 eth0, fe80::7ae7:d1ff:fe22:5270#123, interface stats: received=0, sent=0, dropped=0, active_time=6080 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: Deleting interface #3 eth0, 192.168.185.118#123, interface stats: received=52, sent=53, dropped=0, active_time=6080 secs
> Jul 10 13:42:55 stratus18 ntpd[1062]: 192.168.8.97 interface 192.168.185.118 -> (none)
> Jul 10 13:42:55 stratus18 ntpd[1062]: peers refreshed
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 2728: memb=1, new=0, lost=2
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: memb: .unknown. 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: lost: stratus18 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: lost: stratus20 2025433280
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 2728: memb=1, new=0, lost=0
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Creating entry for node 16777343 born on 2728
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 16777343/unknown is now: member
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: pcmk_peer_update: MEMB: .pending. 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] ERROR: pcmk_peer_update: Something strange happened: 1
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_mark_unseen_peer_dead: Node stratus17 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 1975101632/stratus17 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_mark_unseen_peer_dead: Node stratus18 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 1991878848/stratus18 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_mark_unseen_peer_dead: Node stratus20 was not seen in the previous transition
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node 2025433280/stratus20 is now: lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] WARN: pcmk_update_nodeid: Detected local node id change: 1991878848 -> 16777343
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: destroy_ais_node: Destroying entry for node 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] notice: ais_remove_peer: Removed dead peer 1991878848 from the membership list
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: ais_remove_peer: Sending removal of 1991878848 to 2 children
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: 0x13d9520 Node 16777343 now known as stratus18 (was: (null))
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node stratus18 now has 1 quorum votes (was 0)
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: Node stratus18 now has process list: 00000000000000000000000000111312 (1118994)
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: send_member_notification: Sending membership update 2728 to 2 children
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_member: 0x13d9520 Node 16777343 ((null)) born on: 2708
> Jul 10 13:42:55 stratus18 corosync[1268]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Membership 2728: quorum retained
> Jul 10 13:42:55 stratus18 cib: [1277]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
> Jul 10 13:42:55 stratus18 cib: [1277]: info: reap_crm_member: Peer 1991878848 is unknown
> Jul 10 13:42:55 stratus18 cib: [1277]: notice: ais_dispatch_message: Membership 2728: quorum lost
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117) votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120) votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 1991878848
> Jul 10 13:42:55 stratus18 corosync[1268]: [CPG ] chosen downlist: sender r(0) ip(127.0.0.1) ; members(old:3 left:3)
> Jul 10 13:42:55 stratus18 corosync[1268]: [MAIN ] Completed service synchronization, ready to provide service.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum retained
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Removing peer 1991878848/1991878848
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: reap_crm_member: Peer 1991878848 is unknown
> Jul 10 13:42:55 stratus18 crmd: [1281]: notice: ais_dispatch_message: Membership 2728: quorum lost
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus17 is now lost (was member)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus17: id=1975101632 state=lost (new) addr=r(0) ip(192.168.185.117) votes=1 born=2724 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_status_callback: status: stratus20 is now lost (was member)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_peer: Node stratus20: id=2025433280 state=lost (new) addr=r(0) ip(192.168.185.120) votes=1 born=4 seen=2724 proc=00000000000000000000000000111312
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: check_dead_member: Our DC node (stratus20) left the cluster
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=check_dead_member ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Unset DC stratus20
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_te_control: Registering TE UUID: 6e335eff-5e48-4fc1-9003-0537ae948dfd
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: set_graph_functions: Setting custom graph functions
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_takeover: Taking over DC status for this partition
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_readwrite: We are now in R/W mode
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/57, version=0.76.46): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/58, version=0.76.47): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: crm_get_peer: Node stratus18 now has id: 16777343
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/60, version=0.76.48): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: join_make_offer: Making join offers based on membership 2728
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/62, version=0.76.49): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 2
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: update_dc: Set DC to stratus18 (3.0.5)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Shutdown escalation occurs after: 1200000ms
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Checking for expired actions every 900000ms
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: config_query_callback: Sending expected-votes=3 to corosync
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: ais_dispatch_message: Membership 2728: quorum still lost
> Jul 10 13:42:55 stratus18 corosync[1268]: [pcmk ] info: update_expected_votes: Expected quorum votes 2 -> 3
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="76" num_updates="49" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <nvpair value="3" id="cib-bootstrap-options-expected-quorum-votes" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:25:58 2013" crm_feature_set="3.0.5" epoch="77" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus17" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/65, version=0.77.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crmd_ais_dispatch: Setting expected votes to 3
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_finalize: join-1: Syncing the CIB from stratus18 to the rest of the cluster
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cib admin_epoch="0" epoch="77" num_updates="1" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - <nvpair value="2" id="cib-bootstrap-options-expected-quorum-votes" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: - </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cib admin_epoch="0" cib-last-written="Wed Jul 10 13:42:55 2013" crm_feature_set="3.0.5" epoch="78" have-quorum="1" num_updates="1" update-client="crmd" update-origin="stratus18" validate-with="pacemaker-1.2" dc-uuid="stratus20" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <configuration >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <crm_config >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <cluster_property_set id="cib-bootstrap-options" >
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="3" />
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cluster_property_set>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </crm_config>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </configuration>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib:diff: + </cib>
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/68, version=0.78.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/69, version=0.78.1): ok (rc=0)
> Jul 10 13:42:55 stratus18 lrmd: [1278]: info: stonith_api_device_metadata: looking up external/ipmi/heartbeat metadata
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/70, version=0.78.2): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_ack: join-1: Updating node state to member for stratus18
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='stratus18']/lrm (origin=local/crmd/71, version=0.78.3): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: erase_xpath_callback: Deletion of "//node_state[@uname='stratus18']/lrm": ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: crm_update_quorum: Updating quorum status to false (call=75)
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: do_te_invoke:167 - Triggered transition abort (complete=1) : Peer Cancelled
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 76: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Jul 10 13:42:55 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/73, version=0.78.5): ok (rc=0)
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus17
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus17 not matched
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus17, magic=NA, cib=0.78.6) : Node failure
> Jul 10 13:42:55 stratus18 crmd: [1281]: WARN: match_down_event: No match for shutdown action on stratus20
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: te_update_diff: Stonith/shutdown of stratus20 not matched
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=stratus20, magic=NA, cib=0.78.6) : Node failure
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 77: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 crmd: [1281]: info: do_pe_invoke: Query 78: Requesting the current CIB: S_POLICY_ENGINE
> Jul 10 13:42:55 stratus18 cib: [1277]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/75, version=0.78.7): ok (rc=0)
> Jul 10 13:42:56 stratus18 crmd: [1281]: info: do_pe_invoke_callback: Invoking the PE: query=78, ref=pe_calc-dc-1373460176-49, seq=2728, quorate=0
> Jul 10 13:42:56 stratus18 attrd: [1279]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-drbd_tomtest:0 (10000)
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: cluster_status: We do not have quorum - fencing and resource management disabled
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus17 will be fenced because it is un-expectedly down
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus17 is unclean
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: pe_fence_node: Node stratus20 will be fenced because it is un-expectedly down
> Jul 10 13:42:56 stratus18 pengine: [1280]: WARN: determine_online_status: Node stratus20 is unclean
> Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - drbd_tomtest:0_last_failure_0 failed with rc=5: Preventing ms_drbd_tomtest from re-starting on stratus20
> Jul 10 13:42:56 stratus18 pengine: [1280]: notice: unpack_rsc_op: Hard error - tomtest_mysql_SERVICE_last_failure_0 failed with rc=5: Preventing tomtest_mysql_SERVICE from re-starting on stratus20
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
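On question 3 above: the logs show corosync rebinding to the loopback address after the `ifdown` (the "chosen downlist: sender r(0) ip(127.0.0.1)" line and the nodeid change 1991878848 -> 16777343), which is the behaviour the linked corosync wiki page warns about. An iptables-based partition avoids that, because the interface and its address stay up. Below is a minimal sketch of the approach Tom describes; the `NODE_IP` default (stratus18's ring address from the logs) and the `$RUN` dry-run wrapper are illustrative assumptions, not part of any cluster tooling.

```shell
#!/bin/sh
# Sketch: simulate a network partition with iptables instead of `ifdown`.
# Unlike `ifdown`, this leaves the interface address bound, so corosync
# does not fall back to 127.0.0.1 and change its nodeid.
# NODE_IP and the $RUN dry-run wrapper are assumptions for illustration.

NODE_IP="${NODE_IP:-192.168.185.118}"  # this node's totem ring address
RUN="${RUN:-echo}"                     # dry run by default; set RUN='' to apply

partition_on() {
    # Drop every packet to and from the ring address, in both directions.
    $RUN iptables -I INPUT  -d "$NODE_IP" -j DROP
    $RUN iptables -I OUTPUT -s "$NODE_IP" -j DROP
}

partition_off() {
    $RUN iptables -D INPUT  -d "$NODE_IP" -j DROP
    $RUN iptables -D OUTPUT -s "$NODE_IP" -j DROP
}

partition_on
# ...leave the node isolated long enough for the peer to place the DRBD
# fencing constraint and for stonith to fire, then heal the partition...
partition_off
```

Wrapped in a loop, this gives the automated repeated test asked for in question 3; matching only corosync's totem traffic instead of all traffic on the address would be a finer-grained variant of the same idea.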