Hi Andrew and Emi,

Please find attached the new Pacemaker configuration and the syslog. The attached log covers the case where I power off the working node (server): the Xen domain doesn't migrate. After some time it does start, but it shows only a blank white screen in VNC Viewer.
Thanks in advance

On Wed, Jun 4, 2014 at 2:52 PM, emmanuel segura <emi2f...@gmail.com> wrote:

> Because you haven't configured fencing.
>
> 2014-06-04 9:20 GMT+02:00 kamal kishi <kamal.ki...@gmail.com>:
>
>> Hi emi,
>>
>> Cluster logs?
>> Right now I'm getting all the logs in syslog itself.
>>
>> Another thing I found out is that OCFS2 has some issue while either
>> server is offline or powered off. Can you suggest whether using OCFS2
>> here is a good option or not?
>>
>> Thank you
>>
>> On Tue, Jun 3, 2014 at 6:31 PM, emmanuel segura <emi2f...@gmail.com> wrote:
>>
>>> Maybe I'm wrong, but I think you forgot the cluster logs.
>>>
>>> 2014-06-03 14:34 GMT+02:00 kamal kishi <kamal.ki...@gmail.com>:
>>>
>>>> Hi all,
>>>>
>>>> I'm sure many have come across the same question, and yes, I've gone
>>>> through most of the blogs and mailing lists without much result.
>>>> I'm trying to configure a Xen HVM DomU on a DRBD-replicated partition
>>>> of filesystem type OCFS2 using Pacemaker.
>>>>
>>>> My question is what changes need to be made to the following Xen
>>>> files so they work with Pacemaker:
>>>> /etc/xen/xend-config.sxp
>>>> /etc/default/xendomains
>>>>
>>>> Let me know if any other file needs to be edited.
>>>>
>>>> Find my configuration files attached.
>>>> Many times the Xen resource doesn't start.
>>>> Even when it does start, migration doesn't take place.
>>>> I checked the logs; some "unknown error" is printed.
>>>>
>>>> It would be helpful if someone could guide me through the configuration.
>>>>
>>>> Thanks in advance, guys
>>>>
>>>> --
>>>> Regards,
>>>> Kamal Kishore B V
>>>
>>> --
>>> this is my life and I live it as long as God wills
>>
>> --
>> Regards,
>> Kamal Kishore B V
>
> --
> this is my life and I live it as long as God wills

--
Regards,
Kamal Kishore B V
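On the original xend question: live migration normally requires the relocation server to be enabled on both hosts in /etc/xen/xend-config.sxp. A minimal sketch follows; the wide-open address and hosts-allow values are assumptions for a trusted, dedicated cluster link, not settings taken from the attached files:

(xend-relocation-server yes)
(xend-relocation-port 8002)
(xend-relocation-address '')
(xend-relocation-hosts-allow '')

Once migration works, it is safer to restrict xend-relocation-hosts-allow to the peer addresses (here, a pattern matching 10.0.0.1 and 10.0.0.2 from the logs below).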
node server1
node server2
primitive Clu-FS-DRBD ocf:linbit:drbd \
        params drbd_resource="r0" \
        operations $id="Clu-FS-DRBD-ops" \
        op start interval="0" timeout="49s" \
        op stop interval="0" timeout="50s" \
        op monitor interval="40s" role="Master" timeout="50s" \
        op monitor interval="41s" role="Slave" timeout="51s" \
        meta target-role="started"
primitive Clu-FS-Mount ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/r0" directory="/cluster" fstype="ocfs2" \
        op monitor interval="120s" \
        meta target-role="started"
primitive xenwin7 ocf:heartbeat:Xen \
        params xmfile="/home/cluster/xen/win7.cfg" \
        op monitor interval="40s" \
        meta target-role="started" is-managed="true" allow-migrate="true"
ms Clu-FS-DRBD-Master Clu-FS-DRBD \
        meta resource-stickiness="100" master-max="2" notify="true" interleave="true"
clone Clu-FS-Mount-Clone Clu-FS-Mount \
        meta interleave="true" ordered="true"
location drbd-fence-by-handler-Clu-FS-DRBD-Master Clu-FS-DRBD-Master \
        rule $id="drbd-fence-by-handler-rule-Clu-FS-DRBD-Master" $role="Master" -inf: #uname ne server1
colocation Clu-Clo-DRBD inf: Clu-FS-Mount-Clone Clu-FS-DRBD-Master:Master
colocation win7-Xen-Clu-Clo inf: xenwin7 Clu-FS-Mount-Clone
order Cluster-FS-After-DRBD inf: Clu-FS-DRBD-Master:promote Clu-FS-Mount-Clone:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        default-resource-stickiness="1000" \
        last-lrm-refresh="1401960233"
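Note that the configuration above has stonith-enabled="false", which is what Emmanuel's point about fencing refers to: DRBD's crm-fence-peer handler and OCFS2 recovery both assume the cluster can actually kill a failed node. Below is a minimal sketch of enabling STONITH with the external/ipmi plugin; the plugin choice, IP addresses, and credentials are placeholders that depend on the hardware, not values from the attachments:

primitive st-server1 stonith:external/ipmi \
        params hostname="server1" ipaddr="192.168.10.1" userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
primitive st-server2 stonith:external/ipmi \
        params hostname="server2" ipaddr="192.168.10.2" userid="admin" passwd="secret" interface="lan" \
        op monitor interval="60s"
location st-server1-placement st-server1 -inf: server1
location st-server2-placement st-server2 -inf: server2
property stonith-enabled="true"

The location constraints keep each fencing device off the node it is meant to fence.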
Jun 5 15:11:39 server1 NetworkManager[887]: <info> (eth0): carrier now OFF (device state 10)
Jun 5 15:11:39 server1 kernel: [ 2127.112852] bnx2 0000:01:00.0: eth0: NIC Copper Link is Down
Jun 5 15:11:39 server1 kernel: [ 2127.113876] xenbr0: port 1(eth0) entering forwarding state
Jun 5 15:11:41 server1 NetworkManager[887]: <info> (eth0): carrier now ON (device state 10)
Jun 5 15:11:41 server1 kernel: [ 2129.231687] bnx2 0000:01:00.0: eth0: NIC Copper Link is Up, 100 Mbps full duplex, receive & transmit flow control ON
Jun 5 15:11:41 server1 kernel: [ 2129.232672] xenbr0: port 1(eth0) entering forwarding state
Jun 5 15:11:41 server1 kernel: [ 2129.232696] xenbr0: port 1(eth0) entering forwarding state
Jun 5 15:11:42 server1 corosync[1556]: [TOTEM ] A processor failed, forming new configuration.
Jun 5 15:11:43 server1 NetworkManager[887]: <info> (eth0): carrier now OFF (device state 10)
Jun 5 15:11:43 server1 kernel: [ 2130.624346] bnx2 0000:01:00.0: eth0: NIC Copper Link is Down
Jun 5 15:11:43 server1 kernel: [ 2130.625274] xenbr0: port 1(eth0) entering forwarding state
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 64: memb=1, new=0, lost=1
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: pcmk_peer_update: memb: server1 16777226
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: pcmk_peer_update: lost: server2 33554442
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 64: memb=1, new=0, lost=0
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: pcmk_peer_update: MEMB: server1 16777226
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: ais_mark_unseen_peer_dead: Node server2 was not seen in the previous transition
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: update_member: Node 33554442/server2 is now: lost
Jun 5 15:11:45 server1 corosync[1556]: [pcmk ] info: send_member_notification: Sending membership update 64 to 2 children
Jun 5 15:11:45 server1 corosync[1556]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 5 15:11:45 server1 corosync[1556]: [CPG ] chosen downlist: sender r(0) ip(10.0.0.1) ; members(old:2 left:1)
Jun 5 15:11:45 server1 corosync[1556]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 5 15:11:45 server1 cib: [1595]: notice: ais_dispatch_message: Membership 64: quorum lost
Jun 5 15:11:45 server1 cib: [1595]: info: crm_update_peer: Node server2: id=33554442 state=lost (new) addr=r(0) ip(10.0.0.2) votes=1 born=60 seen=60 proc=00000000000000000000000000111312
Jun 5 15:11:45 server1 crmd: [1600]: notice: ais_dispatch_message: Membership 64: quorum lost
Jun 5 15:11:45 server1 crmd: [1600]: info: ais_status_callback: status: server2 is now lost (was member)
Jun 5 15:11:45 server1 crmd: [1600]: info: crm_update_peer: Node server2: id=33554442 state=lost (new) addr=r(0) ip(10.0.0.2) votes=1 born=60 seen=60 proc=00000000000000000000000000111312
Jun 5 15:11:45 server1 crmd: [1600]: info: erase_node_from_join: Removed node server2 from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=1
Jun 5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crmd/146, version=0.52.3): ok (rc=0)
Jun 5 15:11:45 server1 crmd: [1600]: info: crm_update_quorum: Updating quorum status to false (call=148)
Jun 5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/148, version=0.52.5): ok (rc=0)
Jun 5 15:11:45 server1 crmd: [1600]: info: crmd_ais_dispatch: Setting expected votes to 2
Jun 5 15:11:45 server1 crmd: [1600]: WARN: match_down_event: No match for shutdown action on server2
Jun 5 15:11:45 server1 crmd: [1600]: info: te_update_diff: Stonith/shutdown of server2 not matched
Jun 5 15:11:45 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:215 - Triggered transition abort (complete=1, tag=node_state, id=server2, magic=NA, cib=0.52.4) : Node failure
Jun 5 15:11:45 server1 crmd: [1600]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jun 5 15:11:45 server1 crmd: [1600]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun 5 15:11:45 server1 crmd: [1600]: info: do_pe_invoke: Query 151: Requesting the current CIB: S_POLICY_ENGINE
Jun 5 15:11:45 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/150, version=0.52.6): ok (rc=0)
Jun 5 15:11:45 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=151, ref=pe_calc-dc-1401961305-161, seq=64, quorate=0
Jun 5 15:11:45 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 5 15:11:45 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown error (1)
Jun 5 15:11:45 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:11:45 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:11:45 server1 pengine: [1599]: notice: RecurringOp: Start recurring monitor (40s) for xenwin7 on server1
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:0#011(Started server1)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:1#011(Stopped)
Jun 5 15:11:45 server1 pengine: [1599]: notice: LogActions: Start xenwin7#011(server1)
Jun 5 15:11:45 server1 crmd: [1600]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun 5 15:11:45 server1 crmd: [1600]: info: unpack_graph: Unpacked transition 38: 2 actions in 2 synapses
Jun 5 15:11:45 server1 crmd: [1600]: info: do_te_invoke: Processing graph 38 (ref=pe_calc-dc-1401961305-161) derived from /var/lib/pengine/pe-input-93.bz2
Jun 5 15:11:45 server1 crmd: [1600]: info: te_rsc_command: Initiating action 40: start xenwin7_start_0 on server1 (local)
Jun 5 15:11:45 server1 crmd: [1600]: info: do_lrm_rsc_op: Performing key=40:38:0:43add4e5-6270-43de-8ca9-8a4939271b5b op=xenwin7_start_0 )
Jun 5 15:11:45 server1 lrmd: [1596]: info: rsc:xenwin7 start[41] (pid 9270)
Jun 5 15:11:45 server1 pengine: [1599]: notice: process_pe_message: Transition 38: PEngine Input stored in: /var/lib/pengine/pe-input-93.bz2
Jun 5 15:11:58 server1 kernel: [ 2146.278476] block drbd0: PingAck did not arrive in time.
Jun 5 15:11:58 server1 kernel: [ 2146.278488] block drbd0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown ) susp( 0 -> 1 )
Jun 5 15:11:58 server1 kernel: [ 2146.278686] block drbd0: asender terminated
Jun 5 15:11:58 server1 kernel: [ 2146.278693] block drbd0: Terminating drbd0_asender
Jun 5 15:11:58 server1 kernel: [ 2146.278771] block drbd0: Connection closed
Jun 5 15:11:58 server1 kernel: [ 2146.278849] block drbd0: conn( NetworkFailure -> Unconnected )
Jun 5 15:11:58 server1 kernel: [ 2146.278860] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Jun 5 15:11:58 server1 kernel: [ 2146.278864] block drbd0: receiver terminated
Jun 5 15:11:58 server1 kernel: [ 2146.278868] block drbd0: Restarting drbd0_receiver
Jun 5 15:11:58 server1 kernel: [ 2146.278872] block drbd0: receiver (re)started
Jun 5 15:11:58 server1 kernel: [ 2146.278881] block drbd0: conn( Unconnected -> WFConnection )
Jun 5 15:11:58 server1 crm-fence-peer.sh[9353]: invoked for r0
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: - <cib admin_epoch="0" epoch="52" num_updates="6" />
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <cib epoch="53" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.5" update-origin="server1" update-client="crm_resource" cib-last-written="Thu Jun 5 15:10:26 2014" have-quorum="0" dc-uuid="server1" >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <configuration >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <constraints >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <rsc_location rsc="Clu-FS-DRBD-Master" id="drbd-fence-by-handler-Clu-FS-DRBD-Master" __crm_diff_marker__="added:top" >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-rule-Clu-FS-DRBD-Master" >
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + <expression attribute="#uname" operation="ne" value="server1" id="drbd-fence-by-handler-expr-Clu-FS-DRBD-Master" />
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </rule>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </rsc_location>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </constraints>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </configuration>
Jun 5 15:11:59 server1 cib: [1595]: info: cib:diff: + </cib>
Jun 5 15:11:59 server1 cib: [1595]: info: cib_process_request: Operation complete: op cib_create for section constraints (origin=local/cibadmin/2, version=0.53.1): ok (rc=0)
Jun 5 15:11:59 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:124 - Triggered transition abort (complete=0, tag=diff, id=(null), magic=NA, cib=0.53.1) : Non-status change
Jun 5 15:11:59 server1 crmd: [1600]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Jun 5 15:11:59 server1 crmd: [1600]: info: update_abort_priority: Abort action done superceeded by restart
Jun 5 15:11:59 server1 crm-fence-peer.sh[9353]: INFO peer is reachable, my disk is UpToDate: placed constraint 'drbd-fence-by-handler-Clu-FS-DRBD-Master'
Jun 5 15:11:59 server1 kernel: [ 2147.428617] block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 4 (0x400)
Jun 5 15:11:59 server1 kernel: [ 2147.428623] block drbd0: fence-peer helper returned 4 (peer was fenced)
Jun 5 15:11:59 server1 kernel: [ 2147.428632] block drbd0: pdsk( DUnknown -> Outdated )
Jun 5 15:11:59 server1 kernel: [ 2147.428680] block drbd0: new current UUID C7AE32BDEB8201AF:41DEB2849956CF9F:CE91A410F5C9F940:CE90A410F5C9F940
Jun 5 15:11:59 server1 kernel: [ 2147.428861] block drbd0: susp( 1 -> 0 )
Jun 5 15:12:05 server1 lrmd: [1596]: WARN: xenwin7:start process (PID 9270) timed out (try 1). Killing with signal SIGTERM (15).
Jun 5 15:12:05 server1 lrmd: [1596]: WARN: operation start[41] on xenwin7 for client 1600: pid 9270 timed out
Jun 5 15:12:05 server1 crmd: [1600]: ERROR: process_lrm_event: LRM operation xenwin7_start_0 (41) Timed Out (timeout=20000ms)
Jun 5 15:12:05 server1 crmd: [1600]: WARN: status_from_rc: Action 40 (xenwin7_start_0) on server1 failed (target: 0 vs. rc: -2): Error
Jun 5 15:12:05 server1 crmd: [1600]: WARN: update_failcount: Updating failcount for xenwin7 on server1 after failed start: rc=-2 (update=INFINITY, time=1401961325)
Jun 5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=xenwin7_last_failure_0, magic=2:-2;40:38:0:43add4e5-6270-43de-8ca9-8a4939271b5b, cib=0.53.2) : Event failed
Jun 5 15:12:05 server1 crmd: [1600]: info: match_graph_event: Action xenwin7_start_0 (40) confirmed on server1 (rc=4)
Jun 5 15:12:05 server1 crmd: [1600]: info: run_graph: ====================================================
Jun 5 15:12:05 server1 crmd: [1600]: notice: run_graph: Transition 38 (Complete=1, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-input-93.bz2): Stopped
Jun 5 15:12:05 server1 crmd: [1600]: info: te_graph_trigger: Transition 38 is now complete
Jun 5 15:12:05 server1 crmd: [1600]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 5 15:12:05 server1 crmd: [1600]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun 5 15:12:05 server1 crmd: [1600]: info: do_pe_invoke: Query 153: Requesting the current CIB: S_POLICY_ENGINE
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-xenwin7 (INFINITY)
Jun 5 15:12:05 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=153, ref=pe_calc-dc-1401961325-163, seq=64, quorate=0
Jun 5 15:12:05 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 5 15:12:05 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun 5 15:12:05 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:05 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:05 server1 pengine: [1599]: notice: RecurringOp: Start recurring monitor (40s) for xenwin7 on server1
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:0#011(Started server1)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:1#011(Stopped)
Jun 5 15:12:05 server1 pengine: [1599]: notice: LogActions: Recover xenwin7#011(Started server1)
Jun 5 15:12:05 server1 crmd: [1600]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun 5 15:12:05 server1 crmd: [1600]: info: unpack_graph: Unpacked transition 39: 4 actions in 4 synapses
Jun 5 15:12:05 server1 crmd: [1600]: info: do_te_invoke: Processing graph 39 (ref=pe_calc-dc-1401961325-163) derived from /var/lib/pengine/pe-input-94.bz2
Jun 5 15:12:05 server1 crmd: [1600]: info: te_rsc_command: Initiating action 3: stop xenwin7_stop_0 on server1 (local)
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_perform_update: Sent update 124: fail-count-xenwin7=INFINITY
Jun 5 15:12:05 server1 crmd: [1600]: info: do_lrm_rsc_op: Performing key=3:39:0:43add4e5-6270-43de-8ca9-8a4939271b5b op=xenwin7_stop_0 )
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-xenwin7 (1401961325)
Jun 5 15:12:05 server1 lrmd: [1596]: info: rsc:xenwin7 stop[42] (pid 9401)
Jun 5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:164 - Triggered transition abort (complete=0, tag=nvpair, id=status-server1-fail-count-xenwin7, name=fail-count-xenwin7, value=INFINITY, magic=NA, cib=0.53.3) : Transient attribute: update
Jun 5 15:12:05 server1 crmd: [1600]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
Jun 5 15:12:05 server1 crmd: [1600]: info: update_abort_priority: Abort action done superceeded by restart
Jun 5 15:12:05 server1 attrd: [1597]: notice: attrd_perform_update: Sent update 126: last-failure-xenwin7=1401961325
Jun 5 15:12:05 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:164 - Triggered transition abort (complete=0, tag=nvpair, id=status-server1-last-failure-xenwin7, name=last-failure-xenwin7, value=1401961325, magic=NA, cib=0.53.4) : Transient attribute: update
Jun 5 15:12:05 server1 pengine: [1599]: notice: process_pe_message: Transition 39: PEngine Input stored in: /var/lib/pengine/pe-input-94.bz2
Jun 5 15:12:07 server1 kernel: [ 2155.458452] o2net: Connection to node server2 (num 1) at 10.0.0.2:7777 has been idle for 30.84 secs, shutting it down.
Jun 5 15:12:07 server1 kernel: [ 2155.458486] o2net: No longer connected to node server2 (num 1) at 10.0.0.2:7777
Jun 5 15:12:07 server1 kernel: [ 2155.458531] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -112 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:07 server1 kernel: [ 2155.458538] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:13 server1 kernel: [ 2160.562477] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:13 server1 kernel: [ 2160.562484] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:18 server1 kernel: [ 2165.666468] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:18 server1 kernel: [ 2165.666475] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:23 server1 kernel: [ 2170.770473] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:23 server1 kernel: [ 2170.770481] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:25 server1 lrmd: [1596]: WARN: xenwin7:stop process (PID 9401) timed out (try 1). Killing with signal SIGTERM (15).
Jun 5 15:12:25 server1 lrmd: [1596]: WARN: operation stop[42] on xenwin7 for client 1600: pid 9401 timed out
Jun 5 15:12:25 server1 crmd: [1600]: ERROR: process_lrm_event: LRM operation xenwin7_stop_0 (42) Timed Out (timeout=20000ms)
Jun 5 15:12:25 server1 crmd: [1600]: WARN: status_from_rc: Action 3 (xenwin7_stop_0) on server1 failed (target: 0 vs. rc: -2): Error
Jun 5 15:12:25 server1 crmd: [1600]: WARN: update_failcount: Updating failcount for xenwin7 on server1 after failed stop: rc=-2 (update=INFINITY, time=1401961345)
Jun 5 15:12:25 server1 crmd: [1600]: info: abort_transition_graph: match_graph_event:277 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=xenwin7_last_failure_0, magic=2:-2;3:39:0:43add4e5-6270-43de-8ca9-8a4939271b5b, cib=0.53.5) : Event failed
Jun 5 15:12:25 server1 crmd: [1600]: info: match_graph_event: Action xenwin7_stop_0 (3) confirmed on server1 (rc=4)
Jun 5 15:12:25 server1 crmd: [1600]: info: run_graph: ====================================================
Jun 5 15:12:25 server1 crmd: [1600]: notice: run_graph: Transition 39 (Complete=1, Pending=0, Fired=0, Skipped=3, Incomplete=0, Source=/var/lib/pengine/pe-input-94.bz2): Stopped
Jun 5 15:12:25 server1 crmd: [1600]: info: te_graph_trigger: Transition 39 is now complete
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke: Query 155: Requesting the current CIB: S_POLICY_ENGINE
Jun 5 15:12:25 server1 attrd: [1597]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-xenwin7 (1401961345)
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=155, ref=pe_calc-dc-1401961345-165, seq=64, quorate=0
Jun 5 15:12:25 server1 attrd: [1597]: notice: attrd_perform_update: Sent update 128: last-failure-xenwin7=1401961345
Jun 5 15:12:25 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 5 15:12:25 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: WARN: common_apply_stickiness: Forcing xenwin7 away from server1 after 1000000 failures (max=1000000)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:0#011(Started server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave xenwin7#011(Started unmanaged)
Jun 5 15:12:25 server1 crmd: [1600]: info: abort_transition_graph: te_update_diff:164 - Triggered transition abort (complete=1, tag=nvpair, id=status-server1-last-failure-xenwin7, name=last-failure-xenwin7, value=1401961345, magic=NA, cib=0.53.6) : Transient attribute: update
Jun 5 15:12:25 server1 crmd: [1600]: info: handle_response: pe_calc calculation pe_calc-dc-1401961345-165 is obsolete
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke: Query 156: Requesting the current CIB: S_POLICY_ENGINE
Jun 5 15:12:25 server1 crmd: [1600]: info: do_pe_invoke_callback: Invoking the PE: query=156, ref=pe_calc-dc-1401961345-166, seq=64, quorate=0
Jun 5 15:12:25 server1 pengine: [1599]: notice: process_pe_message: Transition 40: PEngine Input stored in: /var/lib/pengine/pe-input-95.bz2
Jun 5 15:12:25 server1 pengine: [1599]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jun 5 15:12:25 server1 pengine: [1599]: WARN: unpack_rsc_op: Processing failed op xenwin7_last_failure_0 on server1: unknown exec error (-2)
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: notice: common_apply_stickiness: Clu-FS-DRBD-Master can fail 999999 more times on server2 before being forced off
Jun 5 15:12:25 server1 pengine: [1599]: WARN: common_apply_stickiness: Forcing xenwin7 away from server1 after 1000000 failures (max=1000000)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:0#011(Master server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-DRBD:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:0#011(Started server1)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave Clu-FS-Mount:1#011(Stopped)
Jun 5 15:12:25 server1 pengine: [1599]: notice: LogActions: Leave xenwin7#011(Started unmanaged)
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jun 5 15:12:25 server1 crmd: [1600]: info: unpack_graph: Unpacked transition 41: 0 actions in 0 synapses
Jun 5 15:12:25 server1 crmd: [1600]: info: do_te_invoke: Processing graph 41 (ref=pe_calc-dc-1401961345-166) derived from /var/lib/pengine/pe-input-96.bz2
Jun 5 15:12:25 server1 crmd: [1600]: info: run_graph: ====================================================
Jun 5 15:12:25 server1 crmd: [1600]: notice: run_graph: Transition 41 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-96.bz2): Complete
Jun 5 15:12:25 server1 crmd: [1600]: info: te_graph_trigger: Transition 41 is now complete
Jun 5 15:12:25 server1 crmd: [1600]: info: notify_crmd: Transition 41 status: done - <null>
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jun 5 15:12:25 server1 crmd: [1600]: info: do_state_transition: Starting PEngine Recheck Timer
Jun 5 15:12:25 server1 pengine: [1599]: notice: process_pe_message: Transition 41: PEngine Input stored in: /var/lib/pengine/pe-input-96.bz2
Jun 5 15:12:28 server1 kernel: [ 2175.874477] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:28 server1 kernel: [ 2175.874485] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:33 server1 kernel: [ 2180.978498] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:33 server1 kernel: [ 2180.978506] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:38 server1 kernel: [ 2185.538465] o2net: No connection established with node 1 after 30.0 seconds, giving up.
Jun 5 15:12:38 server1 kernel: [ 2186.082473] (xend,9339,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:38 server1 kernel: [ 2186.082480] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:43 server1 kernel: [ 2191.186466] (xend,9339,0):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:43 server1 kernel: [ 2191.186474] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:44 server1 kernel: [ 2191.603442] (pool,9480,3):dlm_do_master_request:1332 ERROR: link to 1 went down!
Jun 5 15:12:44 server1 kernel: [ 2191.603449] (pool,9480,3):dlm_get_lock_resource:917 ERROR: status = -107
Jun 5 15:12:48 server1 kernel: [ 2196.290472] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:48 server1 kernel: [ 2196.290480] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:53 server1 kernel: [ 2201.394470] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:53 server1 kernel: [ 2201.394477] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:12:58 server1 kernel: [ 2206.498469] (xend,9339,1):dlm_send_remote_convert_request:395 ERROR: Error -107 when sending message 504 (key 0x649b059e) to node 1
Jun 5 15:12:58 server1 kernel: [ 2206.498476] o2dlm: Waiting on the death of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:00 server1 kernel: [ 2207.550684] o2cb: o2dlm has evicted node 1 from domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:01 server1 kernel: [ 2208.562466] o2dlm: Waiting on the recovery of node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:03 server1 kernel: [ 2211.434473] o2dlm: Begin recovery on domain F18CB82626444DD0913312B7AE741C5B for node 1
Jun 5 15:13:03 server1 kernel: [ 2211.434501] o2dlm: Node 0 (me) is the Recovery Master for the dead node 1 in domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:03 server1 kernel: [ 2211.434597] o2dlm: End recovery on domain F18CB82626444DD0913312B7AE741C5B
Jun 5 15:13:04 server1 kernel: [ 2211.602493] (pool,9480,3):dlm_restart_lock_mastery:1221 ERROR: node down! 1
Jun 5 15:13:04 server1 kernel: [ 2211.602502] (pool,9480,3):dlm_wait_for_lock_mastery:1038 ERROR: status = -11
Jun 5 15:13:05 server1 kernel: [ 2212.606674] ocfs2: Begin replay journal (node 1, slot 1) on device (147,0)
Jun 5 15:13:06 server1 kernel: [ 2214.350572] ocfs2: End replay journal (node 1, slot 1) on device (147,0)
Jun 5 15:13:06 server1 kernel: [ 2214.360790] ocfs2: Beginning quota recovery on device (147,0) for slot 1
Jun 5 15:13:06 server1 kernel: [ 2214.386783] ocfs2: Finishing quota recovery on device (147,0) for slot 1
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/4/768
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/4/5632
Jun 5 15:13:07 server1 kernel: [ 2214.638622] device tap4.0 entered promiscuous mode
Jun 5 15:13:07 server1 kernel: [ 2214.638685] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 kernel: [ 2214.638699] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: devices added (path: /sys/devices/vif-4-0/net/vif4.0, iface: vif4.0)
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: device added (path: /sys/devices/vif-4-0/net/vif4.0, iface: vif4.0): no ifupdown configuration found.
Jun 5 15:13:07 server1 NetworkManager[887]: <warn> failed to allocate link cache: (-10) Operation not supported
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): carrier is OFF
Jun 5 15:13:07 server1 NetworkManager[887]: <error> [1401961387.118193] [nm-device-ethernet.c:456] real_update_permanent_hw_address(): (vif4.0): unable to read permanent MAC address (error 0)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): new Ethernet device (driver: 'vif' ifindex: 12)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): exported as /org/freedesktop/NetworkManager/Devices/6
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): now managed
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): bringing up device.
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): preparing device.
Jun 5 15:13:07 server1 NetworkManager[887]: <info> (vif4.0): deactivating device (reason 'managed') [2]
Jun 5 15:13:07 server1 NetworkManager[887]: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
Jun 5 15:13:07 server1 NetworkManager[887]: <info> Added default wired connection 'Wired connection 5' for /sys/devices/vif-4-0/net/vif4.0
Jun 5 15:13:07 server1 kernel: [ 2214.659589] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun 5 15:13:07 server1 kernel: [ 2214.660699] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: online type_if=vif XENBUS_PATH=backend/vif/4/0
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: add type_if=tap XENBUS_PATH=
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/node /dev/loop0 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/physical-device 7:0 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/768/hotplug-status connected to xenstore.
Jun 5 15:13:07 server1 kernel: [ 2214.842610] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 kernel: [ 2214.852647] device vif4.0 entered promiscuous mode
Jun 5 15:13:07 server1 kernel: [ 2214.858373] ADDRCONF(NETDEV_UP): vif4.0: link is not ready
Jun 5 15:13:07 server1 kernel: [ 2214.861475] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 kernel: [ 2214.861487] xenbr1: port 2(tap4.0) entering forwarding state
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge add for tap4.0, bridge xenbr1.
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: devices added (path: /sys/devices/virtual/net/tap4.0, iface: tap4.0)
Jun 5 15:13:07 server1 NetworkManager[887]: SCPlugin-Ifupdown: device added (path: /sys/devices/virtual/net/tap4.0, iface: tap4.0): no ifupdown configuration found.
Jun 5 15:13:07 server1 NetworkManager[887]: <warn> /sys/devices/virtual/net/tap4.0: couldn't determine device driver; ignoring...
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Successful vif-bridge online for vif4.0, bridge xenbr1.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/vif-bridge: Writing backend/vif/4/0/hotplug-status connected to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/node /dev/loop1 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/physical-device 7:1 to xenstore.
Jun 5 15:13:07 server1 logger: /etc/xen/scripts/block: Writing backend/vbd/4/5632/hotplug-status connected to xenstore.
Jun 5 15:13:08 server1 avahi-daemon[898]: Joining mDNS multicast group on interface tap4.0.IPv6 with address fe80::fcff:ffff:feff:ffff.
Jun 5 15:13:08 server1 avahi-daemon[898]: New relevant interface tap4.0.IPv6 for mDNS.
Jun 5 15:13:08 server1 avahi-daemon[898]: Registering new address record for fe80::fcff:ffff:feff:ffff on tap4.0.*.
Jun 5 15:13:17 server1 kernel: [ 2225.202456] tap4.0: no IPv6 routers present
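Two things stand out in the log above. First, both xenwin7_start_0 and xenwin7_stop_0 were killed at the 20-second default (timeout=20000ms), because the xenwin7 primitive defines no start/stop timeouts; an HVM domain, especially one blocked on OCFS2's DLM, routinely needs longer, and it is the failed stop that leaves the resource "Started unmanaged". Here is a sketch of the same primitive with explicit operation timeouts; the values are assumptions to be tuned against how long the domain really takes, not values from the attachments:

primitive xenwin7 ocf:heartbeat:Xen \
        params xmfile="/home/cluster/xen/win7.cfg" \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="180s" \
        op migrate_to interval="0" timeout="240s" \
        op migrate_from interval="0" timeout="120s" \
        op monitor interval="40s" timeout="60s" \
        meta target-role="started" is-managed="true" allow-migrate="true"

Second, the o2dlm lines show the filesystem stalled "Waiting on the death of node 1" until the eviction at 15:13:00; without real fencing that wait is open-ended, so longer timeouts only help once STONITH is in place.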
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org