>>> Muhammad Sharfuddin <m.sharfud...@nds.com.pk> wrote on 05.07.2015 at
>>> 18:13 in message <559957c4.8090...@nds.com.pk>:
> SLES 11 SP3 + online updates (pacemaker-1.1.11-0.8.11.70,
> openais-1.1.4-5.22.1.7)
>
> It is a dual-primary DRBD cluster which mounts a file system resource on
> both cluster nodes simultaneously (the file system type is OCFS2).
>
> Whenever one of the nodes goes down, the file system (/sharedata) becomes
> inaccessible for exactly 35 seconds on the other (surviving/online) node,
> and then becomes available again on the online node.
I'm no expert on DRBD, but I guess DRBD first has to resolve the split-brain
situation: obviously both nodes cannot continue to be Primary while the
connection between them is down. When one node dies, DRBD initially sees only
a lost connection, and it has to wait for its timeouts to expire before it
declares the peer dead. If those timeouts were tiny, you might get a lot of
false "connection interrupted" alerts.

> Please help me understand why the node which survives or remains online is
> unable to access the file system resource (/sharedata) for 35 seconds, and
> how can I fix the cluster so that the file system remains accessible on the
> surviving node without any interruption/delay (in my case, about 35
> seconds)?

For truly continuous access I guess you need true shared disks, or a clean
DRBD shutdown on the other node. (See also the timeout sketches at the end of
this mail, after the quoted logs and configuration.)

Regards,
Ulrich

> By inaccessible, I mean that running "ls -l /sharedata" and "df /sharedata"
> returns no output and does not give the prompt back on the online node for
> exactly 35 seconds once the other node becomes offline.
>
> E.g. "node1" went offline somewhere around 01:37:15, and the /sharedata
> file system was then inaccessible between 01:37:35 and 01:38:18 on the
> online node, i.e. "node2".
>
> /var/log/messages on node2, when node1 went offline:
> Jul 5 01:37:26 node2 kernel: [ 675.255865] drbd r0: PingAck did not arrive in time.
> Jul 5 01:37:26 node2 kernel: [ 675.255886] drbd r0: peer( Primary -> Unknown ) conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Jul 5 01:37:26 node2 kernel: [ 675.256030] block drbd0: new current UUID C23D1458962AD18D:A8DD404C9F563391:6A5F4A26F64BAF0B:6A5E4A26F64BAF0B
> Jul 5 01:37:26 node2 kernel: [ 675.256079] drbd r0: asender terminated
> Jul 5 01:37:26 node2 kernel: [ 675.256081] drbd r0: Terminating drbd_a_r0
> Jul 5 01:37:26 node2 kernel: [ 675.256306] drbd r0: Connection closed
> Jul 5 01:37:26 node2 kernel: [ 675.256338] drbd r0: conn( NetworkFailure -> Unconnected )
> Jul 5 01:37:26 node2 kernel: [ 675.256339] drbd r0: receiver terminated
> Jul 5 01:37:26 node2 kernel: [ 675.256340] drbd r0: Restarting receiver thread
> Jul 5 01:37:26 node2 kernel: [ 675.256341] drbd r0: receiver (re)started
> Jul 5 01:37:26 node2 kernel: [ 675.256344] drbd r0: conn( Unconnected -> WFConnection )
> Jul 5 01:37:29 node2 corosync[4040]: [TOTEM ] A processor failed, forming new configuration.
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] New Configuration:
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] r(0) ip(172.16.241.132)
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] Members Left:
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] r(0) ip(172.16.241.131)
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] Members Joined:
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 216: memb=1, new=0, lost=1
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] info: pcmk_peer_update: memb: node2 739307908
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] info: pcmk_peer_update: lost: node1 739307907
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] CLM CONFIGURATION CHANGE
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] New Configuration:
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] r(0) ip(172.16.241.132)
> Jul 5 01:37:35 node2 cluster-dlm[4344]: notice: plugin_handle_membership: Membership 216: quorum lost
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: notice: plugin_handle_membership: Membership 216: quorum lost
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] Members Left:
> Jul 5 01:37:35 node2 crmd[4050]: notice: plugin_handle_membership: Membership 216: quorum lost
> Jul 5 01:37:35 node2 stonith-ng[4046]: notice: plugin_handle_membership: Membership 216: quorum lost
> Jul 5 01:37:35 node2 cib[4045]: notice: plugin_handle_membership: Membership 216: quorum lost
> Jul 5 01:37:35 node2 cluster-dlm[4344]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
> Jul 5 01:37:35 node2 corosync[4040]: [CLM ] Members Joined:
> Jul 5 01:37:35 node2 crmd[4050]: warning: match_down_event: No match for shutdown action on node1
> Jul 5 01:37:35 node2 stonith-ng[4046]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
> Jul 5 01:37:35 node2 cib[4045]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
> Jul 5 01:37:35 node2 cluster-dlm[4344]: update_cluster: Processing membership 216
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: confchg called
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 216: memb=1, new=0, lost=0
> Jul 5 01:37:35 node2 crmd[4050]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
> Jul 5 01:37:35 node2 cluster-dlm[4344]: dlm_process_node: Skipped active node 739307908: born-on=204, last-seen=216, this-event=216, last-event=212
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: ocfs2_controld (group "ocfs2:controld") confchg: members 1, left 1, joined 0
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] info: pcmk_peer_update: MEMB: node2 739307908
> Jul 5 01:37:35 node2 crmd[4050]: notice: crm_update_peer_state: plugin_handle_membership: Node node1[739307907] - state is now lost (was member)
> Jul 5 01:37:35 node2 cluster-dlm[4344]: del_configfs_node: del_configfs_node rmdir "/sys/kernel/config/dlm/cluster/comms/739307907"
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: node daemon left 739307907
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] info: ais_mark_unseen_peer_dead: Node node1 was not seen in the previous transition
> Jul 5 01:37:35 node2 crmd[4050]: warning: match_down_event: No match for shutdown action on node1
> Jul 5 01:37:35 node2 cluster-dlm[4344]: dlm_process_node: Removed inactive node 739307907: born-on=212, last-seen=212, this-event=216, last-event=212
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: node down 739307907
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] info: update_member: Node 739307907/node1 is now: lost
> Jul 5 01:37:35 node2 crmd[4050]: notice: peer_update_callback: Stonith/shutdown of node1 not matched
> Jul 5 01:37:35 node2 cluster-dlm[4344]: log_config: dlm:controld conf 1 0 1 memb 739307908 join left 739307907
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: Node 739307907 has left mountgroup 12925001B70F4A529491DFF16F8B8352
> Jul 5 01:37:35 node2 corosync[4040]: [pcmk ] info: send_member_notification: Sending membership update 216 to 5 children
> Jul 5 01:37:35 node2 sbd: [4028]: WARN: CIB: We do NOT have quorum!
> Jul 5 01:37:35 node2 sbd: [4026]: WARN: Pacemaker health check: UNHEALTHY
> Jul 5 01:37:35 node2 cluster-dlm[4344]: log_config: dlm:ls:12925001B70F4A529491DFF16F8B8352 conf 1 0 1 memb 739307908 join left 739307907
> Jul 5 01:37:35 node2 corosync[4040]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Jul 5 01:37:35 node2 corosync[4040]: [CPG ] chosen downlist: sender r(0) ip(172.16.241.132) ; members(old:2 left:1)
> Jul 5 01:37:35 node2 cluster-dlm[4344]: node_history_fail: 12925001B70F4A529491DFF16F8B8352 check_fs nodeid 739307907 set
> Jul 5 01:37:35 node2 cluster-dlm[4344]: add_change: 12925001B70F4A529491DFF16F8B8352 add_change cg 4 remove nodeid 739307907 reason 3
> Jul 5 01:37:35 node2 corosync[4040]: [MAIN ] Completed service synchronization, ready to provide service.
> Jul 5 01:37:35 node2 cluster-dlm[4344]: add_change: 12925001B70F4A529491DFF16F8B8352 add_change cg 4 counts member 1 joined 0 remove 1 failed 1
> Jul 5 01:37:35 node2 cluster-dlm[4344]: stop_kernel: 12925001B70F4A529491DFF16F8B8352 stop_kernel cg 4
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: Sending notification of node 739307907 for "12925001B70F4A529491DFF16F8B8352"
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: confchg called
> Jul 5 01:37:35 node2 cluster-dlm[4344]: do_sysfs: write "0" to "/sys/kernel/dlm/12925001B70F4A529491DFF16F8B8352/control"
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: group "ocfs2:12925001B70F4A529491DFF16F8B8352" confchg: members 1, left 1, joined 0
> Jul 5 01:37:35 node2 kernel: [ 684.452042] dlm: closing connection to node 739307907
> Jul 5 01:37:35 node2 crmd[4050]: notice: crm_update_quorum: Updating quorum status to false (call=114)
> Jul 5 01:37:35 node2 cluster-dlm[4344]: fence_node_time: Node 739307907/node1 has not been shot yet
> Jul 5 01:37:35 node2 crmd[4050]: notice: do_state_transition: State transition S_IDLE -> S_INTEGRATION [ input=I_NODE_JOIN cause=C_FSA_INTERNAL origin=check_join_state ]
> Jul 5 01:37:35 node2 cluster-dlm[4344]: check_fencing_done: 12925001B70F4A529491DFF16F8B8352 check_fencing 739307907 wait add 1436042043 fail 1436042255 last 0
> Jul 5 01:37:35 node2 crmd[4050]: notice: crm_update_quorum: Updating quorum status to false (call=119)
> Jul 5 01:37:35 node2 attrd[4048]: notice: attrd_local_callback: Sending full refresh (origin=crmd)
> Jul 5 01:37:35 node2 attrd[4048]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
> Jul 5 01:37:35 node2 pengine[4049]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul 5 01:37:35 node2 pengine[4049]: warning: pe_fence_node: Node node1 will be fenced because the node is no longer part of the cluster
> Jul 5 01:37:35 node2 pengine[4049]: warning: determine_online_status: Node node1 is unclean
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 attrd[4048]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-p-drbd (10000)
> Jul 5 01:37:35 node2 cluster-dlm[4344]: set_fs_notified: 12925001B70F4A529491DFF16F8B8352 set_fs_notified nodeid 739307907
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 attrd[4048]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: message from dlmcontrol
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: Notified for "12925001B70F4A529491DFF16F8B8352", node 739307907, status 0
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_demote_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 ocfs2_controld[4473]: Completing notification on "12925001B70F4A529491DFF16F8B8352" for node 739307907
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-drbd:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-dlm:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-dlm:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-o2cb:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-o2cb:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-vg1:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-vg1:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-sharefs:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: custom_action: Action p-sharefs:1_stop_0 on node1 is unrunnable (offline)
> Jul 5 01:37:35 node2 pengine[4049]: warning: stage6: Scheduling Node node1 for STONITH
> Jul 5 01:37:35 node2 pengine[4049]: error: rsc_expand_action: Couldn't expand cl-lock_demote_0
> Jul 5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:37:35 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:37:35 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:37:35 node2 pengine[4049]: notice: LogActions: Demote p-drbd:1 (Master -> Stopped node1)
> Jul 5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-dlm:1 (node1)
> Jul 5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-o2cb:1 (node1)
> Jul 5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-vg1:1 (node1)
> Jul 5 01:37:35 node2 pengine[4049]: notice: LogActions: Stop p-sharefs:1 (node1)
> Jul 5 01:37:35 node2 pengine[4049]: warning: process_pe_message: Calculated Transition 6: /var/lib/pacemaker/pengine/pe-warn-5.bz2
> Jul 5 01:37:35 node2 crmd[4050]: notice: do_te_invoke: Processing graph 6 (ref=pe_calc-dc-1436042255-74) derived from /var/lib/pacemaker/pengine/pe-warn-5.bz2
> Jul 5 01:37:35 node2 crmd[4050]: notice: te_fence_node: Executing reboot fencing operation (79) on node1 (timeout=90000)
> Jul 5 01:37:35 node2 crmd[4050]: notice: te_rsc_command: Initiating action 95: notify p-drbd_pre_notify_demote_0 on node2 (local)
> Jul 5 01:37:35 node2 stonith-ng[4046]: notice: handle_request: Client crmd.4050.3c8ccdb0 wants to fence (reboot) 'node1' with device '(any)'
> Jul 5 01:37:35 node2 stonith-ng[4046]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for node1: 2fce4697-65d5-40d2-b5b4-5c10be4d3d5a (0)
> Jul 5 01:37:35 node2 crmd[4050]: notice: process_lrm_event: LRM operation p-drbd_notify_0 (call=54, rc=0, cib-update=0, confirmed=true) ok
> Jul 5 01:37:35 node2 stonith-ng[4046]: notice: can_fence_host_with_device: sbd_stonith can fence node1: dynamic-list
> Jul 5 01:37:35 node2 sbd: [6197]: info: Delivery process handling /dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd
> Jul 5 01:37:35 node2 sbd: [6197]: info: Device UUID: 4e398182-4894-4f11-b84e-6c6ad4c02614
> Jul 5 01:37:35 node2 sbd: [6197]: info: Writing reset to node slot node1
> Jul 5 01:37:35 node2 sbd: [6197]: info: Messaging delay: 40
> Jul 5 01:38:15 node2 sbd: [6197]: info: reset successfully delivered to node1
> Jul 5 01:38:15 node2 sbd: [6196]: info: Message successfully delivered.
> Jul 5 01:38:16 node2 stonith-ng[4046]: notice: log_operation: Operation 'reboot' [6185] (call 3 from crmd.4050) for host 'node1' with device 'sbd_stonith' returned: 0 (OK)
> Jul 5 01:38:16 node2 stonith-ng[4046]: notice: remote_op_done: Operation reboot of node1 by node2 for crmd.4050@node2.2fce4697: OK
> Jul 5 01:38:16 node2 crmd[4050]: notice: tengine_stonith_callback: Stonith operation 3/79:6:0:34e9a1ca-ebc9-4114-b5b7-26e670dc6aa0: OK (0)
> Jul 5 01:38:16 node2 crmd[4050]: notice: tengine_stonith_notify: Peer node1 was terminated (reboot) by node2 for node2: OK (ref=2fce4697-65d5-40d2-b5b4-5c10be4d3d5a) by client crmd.4050
> Jul 5 01:38:16 node2 crmd[4050]: notice: te_rsc_command: Initiating action 96: notify p-drbd_post_notify_demote_0 on node2 (local)
> Jul 5 01:38:16 node2 crmd[4050]: notice: process_lrm_event: LRM operation p-drbd_notify_0 (call=55, rc=0, cib-update=0, confirmed=true) ok
> Jul 5 01:38:16 node2 crmd[4050]: notice: run_graph: Transition 6 (Complete=24, Pending=0, Fired=0, Skipped=6, Incomplete=5, Source=/var/lib/pacemaker/pengine/pe-warn-5.bz2): Stopped
> Jul 5 01:38:16 node2 pengine[4049]: notice: unpack_config: On loss of CCM Quorum: Ignore
> Jul 5 01:38:16 node2 pengine[4049]: error: rsc_expand_action: Couldn't expand cl-lock_demote_0
> Jul 5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:38:16 node2 pengine[4049]: error: crm_abort: clone_update_actions_interleave: Triggered assert at clone.c:1260 : first_action != NULL || is_set(first_child->flags, pe_rsc_orphan)
> Jul 5 01:38:16 node2 pengine[4049]: error: clone_update_actions_interleave: No action found for demote in g-lock:0 (first)
> Jul 5 01:38:16 node2 pengine[4049]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-30.bz2
> Jul 5 01:38:16 node2 crmd[4050]: notice: do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1436042296-79) derived from /var/lib/pacemaker/pengine/pe-input-30.bz2
> Jul 5 01:38:16 node2 crmd[4050]: notice: run_graph: Transition 7 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-30.bz2): Complete
> Jul 5 01:38:16 node2 crmd[4050]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Jul 5 01:38:17 node2 cluster-dlm[4344]: fence_node_time: Node 739307907/node1 was last shot 'now'
> Jul 5 01:38:17 node2 cluster-dlm[4344]: check_fencing_done: 12925001B70F4A529491DFF16F8B8352 check_fencing 739307907 done add 1436042043 fail 1436042255 last 1436042297
> Jul 5 01:38:17 node2 cluster-dlm[4344]: check_fencing_done: 12925001B70F4A529491DFF16F8B8352 check_fencing done
> Jul 5 01:38:17 node2 cluster-dlm[4344]: check_quorum_done: 12925001B70F4A529491DFF16F8B8352 check_quorum disabled
> Jul 5 01:38:17 node2 cluster-dlm[4344]: check_fs_done: 12925001B70F4A529491DFF16F8B8352 check_fs nodeid 739307907 clear
> Jul 5 01:38:17 node2 cluster-dlm[4344]: check_fs_done: 12925001B70F4A529491DFF16F8B8352 check_fs done
> Jul 5 01:38:17 node2 cluster-dlm[4344]: send_info: 12925001B70F4A529491DFF16F8B8352 send_start cg 4 flags 2 data2 0 counts 3 1 0 1 1
> Jul 5 01:38:17 node2 cluster-dlm[4344]: receive_start: 12925001B70F4A529491DFF16F8B8352 receive_start 739307908:4 len 76
> Jul 5 01:38:17 node2 cluster-dlm[4344]: match_change: 12925001B70F4A529491DFF16F8B8352 match_change 739307908:4 matches cg 4
> Jul 5 01:38:17 node2 cluster-dlm[4344]: wait_messages_done: 12925001B70F4A529491DFF16F8B8352 wait_messages cg 4 got all 1
> Jul 5 01:38:17 node2 cluster-dlm[4344]: start_kernel: 12925001B70F4A529491DFF16F8B8352 start_kernel cg 4 member_count 1
> Jul 5 01:38:17 node2 cluster-dlm[4344]: update_dir_members: dir_member 739307907
> Jul 5 01:38:17 node2 cluster-dlm[4344]: update_dir_members: dir_member 739307908
> Jul 5 01:38:17 node2 cluster-dlm[4344]: set_configfs_members: set_members rmdir "/sys/kernel/config/dlm/cluster/spaces/12925001B70F4A529491DFF16F8B8352/nodes/739307907"
> Jul 5 01:38:17 node2 cluster-dlm[4344]: do_sysfs: write "1" to "/sys/kernel/dlm/12925001B70F4A529491DFF16F8B8352/control"
> Jul 5 01:38:17 node2 kernel: [ 726.576130] (ocfs2rec,6152,0):ocfs2_replay_journal:1549 Recovering node 739307907 from slot 0 on device (253,0)
> Jul 5 01:38:18 node2 kernel: [ 727.453597] (ocfs2rec,6152,0):ocfs2_begin_quota_recovery:407 Beginning quota recovery in slot 0
> Jul 5 01:38:18 node2 kernel: [ 727.455965] (kworker/u:28,139,0):ocfs2_finish_quota_recovery:599 Finishing quota recovery in slot 0
>
> cib:
> primitive p-dlm ocf:pacemaker:controld \
>     op monitor interval="120" timeout="30" \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="100"
> primitive p-drbd ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="50" role="Master" timeout="30" \
>     op monitor interval="60" role="Slave" timeout="30" \
>     op start interval="0" timeout="240" \
>     op stop interval="0" timeout="100"
> primitive p-o2cb ocf:ocfs2:o2cb \
>     op monitor interval="120" timeout="30" \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="100"
> primitive p-sharefs ocf:heartbeat:Filesystem \
>     params device="/dev/vg1/sharelv" directory="/sharedata" fstype="ocfs2" \
>     op monitor interval="60s" timeout="60s" \
>     op start interval="0" timeout="90s" \
>     op stop interval="0" timeout="90s"
> primitive p-vg1 ocf:heartbeat:LVM \
>     params volgrpname="vg1" \
>     op monitor interval="60s" timeout="40" \
>     op start interval="0" timeout="40" \
>     op stop interval="0" timeout="40"
> primitive sbd_stonith stonith:external/sbd \
>     meta target-role="Started" \
>     op monitor interval="3000" timeout="20" \
>     op start interval="0" timeout="20" \
>     op stop interval="0" timeout="20" \
>     params sbd_device="/dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd"
> group g-lock p-dlm p-o2cb
> group g-sharedata p-vg1 p-sharefs
> ms ms-drbd p-drbd \
>     meta master-max="2" clone-max="2" notify="true" interleave="true"
> clone cl-lock g-lock \
>     meta globally-unique="false" interleave="true"
> clone cl-sharedata g-sharedata \
>     meta globally-unique="false" interleave="true"
> colocation col-drbd-lock inf: cl-lock ms-drbd:Master
> colocation col-lock-sharedata inf: cl-sharedata cl-lock
> order ord-drbd-lock inf: ms-drbd:promote cl-lock
> order ord-lock-sharedata inf: cl-lock cl-sharedata
> stonith-timeout="90s" \
> no-quorum-policy="ignore" \
> expected-quorum-votes="2"
>
> --
> Regards,
>
> Muhammad Sharfuddin
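
P.S. Since the quoted logs start with "PingAck did not arrive in time", here
is a minimal sketch of where the relevant DRBD timeout knobs live, in the
drbd.conf syntax of the DRBD 8.x shipped with SLES 11. The values shown are
the documented defaults, not a tuning recommendation: lowering them makes
peer-death detection faster at the price of more of those false "connection
interrupted" events (see man drbd.conf for the authoritative description):

    resource r0 {
      net {
        ping-int      10;   # seconds between keep-alive pings to the peer
        ping-timeout   5;   # wait for PingAck, in tenths of a second (5 = 0.5s)
        timeout       60;   # network timeout, in tenths of a second (60 = 6s)
      }
    }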
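Similarly, the gap between the DRBD connection loss (01:37:26, "A processor
failed" at 01:37:29) and the new corosync membership (01:37:35) is governed
by the totem token/consensus timeouts. A sketch of that knob in
/etc/corosync/corosync.conf; the 5000 ms value is only an example, check the
default your installation ships:

    totem {
        # milliseconds without the token before corosync declares a
        # processor failed and forms a new membership
        token: 5000
    }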
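Finally, note that most of the outage visible in your logs is spent waiting
for fencing rather than on DRBD itself: sbd logs "Messaging delay: 40" at
01:37:35 and only reports "reset successfully delivered to node1" at
01:38:15, and cluster-dlm holds OCFS2 recovery back until then ("Node
739307907/node1 has not been shot yet"). That 40-second delay usually
corresponds to the msgwait timeout stored in the SBD device header, which
you can inspect with (device path taken from your quoted configuration):

    sbd -d /dev/disk/by-id/scsi-1494554000000000035c71550e83586b70dfc77c3b382dfbd dump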