[ClusterLabs] FYI: regression using 2.0.0 / 1.1.19 Pacemaker Remote node with older cluster nodes
Hi all,

The just-released Pacemaker 2.0.0 and 1.1.19 releases have an issue when a Pacemaker Remote node is upgraded before the cluster nodes.

Pacemaker 2.0.0 contains a fix (also backported to 1.1.19) for the longstanding issue of "crm_node -n" getting the wrong name when run on the command line of a Pacemaker Remote node whose node name is different from its local hostname. However, the fix can cause resource agents running on a Pacemaker Remote node to hang when used with a cluster node older than 2.0.0 / 1.1.19.

The only workaround is to upgrade all cluster nodes before upgrading any Pacemaker Remote nodes (which is the recommended practice anyway).

--
Ken Gaillot
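Since strict upgrade ordering is the only safe path here, a pre-flight version check across the cluster nodes may help. A minimal sketch, assuming passwordless ssh; the node names are placeholders:

    # Hypothetical node names; adjust to your cluster.
    for node in cluster1 cluster2 cluster3; do
        printf '%s: ' "$node"
        ssh "$node" pacemakerd --version    # prints e.g. "Pacemaker 1.1.19"
    done
    # Upgrade Pacemaker Remote nodes only after every cluster node
    # reports 1.1.19 / 2.0.0 or later.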
[ClusterLabs] Re: Re: Re: Re: Re: Re: corosync/dlm fencing?
>>> Philipp Achmüller wrote on 16.07.2018 at 14:09:
> Hi,
>
>> From: "Ulrich Windl"
>> To:
>> Date: 16.07.2018 13:46
>> Subject: [ClusterLabs] Re: Re: Re: Re: corosync/dlm fencing?
>> Sent by: "Users"
>>
>> Hi again!
>>
>> Oh, I missed "...maintenance i would like to standby 1 or 2 nodes from
>> "sitea""...
>
> Yes - sorry for tons of logs - but I think this will catch the whole
> situation...
>
>> I think some time ago I had asked about the same thing for SLES11, and
>> the answer was that with some configurations a standby is not possible
>> (I thought it was related to OCFS2, but maybe it was cLVM or DLM).
>> Quorum issues aside, why not shut down the node completely?
>
> I also tried "systemctl stop pacemaker" or a "normal" system shutdown ->
> same outcome, there is always dlm fencing some other nodes...

Hmm... I looked up my last node shutdown on the three-node cluster. Essential messages are like this (many messages left out, sometimes indicated by "(more)"):

attrd[13169]: notice: attrd_trigger_update: Sending flush op to all hosts for: shutdown (0)
pengine[13170]: notice: stage6: Scheduling Node h10 for shutdown
pengine[13170]: notice: LogActions: Stop prm_DLM:1 (h10)
pengine[13170]: notice: LogActions: Stop prm_cLVMd:1 (h10)
pengine[13170]: notice: LogActions: Stop prm_LVM_CFS_VMs:1 (h10)
(more)
crmd[13171]: notice: te_rsc_command: Initiating action 97: stop prm_CFS_VMs_fs_stop_0 on h10
cluster-dlm[13901]: log_config: dlm:ls:490B9FCAFA3D4B2F9A586A5893E00730 conf 2 0 1 memb 739512321 739512325 join left 739512330
cluster-dlm[13901]: add_change: 490B9FCAFA3D4B2F9A586A5893E00730 add_change cg 8 remove nodeid 739512330 reason 2
(more)
cluster-dlm[13901]: receive_plocks_stored: 490B9FCAFA3D4B2F9A586A5893E00730 receive_plocks_stored 739512321:8 flags a sig 0 need_plocks 0
ocfs2_controld[14389]: confchg called
ocfs2_controld[14389]: node daemon left 739512330
cluster-dlm[13901]: log_config: dlm:ls:clvmd conf 2 0 1 memb 739512321 739512325 join left 739512330
cluster-dlm[13901]: stop_kernel: clvmd stop_kernel cg 8
cluster-dlm[13901]: log_config: dlm:controld conf 2 0 1 memb 739512321 739512325 join left 739512330
crmd[13171]: notice: run_graph: Transition 10 (Complete=41, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-26.bz2): Complete
cib[13166]: notice: crm_update_peer_state: cib_peer_update_callback: Node h10[739512330] - state is now lost (was member)
corosync[13162]: [CLM ] Members Left:
corosync[13162]: [CLM ] r(0) ip(172.20.16.10) r(1) ip(10.2.2.10)
stonith-ng[13167]: notice: crm_update_peer_state: st_peer_update_callback: Node h10[739512330] - state is now lost (was member)
cluster-dlm[13901]: dlm_process_node: Removed inactive node 739512330: born-on=3804, last-seen=3804, this-event=3808, last-event=3804
kernel: [349683.507286] dlm: closing connection to node 739512330
crmd[13171]: notice: peer_update_callback: do_shutdown of h10 (op 190) is complete
corosync[13162]: [MAIN ] Completed service synchronization, ready to provide service.
crmd[13171]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

Another random thought: How do you collect your logs? The node being fenced may not have the last few log messages written to disk (SLES11 syslog at least). Sometimes it's a good idea to "tail -f" the syslogs on each node, so if a node goes down, you still see what should have ended up in its syslog.
Chances may be better if you have a remote syslog server via DGRAM sockets...

Regards,
Ulrich

[...]
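For the remote-syslog idea above, a minimal rsyslog forwarding sketch, assuming rsyslog is in use; "loghost" and the drop-in path are placeholders:

    # /etc/rsyslog.d/remote.conf -- hypothetical drop-in
    *.* @loghost:514    # single '@' = UDP (datagram); messages leave the node
                        # right away, so the last lines before a fence may survive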
[ClusterLabs] Re: Re: Re: Re: Re: corosync/dlm fencing?
Hi,

> From: "Ulrich Windl"
> To:
> Date: 16.07.2018 13:46
> Subject: [ClusterLabs] Re: Re: Re: Re: corosync/dlm fencing?
> Sent by: "Users"
>
> Hi again!
>
> Oh, I missed "...maintenance i would like to standby 1 or 2 nodes from
> "sitea""...

Yes - sorry for tons of logs - but I think this will catch the whole situation...

> I think some time ago I had asked about the same thing for SLES11, and the
> answer was that with some configurations a standby is not possible (I
> thought it was related to OCFS2, but maybe it was cLVM or DLM). Quorum
> issues aside, why not shut down the node completely?

I also tried "systemctl stop pacemaker" or a "normal" system shutdown -> same outcome, there is always dlm fencing some other nodes...

> Regards,
> Ulrich

[...]
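When even a clean "systemctl stop pacemaker" ends in dlm fencing, it can be worth snapshotting dlm_controld's view on a surviving node before and after the stop. One possible check with the dlm userland tools (run as root):

    dlm_tool status       # daemon membership and fencing state
    dlm_tool ls           # lockspaces (clvmd and the OCFS2 UUIDs show up here)
    dlm_tool dump | tail  # recent dlm_controld events, including fence requests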
[ClusterLabs] Re: Re: Re: Re: corosync/dlm fencing?
Hi again!

Oh, I missed "...maintenance i would like to standby 1 or 2 nodes from "sitea""...

I think some time ago I had asked about the same thing for SLES11, and the answer was that with some configurations a standby is not possible (I thought it was related to OCFS2, but maybe it was cLVM or DLM). Quorum issues aside, why not shut down the node completely?

Regards,
Ulrich

>>> "Ulrich Windl" wrote on 16.07.2018 at 13:35 in message <5b4c82f002a10002c...@gwsmtp1.uni-regensburg.de>:
[...]
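For reference, the two variants being weighed here, in crmsh syntax as commonly used on SLES (node name taken from the thread; whether standby is safe with OCFS2/cLVM/DLM is exactly the open question):

    crm node standby sitea-1   # evacuate resources, keep cluster membership
    crm node online sitea-1    # leave standby again
    # versus stopping the whole stack on that node:
    systemctl stop pacemaker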
[ClusterLabs] Re: Re: Re: corosync/dlm fencing?
>>> Philipp Achmüller wrote on 16.07.2018 at 11:44:
> hi!
>
> Thank you for your comment.
> Unfortunately it is not obvious for me - the "grep fence" is attached in
> my original message.

Hi!

OK, seems I missed finding the needle in all the hay...

Anyway, I think the problem is "Cluster node siteb-1 will be fenced: peer is no longer part of the cluster". Looks as if the cluster noticed an unclean shutdown of node siteb-1. The message "Stonith/shutdown of siteb-1 not matched" seems to confirm that.

When shutting down two nodes, did you wait until shutdown of the first node succeeded before shutting down the second?

Regards,
Ulrich

> i searched older logs with activated debug information for dlm: - this is
> the sequence from syslog (from another timeframe):
[...]
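If overlapping shutdowns are indeed the trigger, serializing them should avoid the "peer is no longer part of the cluster" fencing. One possible sequence (crm_resource --wait blocks until the cluster settles; node names from the thread):

    crm node standby sitea-1
    crm_resource --wait        # wait until the resulting transition completes
    crm node standby sitea-2   # only now take down the second node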
[ClusterLabs] Re: Re: corosync/dlm fencing?
hi!

Thank you for your comment.
Unfortunately it is not obvious for me - the "grep fence" is attached in my original message.

i searched older logs with activated debug information for dlm: - this is the sequence from syslog (from another timeframe):

---
Node: siteb-2 (DC):

2018-06-28T09:02:23.272415+02:00 siteb-2 crmd[189260]: notice: State transition S_IDLE -> S_POLICY_ENGINE
2018-06-28T09:02:23.279028+02:00 siteb-2 pengine[189259]: notice: Watchdog will be used via SBD if fencing is required
2018-06-28T09:02:23.279214+02:00 siteb-2 pengine[189259]: notice: On loss of CCM Quorum: Ignore
2018-06-28T09:02:23.282153+02:00 siteb-2 pengine[189259]: notice: Move stonith-sbd#011(Started sitea-1 -> siteb-1)
2018-06-28T09:02:23.282249+02:00 siteb-2 pengine[189259]: notice: Move cl-info#011(Started sitea-1 -> siteb-2)
2018-06-28T09:02:23.282338+02:00 siteb-2 pengine[189259]: notice: Move k45RG#011(Started sitea-2 -> siteb-1)
2018-06-28T09:02:23.282422+02:00 siteb-2 pengine[189259]: notice: Stop dlm:0#011(sitea-1)
2018-06-28T09:02:23.282505+02:00 siteb-2 pengine[189259]: notice: Stop clvm:0#011(sitea-1)
2018-06-28T09:02:23.282588+02:00 siteb-2 pengine[189259]: notice: Stop vg1:0#011(sitea-1)
2018-06-28T09:02:23.282670+02:00 siteb-2 pengine[189259]: notice: Stop dlm:3#011(sitea-2)
2018-06-28T09:02:23.282752+02:00 siteb-2 pengine[189259]: notice: Stop clvm:3#011(sitea-2)
2018-06-28T09:02:23.282833+02:00 siteb-2 pengine[189259]: notice: Stop vg1:3#011(sitea-2)
2018-06-28T09:02:23.282916+02:00 siteb-2 pengine[189259]: notice: Stop sysinfo:0#011(sitea-1)
2018-06-28T09:02:23.283001+02:00 siteb-2 pengine[189259]: notice: Stop sysinfo:3#011(sitea-2)
2018-06-28T09:02:23.283978+02:00 siteb-2 pengine[189259]: notice: Calculated transition 1056, saving inputs in /var/lib/pacemaker/pengine/pe-input-2321.bz2
2018-06-28T09:02:23.284428+02:00 siteb-2 crmd[189260]: notice: Processing graph 1056 (ref=pe_calc-dc-1530169343-1339) derived from /var/lib/pacemaker/pengine/pe-input-2321.bz2
2018-06-28T09:02:23.284575+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation stonith-sbd_stop_0 on sitea-1
2018-06-28T09:02:23.284659+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation cl-info_stop_0 on sitea-1
2018-06-28T09:02:23.284742+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation k45RG_stop_0 on sitea-2
2018-06-28T09:02:23.284824+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation vg1_stop_0 on sitea-1
2018-06-28T09:02:23.284908+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation vg1_stop_0 on sitea-2
2018-06-28T09:02:23.284990+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation sysinfo_stop_0 on sitea-1
2018-06-28T09:02:23.285072+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation sysinfo_stop_0 on sitea-2
2018-06-28T09:02:23.288254+02:00 siteb-2 crmd[189260]: notice: Initiating start operation stonith-sbd_start_0 on siteb-1
2018-06-28T09:02:23.298867+02:00 siteb-2 crmd[189260]: notice: Initiating start operation cl-info_start_0 locally on siteb-2
2018-06-28T09:02:23.309272+02:00 siteb-2 lrmd[189257]: notice: executing - rsc:cl-info action:start call_id:105
2018-06-28T09:02:23.384074+02:00 siteb-2 lrmd[189257]: notice: finished - rsc:cl-info action:start call_id:105 pid:253747 exit-code:0 exec-time:75ms queue-time:0ms
2018-06-28T09:02:23.393759+02:00 siteb-2 crmd[189260]: notice: Result of start operation for cl-info on siteb-2: 0 (ok)
2018-06-28T09:02:23.395594+02:00 siteb-2 crmd[189260]: notice: Initiating monitor operation cl-info_monitor_6 locally on siteb-2
2018-06-28T09:02:24.159586+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation clvm_stop_0 on sitea-2
2018-06-28T09:02:24.193317+02:00 siteb-2 kernel: [80844.122213] dlm: clvmd: dlm_recover 5
2018-06-28T09:02:24.193349+02:00 siteb-2 kernel: [80844.122240] dlm: clvmd: dlm_clear_toss 1 done
2018-06-28T09:02:24.193351+02:00 siteb-2 kernel: [80844.122251] dlm: clvmd: remove member 3
2018-06-28T09:02:24.193352+02:00 siteb-2 kernel: [80844.122579] dlm: clvmd: dlm_recover_members 3 nodes
2018-06-28T09:02:24.993269+02:00 siteb-2 kernel: [80844.920751] dlm: clvmd: generation 7 slots 3 1:2 2:4 3:1
2018-06-28T09:02:24.993283+02:00 siteb-2 kernel: [80844.920755] dlm: clvmd: dlm_recover_directory
2018-06-28T09:02:25.189265+02:00 siteb-2 kernel: [80845.118173] dlm: clvmd: dlm_recover_directory 1 in 0 new
2018-06-28T09:02:25.604661+02:00 siteb-2 crmd[189260]: notice: Initiating stop operation clvm_stop_0 on sitea-1
2018-06-28T09:02:26.189277+02:00 siteb-2 kernel: [80846.116630] dlm: clvmd: dlm_recover_directory 0 out 2 messages
2018-06-28T09:02:26.189291+02:00 siteb-2 kernel: [80846.116635] dlm: clvmd: dlm_recover_masters
2018-06-28T09:02:26.189292+02:00 siteb-2 kernel: [80846.116648] dlm: clvmd: dlm_recover_masters 0 of 2
2018-06-28T09:02:26.189293+02:00 siteb-2 kernel:
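For log hunts like the one described above, one convenient pattern is to pull the fencing- and dlm-related lines from every node into a single time-sorted stream. A sketch, assuming passwordless ssh; the log path and node names are taken from this thread and may differ in your syslog setup:

    for node in sitea-1 sitea-2 siteb-1 siteb-2; do
        ssh "$node" "grep -E 'fence|stonith|dlm' /var/log/messages"
    done | sort > fence-events.txt   # ISO timestamps sort chronologically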