Re: [ClusterLabs] Weird Fencing Behavior
On Wed, Jul 18, 2018 at 8:00 PM, wrote:

> Today's Topics:
>
>    1. Re: Weird Fencing Behavior (Andrei Borzenkov)
>    2. Re: Weird Fencing Behavior (Klaus Wenninger)
>
> Message: 1
> Date: Wed, 18 Jul 2018 07:22:25 +0300
> From: Andrei Borzenkov
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] Weird Fencing Behavior
Re: [ClusterLabs] Weird Fencing Behavior
On 07/18/2018 06:22 AM, Andrei Borzenkov wrote:
> 18.07.2018 04:21, Confidential Company wrote:
>>> Hi,
>>>
>>> On my two-node active/passive setup, I configured fencing via
>>> fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I
>>> expected that both nodes would be stonithed simultaneously.
>>>
>>> On my test scenario, Node1 has the ClusterIP resource. When I
>>> disconnect the service/corosync link physically, Node1 was fenced and
>>> Node2 stays alive, given pcmk_delay=0 on both nodes.
>>>
>>> Can you explain the behavior above?
>>>
>>> # node1 could not connect to ESX because links were disconnected. As
>>> # the most obvious explanation.
>>>
>>> # You have logs, you are the only one who can answer this question
>>> # with some certainty. Others can only guess.
>>>
>>> Oops, my bad. I forgot to mention: I have two interfaces on each
>>> virtual machine (node). The second interface was used for the ESX
>>> links, so fencing can be executed even though the corosync links were
>>> disconnected. Looking forward to your response. Thanks
>>
>> # Having no fence delay means a death match (each node killing the
>> # other) is possible, but it doesn't guarantee that it will happen.
>> # Some of the time, one node will detect the outage and fence the
>> # other one before the other one can react.
>>
>> # It's basically an Old West shoot-out -- they may reach for their
>> # guns at the same time, but one may be quicker.
>>
>> # As Andrei suggested, the logs from both nodes could give you a
>> # timeline of what happened when.
>>
>> Hi Andrei, kindly see the logs below. Based on the timestamps, Node1
>> should have fenced Node2 first, but in the actual test, Node1 was
>> fenced/shut down by Node2.
>>
> Node1 tried to fence but failed. It could be connectivity, it could be
> credentials.
>
>> Is it possible to have a two-node active/passive setup in
>> pacemaker/corosync where the node that gets disconnected (interface
>> down) is the only one that gets fenced?
> If you could determine which node was disconnected, you would not need
> any fencing at all.

True, but there is still good reason to take connectivity into account.
Of course the foreseen survivor can't know directly that its peer got
disconnected. But if you see that you are disconnected yourself (e.g.
via a ping connection to routers, or test access to some web servers,
...), you can decide to shoot with a delay, or not to shoot at all,
because starting services locally would be no good anyway. That is the
basic idea behind the fence_heuristics_ping fence agent. There was some
discussion about approaches like that on the list just recently.

Regards,
Klaus

>> Thanks guys
>>
>> *LOGS from Node2:*
>>
>> Jul 17 13:33:27 ArcosRhel2 corosync[1048]: [TOTEM ] A processor failed, forming new configuration.
> ...
>> Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 will be fenced because the node is no longer part of the cluster
> ...
>> Jul 17 13:33:50 ArcosRhel2 stonith-ng[1080]: notice: Operation 'reboot' [2323] (call 2 from crmd.1084) for host 'ArcosRhel1' with device 'Fence1' returned: 0 (OK)
>> Jul 17 13:33:50 ArcosRhel2 stonith-ng[1080]: notice: Operation reboot of ArcosRhel1 by ArcosRhel2 for crmd.1084@ArcosRhel2.0426e6e1: OK
>> Jul 17 13:33:50 ArcosRhel2 crmd[1084]: notice: Stonith operation 2/12:0:0:f9418e1f-1f13-4033-9eaa-aec705f807ef: OK (0)
>> Jul 17 13:33:50 ArcosRhel2 crmd[1084]: notice: Peer ArcosRhel1 was terminated (reboot) by ArcosRhel2 for ArcosRhel2: OK
> ...
>>
>> *LOGS from NODE1*
>> Jul 17 13:33:26 ArcoSRhel1 corosync[1464]: [TOTEM ] A processor failed, forming new configuration
>> Jul 17 13:33:28 ArcoSRhel1 pengine[1476]: warning: Node ArcosRhel2 will be fenced because the node is no longer part of the cluster
> ...
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: warning: Mapping action='off' to pcmk_reboot_action='off'
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
>> Jul 17 13:33:46 ArcoSRhel1 fence_vmware_soap: Unable to connect/login to fencing device
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ Unable to connect/login to fencing device ]
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
>> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
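Klaus's suggestion above can be sketched with pcs. This is only an illustration, assuming the fence_heuristics_ping agent (shipped with recent fence-agents packages) is available on both nodes; the device name ping-heuristic and the gateway address 172.16.10.1 are placeholders, not values from this thread:

```shell
# A heuristics "device" that only succeeds if the node initiating
# fencing can still ping the gateway (placeholder address):
pcs stonith create ping-heuristic fence_heuristics_ping \
    ping_targets=172.16.10.1 pcmk_host_list="ArcosRhel1 ArcosRhel2"

# Put the heuristic at level 1 and the real VMware agents at level 2,
# matching the device-to-host mapping from the config in this thread
# (Fence1 fences ArcosRhel1, fence2 fences ArcosRhel2):
pcs stonith level add 1 ArcosRhel1 ping-heuristic
pcs stonith level add 2 ArcosRhel1 Fence1
pcs stonith level add 1 ArcosRhel2 ping-heuristic
pcs stonith level add 2 ArcosRhel2 fence2
```

With this topology, a node whose own uplink is down fails the level-1 heuristic and never escalates to the real VMware agent, so it cannot shoot the healthy peer.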
Re: [ClusterLabs] Weird Fencing Behavior
18.07.2018 04:21, Confidential Company wrote:
>>> Hi,
>>>
>>> On my two-node active/passive setup, I configured fencing via
>>> fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I
>>> expected that both nodes would be stonithed simultaneously.
>>>
>>> On my test scenario, Node1 has the ClusterIP resource. When I
>>> disconnect the service/corosync link physically, Node1 was fenced and
>>> Node2 stays alive, given pcmk_delay=0 on both nodes.
>>>
>>> Can you explain the behavior above?
>>
>> # node1 could not connect to ESX because links were disconnected. As
>> # the most obvious explanation.
>>
>> # You have logs, you are the only one who can answer this question
>> # with some certainty. Others can only guess.
>>
>> Oops, my bad. I forgot to mention: I have two interfaces on each
>> virtual machine (node). The second interface was used for the ESX
>> links, so fencing can be executed even though the corosync links were
>> disconnected. Looking forward to your response. Thanks
>
> # Having no fence delay means a death match (each node killing the
> # other) is possible, but it doesn't guarantee that it will happen.
> # Some of the time, one node will detect the outage and fence the
> # other one before the other one can react.
>
> # It's basically an Old West shoot-out -- they may reach for their
> # guns at the same time, but one may be quicker.
>
> # As Andrei suggested, the logs from both nodes could give you a
> # timeline of what happened when.
>
> Hi Andrei, kindly see the logs below. Based on the timestamps, Node1
> should have fenced Node2 first, but in the actual test, Node1 was
> fenced/shut down by Node2.

Node1 tried to fence but failed. It could be connectivity, it could be
credentials.

> Is it possible to have a two-node active/passive setup in
> pacemaker/corosync where the node that gets disconnected (interface
> down) is the only one that gets fenced?

If you could determine which node was disconnected you would not need
any fencing at all.
> Thanks guys
>
> *LOGS from Node2:*
>
> Jul 17 13:33:27 ArcosRhel2 corosync[1048]: [TOTEM ] A processor failed, forming new configuration.
...
> Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 will be fenced because the node is no longer part of the cluster
...
> Jul 17 13:33:50 ArcosRhel2 stonith-ng[1080]: notice: Operation 'reboot' [2323] (call 2 from crmd.1084) for host 'ArcosRhel1' with device 'Fence1' returned: 0 (OK)
> Jul 17 13:33:50 ArcosRhel2 stonith-ng[1080]: notice: Operation reboot of ArcosRhel1 by ArcosRhel2 for crmd.1084@ArcosRhel2.0426e6e1: OK
> Jul 17 13:33:50 ArcosRhel2 crmd[1084]: notice: Stonith operation 2/12:0:0:f9418e1f-1f13-4033-9eaa-aec705f807ef: OK (0)
> Jul 17 13:33:50 ArcosRhel2 crmd[1084]: notice: Peer ArcosRhel1 was terminated (reboot) by ArcosRhel2 for ArcosRhel2: OK
...
>
> *LOGS from NODE1*
> Jul 17 13:33:26 ArcoSRhel1 corosync[1464]: [TOTEM ] A processor failed, forming new configuration
> Jul 17 13:33:28 ArcoSRhel1 pengine[1476]: warning: Node ArcosRhel2 will be fenced because the node is no longer part of the cluster
...
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: warning: Mapping action='off' to pcmk_reboot_action='off'
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: Fence1 can not fence (reboot) ArcosRhel2: static-list
> Jul 17 13:33:28 ArcoSRhel1 stonith-ng[1473]: notice: fence2 can fence (reboot) ArcosRhel2: static-list
> Jul 17 13:33:46 ArcoSRhel1 fence_vmware_soap: Unable to connect/login to fencing device
> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ Unable to connect/login to fencing device ]
> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
> Jul 17 13:33:46 ArcoSRhel1 stonith-ng[1473]: warning: fence_vmware_soap[7157] stderr: [ ]
Re: [ClusterLabs] Weird Fencing Behavior
> > Hi,
> >
> > On my two-node active/passive setup, I configured fencing via
> > fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I
> > expected that both nodes would be stonithed simultaneously.
> >
> > On my test scenario, Node1 has the ClusterIP resource. When I
> > disconnect the service/corosync link physically, Node1 was fenced and
> > Node2 stays alive, given pcmk_delay=0 on both nodes.
> >
> > Can you explain the behavior above?
>
> # node1 could not connect to ESX because links were disconnected. As
> # the most obvious explanation.
>
> # You have logs, you are the only one who can answer this question
> # with some certainty. Others can only guess.
>
> Oops, my bad. I forgot to mention: I have two interfaces on each
> virtual machine (node). The second interface was used for the ESX
> links, so fencing can be executed even though the corosync links were
> disconnected. Looking forward to your response. Thanks

# Having no fence delay means a death match (each node killing the
# other) is possible, but it doesn't guarantee that it will happen. Some
# of the time, one node will detect the outage and fence the other one
# before the other one can react.

# It's basically an Old West shoot-out -- they may reach for their guns
# at the same time, but one may be quicker.

# As Andrei suggested, the logs from both nodes could give you a
# timeline of what happened when.

Hi Andrei, kindly see the logs below. Based on the timestamps, Node1
should have fenced Node2 first, but in the actual test, Node1 was
fenced/shut down by Node2.

Is it possible to have a two-node active/passive setup in
pacemaker/corosync where the node that gets disconnected (interface
down) is the only one that gets fenced?

Thanks guys

*LOGS from Node2:*

Jul 17 13:33:27 ArcosRhel2 corosync[1048]: [TOTEM ] A processor failed, forming new configuration.
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [TOTEM ] A new membership (172.16.10.242:220) was formed. Members left: 1
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [TOTEM ] Failed to receive the leave message. failed: 1
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [QUORUM] Members[1]: 2
Jul 17 13:33:28 ArcosRhel2 corosync[1048]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Removing all ArcosRhel1 attributes for peer loss
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Lost attribute writer ArcosRhel1
Jul 17 13:33:28 ArcosRhel2 attrd[1082]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 cib[1079]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 cib[1079]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: warning: Our DC node (ArcosRhel1) left the cluster
Jul 17 13:33:28 ArcosRhel2 pacemakerd[1074]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 stonith-ng[1080]: notice: Node ArcosRhel1 state is now lost
Jul 17 13:33:28 ArcosRhel2 stonith-ng[1080]: notice: Purged 1 peers with id=1 and/or uname=ArcosRhel1 from the membership cache
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: State transition S_NOT_DC -> S_ELECTION
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: notice: State transition S_ELECTION -> S_INTEGRATION
Jul 17 13:33:28 ArcosRhel2 crmd[1084]: warning: Input I_ELECTION_DC received in state S_INTEGRATION from do_election_check
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 will be fenced because the node is no longer part of the cluster
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Node ArcosRhel1 is unclean
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Action fence2_stop_0 on ArcosRhel1 is unrunnable (offline)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Action ClusterIP_stop_0 on ArcosRhel1 is unrunnable (offline)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Scheduling Node ArcosRhel1 for STONITH
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: notice: Move fence2#011(Started ArcosRhel1 -> ArcosRhel2)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: notice: Move ClusterIP#011(Started ArcosRhel1 -> ArcosRhel2)
Jul 17 13:33:30 ArcosRhel2 pengine[1083]: warning: Calculated transition 0 (with warnings), saving inputs in /var/lib/pacemaker/pengine/pe-warn-20.bz2
Jul 17 13:33:30 ArcosRhel2 crmd[1084]: notice: Requesting fencing (reboot) of node ArcosRhel1
Jul 17 13:33:30 ArcosRhel2 crmd[1084]: notice: Initiating start operation fence2_start_0 locally on ArcosRhel2
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Client crmd.1084.cd70178e wants to fence (reboot) 'ArcosRhel1' with device '(any)'
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Requesting peer fencing (reboot) of ArcosRhel1
Jul 17 13:33:30 ArcosRhel2 stonith-ng[1080]: notice: Fence1
Re: [ClusterLabs] Weird Fencing Behavior
On Tue, 2018-07-17 at 21:29 +0800, Confidential Company wrote:
> > Hi,
> >
> > On my two-node active/passive setup, I configured fencing via
> > fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I
> > expected that both nodes would be stonithed simultaneously.
> >
> > On my test scenario, Node1 has the ClusterIP resource. When I
> > disconnect the service/corosync link physically, Node1 was fenced and
> > Node2 stays alive, given pcmk_delay=0 on both nodes.
> >
> > Can you explain the behavior above?
>
> # node1 could not connect to ESX because links were disconnected. As
> # the most obvious explanation.
>
> # You have logs, you are the only one who can answer this question
> # with some certainty. Others can only guess.
>
> Oops, my bad. I forgot to mention: I have two interfaces on each
> virtual machine (node). The second interface was used for the ESX
> links, so fencing can be executed even though the corosync links were
> disconnected. Looking forward to your response. Thanks

Having no fence delay means a death match (each node killing the other)
is possible, but it doesn't guarantee that it will happen. Some of the
time, one node will detect the outage and fence the other one before the
other one can react.

It's basically an Old West shoot-out -- they may reach for their guns at
the same time, but one may be quicker.

As Andrei suggested, the logs from both nodes could give you a timeline
of what happened when.
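The shoot-out described above can be broken deterministically by delaying just one side of the fence race. A hedged sketch against the device names from the config quoted below, assuming a Pacemaker release that supports the static pcmk_delay_base property (this cluster reports 1.1.16, which may only offer the random pcmk_delay_max; the 10s value is an arbitrary example):

```shell
# Prefer ArcosRhel2 as the survivor of a simultaneous fence race:
# fence2 is the device that fences ArcosRhel2, so giving it a fixed
# delay means ArcosRhel2 gets a head start fencing ArcosRhel1.
pcs stonith update fence2 pcmk_delay_base=10s

# Fence1 (which fences ArcosRhel1) stays undelayed.
pcs stonith update Fence1 pcmk_delay_base=0s
```

Only one of the two devices should carry the delay; delaying both just shifts the race instead of resolving it.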
> > See my config below:
> >
> > [root@ArcosRhel2 cluster]# pcs config
> > Cluster Name: ARCOSCLUSTER
> > Corosync Nodes:
> >  ArcosRhel1 ArcosRhel2
> > Pacemaker Nodes:
> >  ArcosRhel1 ArcosRhel2
> >
> > Resources:
> >  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
> >   Attributes: cidr_netmask=32 ip=172.16.10.243
> >   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
> >               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
> >               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
> >
> > Stonith Devices:
> >  Resource: Fence1 (class=stonith type=fence_vmware_soap)
> >   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass
> >    pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel)
> >    ssl_insecure=1 pcmk_delay_max=0s
> >   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
> >  Resource: fence2 (class=stonith type=fence_vmware_soap)
> >   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass
> >    pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
> >    port=ArcosRhel2(Ben) ssl_insecure=1
> >   Operations: monitor interval=60s (fence2-monitor-interval-60s)
> > Fencing Levels:
> >
> > Location Constraints:
> >   Resource: Fence1
> >     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
> >   Resource: fence2
> >     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
> > Ordering Constraints:
> > Colocation Constraints:
> > Ticket Constraints:
> >
> > Alerts:
> >  No alerts defined
> >
> > Resources Defaults:
> >  No defaults set
> > Operations Defaults:
> >  No defaults set
> >
> > Cluster Properties:
> >  cluster-infrastructure: corosync
> >  cluster-name: ARCOSCLUSTER
> >  dc-version: 1.1.16-12.el7-94ff4df
> >  have-watchdog: false
> >  last-lrm-refresh: 1531810841
> >  stonith-enabled: true
> >
> > Quorum:
> >   Options:

--
Ken Gaillot

___
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] Weird Fencing Behavior?
On Tue, Jul 17, 2018 at 10:58 AM, Confidential Company wrote:
> Hi,
>
> On my two-node active/passive setup, I configured fencing via
> fence_vmware_soap. I configured pcmk_delay=0 on both nodes, so I
> expected that both nodes would be stonithed simultaneously.
>
> On my test scenario, Node1 has the ClusterIP resource. When I
> disconnect the service/corosync link physically, Node1 was fenced and
> Node2 stays alive, given pcmk_delay=0 on both nodes.
>
> Can you explain the behavior above?

node1 could not connect to ESX because links were disconnected. As the
most obvious explanation.

You have logs, you are the only one who can answer this question with
some certainty. Others can only guess.

> See my config below:
>
> [root@ArcosRhel2 cluster]# pcs config
> Cluster Name: ARCOSCLUSTER
> Corosync Nodes:
>  ArcosRhel1 ArcosRhel2
> Pacemaker Nodes:
>  ArcosRhel1 ArcosRhel2
>
> Resources:
>  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
>   Attributes: cidr_netmask=32 ip=172.16.10.243
>   Operations: monitor interval=30s (ClusterIP-monitor-interval-30s)
>               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
>               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
>
> Stonith Devices:
>  Resource: Fence1 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.10.151 login=admin passwd=123pass
>    pcmk_host_list=ArcosRhel1 pcmk_monitor_timeout=60s port=ArcosRhel1(Joniel)
>    ssl_insecure=1 pcmk_delay_max=0s
>   Operations: monitor interval=60s (Fence1-monitor-interval-60s)
>  Resource: fence2 (class=stonith type=fence_vmware_soap)
>   Attributes: action=off ipaddr=172.16.10.152 login=admin passwd=123pass
>    pcmk_delay_max=0s pcmk_host_list=ArcosRhel2 pcmk_monitor_timeout=60s
>    port=ArcosRhel2(Ben) ssl_insecure=1
>   Operations: monitor interval=60s (fence2-monitor-interval-60s)
> Fencing Levels:
>
> Location Constraints:
>   Resource: Fence1
>     Enabled on: ArcosRhel2 (score:INFINITY) (id:location-Fence1-ArcosRhel2-INFINITY)
>   Resource: fence2
>     Enabled on: ArcosRhel1 (score:INFINITY) (id:location-fence2-ArcosRhel1-INFINITY)
> Ordering Constraints:
> Colocation Constraints:
> Ticket Constraints:
>
> Alerts:
>  No alerts defined
>
> Resources Defaults:
>  No defaults set
> Operations Defaults:
>  No defaults set
>
> Cluster Properties:
>  cluster-infrastructure: corosync
>  cluster-name: ARCOSCLUSTER
>  dc-version: 1.1.16-12.el7-94ff4df
>  have-watchdog: false
>  last-lrm-refresh: 1531810841
>  stonith-enabled: true
>
> Quorum:
>   Options:
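Andrei's point that the failed fence "could be connectivity, it could be credentials" can be checked outside Pacemaker by running the fence agent by hand. A sketch using the parameters of the fence2 device from the config above; the long options are the standard fence-agents command-line flags, so verify them against `fence_vmware_soap --help` on your version:

```shell
# From ArcosRhel1 (the node whose fence attempt failed), talk to the
# same hypervisor endpoint that the fence2 resource uses:
fence_vmware_soap --ip=172.16.10.152 --username=admin --password=123pass \
    --ssl --ssl-insecure --action=status --plug='ArcosRhel2(Ben)'

# Listing the VMs is a quick way to confirm both the login and the
# exact port/plug name the hypervisor expects:
fence_vmware_soap --ip=172.16.10.152 --username=admin --password=123pass \
    --ssl --ssl-insecure --action=list
```

If either command reports "Unable to connect/login to fencing device", the problem is between that node and the hypervisor (routing, firewall, SSL, or credentials), not in Pacemaker itself.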