[ClusterLabs] IPAddr2 RA and CLUSTERIP local_node
Hello,

When using the IPaddr2 RA to set up a cloned IP address resource:

pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=10.0.0.100 iflabel=vip1 cidr_netmask=24 flush_routes=true op monitor interval=30s
pcs resource clone vip1 clone-max=2 clone-node-max=2 globally-unique=true

the cluster sets up the iptables CLUSTERIP module, and the result is something like this:

# iptables -L -n
. . .
CLUSTERIP  all  --  0.0.0.0/0  10.0.0.100  CLUSTERIP hashmode=sourceip-sourceport clustermac=A1:DE:DE:89:A6:FE total_nodes=2 local_node=2 hash_init=0
. . .

The problem is that on both nodes, the local_node value in the CLUSTERIP rule is the same ("2").

I looked at the RA source code at https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/IPaddr2 and found that the local_node parameter is set from this value:

IP_INC_NO=`expr ${OCF_RESKEY_CRM_meta_clone:-0} + 1`

Can you think of a reason why my RA always sets local_node to "2"?

Tomer Azran
IDM & LINUX Professional Services
tomer.az...@edp.co.il
m: +972-52-6389961  t: +972-3-6438222  f: +972-3-6438004
www.edp.co.il
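Given that expression, the mapping is direct: clone instance 0 yields local_node=1 and instance 1 yields local_node=2, so both nodes reporting "2" suggests both instances somehow saw CRM_meta_clone=1. A minimal sketch for checking that on each node, reusing the resource name from the example above (the exact output format of these commands will vary):

# How the RA derives local_node (copied from the IPaddr2 source);
# CRM_meta_clone is the clone instance number Pacemaker passes to each copy.
IP_INC_NO=`expr ${OCF_RESKEY_CRM_meta_clone:-0} + 1`

# Sketch: check which vip1 clone instance landed on this node,
# and what local_node the kernel actually got.
pcs status resources | grep vip1
iptables -S INPUT | grep CLUSTERIP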
[ClusterLabs] IPaddr2 RA and multicast mac
Hello,

When using the IPaddr2 RA to set up a cloned IP address resource:

pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=10.0.0.100 iflabel=vip1 cidr_netmask=24 flush_routes=true op monitor interval=30s
pcs resource clone vip1 clone-max=2 clone-node-max=2 globally-unique=true

the cluster sets up the iptables CLUSTERIP module, and the result is something like this:

# iptables -L -n
. . .
CLUSTERIP  all  --  0.0.0.0/0  10.0.0.100  CLUSTERIP hashmode=sourceip-sourceport clustermac=A1:DE:DE:89:A6:FE total_nodes=2 local_node=1 hash_init=0
. . .

The problem is that the RA picks a clustermac address which is not in the multicast range (it must start with 01:00:5E). When the cluster MAC is not a multicast address, the traffic is treated as broadcast, which is bad. I found that you can set a multicast MAC by using the "mac" parameter, which solves the issue. Can the RA default be changed to use the multicast range?

In addition, I think the documentation (https://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/_clone_the_ip_address.html) should be updated to instruct users to use the mac parameter when creating the resource, and also to instruct them to enable multicast traffic on the network, which is not enabled by default.

Tomer Azran
IDM & LINUX Professional Services
tomer.az...@edp.co.il
m: +972-52-6389961  t: +972-3-6438222  f: +972-3-6438004
www.edp.co.il
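A sketch of the workaround described above; the multicast MAC (01:00:5E:00:00:64) is made up for illustration, so pick an unused address in the 01:00:5E range for your own network:

pcs resource create vip1 ocf:heartbeat:IPaddr2 ip=10.0.0.100 iflabel=vip1 \
    cidr_netmask=24 flush_routes=true mac=01:00:5E:00:00:64 \
    op monitor interval=30s
pcs resource clone vip1 clone-max=2 clone-node-max=2 globally-unique=true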
[ClusterLabs] HAProxy resource agent
Hello,

I'm planning to install an active/active HAProxy cluster on CentOS 7. I couldn't find an RA for HAProxy. There are some on the net, but I'm not sure whether I need one. For example: https://raw.githubusercontent.com/thisismitch/cluster-agents/master/haproxy

I can always use the systemd service RA. What is your recommendation?

Thanks,
Tomer.
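For reference, the systemd fallback mentioned above might look like this (a sketch, assuming the distribution ships an haproxy unit and that a cloned, active/active setup is wanted):

pcs resource create haproxy systemd:haproxy op monitor interval=10s --clone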
Re: [ClusterLabs] IPaddr2 RA and bonding
I created a pull request with my code: https://github.com/ClusterLabs/pacemaker/pull/1319

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Thursday, August 10, 2017 5:33 PM
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] IPaddr2 RA and bonding

On Thu, 2017-08-10 at 11:02 +0000, Tomer Azran wrote:
> That looks exactly like what I needed - it works.
> I had to change the RA since I don't want to give an interface name as a parameter (it might change from server to server and I want to create a cloned resource). I changed the RA a little bit to be able to guess the interface name based on an IP address parameter.
> The new RA is published on my github repo:
> https://github.com/tomerazran/Pacemaker-Resource-Agents/blob/master/ipspeed

Nice! Feel free to open a PR against the ClusterLabs/pacemaker repository with your changes. You could make it so the user has to specify one of iface or ip, or you could have another parameter iface_from_ip=true/false and put the IP in iface.

> Just to document the solution in case anyone else needs it, I run the following setup:
>
> # pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.1.3 op monitor interval=30
> # pcs resource create vip_speed ocf:heartbeat:ipspeed ip=192.168.1.3 name=vip_speed op monitor interval=5s --clone
> # pcs constraint location vip rule score=-INFINITY vip_speed lt 1 or not_defined vip_speed
>
> Thank you for the support,
> Tomer.
>
> -----Original Message-----
> From: Vladislav Bogdanov [mailto:bub...@hoster-ok.com]
> Sent: Monday, August 7, 2017 9:22 PM
> To: users@clusterlabs.org
> Subject: Re: [ClusterLabs] IPaddr2 RA and bonding
>
> 07.08.2017 20:39, Tomer Azran wrote:
> > I don't want to use this approach since I don't want to depend on pinging another host or a couple of hosts.
> > Is there any other solution?
> > I'm thinking of writing a simple script that will take a bond down using the ifdown command when there are no slaves available, and put it in /sbin/ifdown-local
>
> For a similar purpose I wrote and use this one - https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/ifspeed
>
> It sets a node attribute on which other resources may depend via a location constraint - http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch08.html#ch-rules
>
> It is not installed by default, and that should probably be fixed.
>
> That RA supports bonds (and bridges), and even tries to guess the actual resulting bond speed based on the bond type. For load-balancing bonds like the LACP (mode 4) one, it uses a coefficient of 0.8 (iirc) to reflect the actual possible load via multiple links.
>
> > -----Original Message-----
> > From: Ken Gaillot [mailto:kgail...@redhat.com]
> > Sent: Monday, August 7, 2017 7:14 PM
> > To: Cluster Labs - All topics related to open-source clustering welcomed
> > Subject: Re: [ClusterLabs] IPaddr2 RA and bonding
> >
> > On Mon, 2017-08-07 at 10:02 +0000, Tomer Azran wrote:
> >> Hello All,
> >>
> >> We are using CentOS 7.3 with pacemaker in order to create a cluster.
> >> Each cluster node has a bonding interface consisting of two NICs.
> >> The cluster has an IPaddr2 resource configured like that:
> >>
> >> # pcs resource show cluster_vip
> >> Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)
> >> Attributes: ip=192.168.1.3
> >> Operations: start interval=0s timeout=20s (cluster_vip-start-interval-0s)
> >>             stop interval=0s timeout=20s (cluster_vip-stop-interval-0s)
> >>             monitor interval=30s (cluster_vip-monitor-interval-30s)
> >>
> >> We are running tests and want to simulate a state when the network links are down.
> >> We are pulling both network cables from the server.
> >>
> >> The problem is that the resource is not marked as failed, and the faulted node keeps holding it and does not fail it over to the other node.
> >> I think that the problem is within the bond interface. The bond interface is marked as UP on the OS. It even can ping itself:
> >>
> >> # ip link show
> >> [...]
Re: [ClusterLabs] IPaddr2 RA and bonding
That looks exactly like what I needed - it works.

I had to change the RA since I don't want to give an interface name as a parameter (it might change from server to server and I want to create a cloned resource). I changed the RA a little bit to be able to guess the interface name based on an IP address parameter. The new RA is published on my github repo: https://github.com/tomerazran/Pacemaker-Resource-Agents/blob/master/ipspeed

Just to document the solution in case anyone else needs it, I run the following setup:

# pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.1.3 op monitor interval=30
# pcs resource create vip_speed ocf:heartbeat:ipspeed ip=192.168.1.3 name=vip_speed op monitor interval=5s --clone
# pcs constraint location vip rule score=-INFINITY vip_speed lt 1 or not_defined vip_speed

Thank you for the support,
Tomer.

-----Original Message-----
From: Vladislav Bogdanov [mailto:bub...@hoster-ok.com]
Sent: Monday, August 7, 2017 9:22 PM
To: users@clusterlabs.org
Subject: Re: [ClusterLabs] IPaddr2 RA and bonding

07.08.2017 20:39, Tomer Azran wrote:
> I don't want to use this approach since I don't want to depend on pinging another host or a couple of hosts.
> Is there any other solution?
> I'm thinking of writing a simple script that will take a bond down using the ifdown command when there are no slaves available, and put it in /sbin/ifdown-local

For a similar purpose I wrote and use this one - https://github.com/ClusterLabs/pacemaker/blob/master/extra/resources/ifspeed

It sets a node attribute on which other resources may depend via a location constraint - http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch08.html#ch-rules

It is not installed by default, and that should probably be fixed.

That RA supports bonds (and bridges), and even tries to guess the actual resulting bond speed based on the bond type. For load-balancing bonds like the LACP (mode 4) one, it uses a coefficient of 0.8 (iirc) to reflect the actual possible load via multiple links.

> -----Original Message-----
> From: Ken Gaillot [mailto:kgail...@redhat.com]
> Sent: Monday, August 7, 2017 7:14 PM
> To: Cluster Labs - All topics related to open-source clustering welcomed
> Subject: Re: [ClusterLabs] IPaddr2 RA and bonding
>
> On Mon, 2017-08-07 at 10:02 +0000, Tomer Azran wrote:
> >> Hello All,
> >>
> >> We are using CentOS 7.3 with pacemaker in order to create a cluster.
> >> Each cluster node has a bonding interface consisting of two NICs.
> >> The cluster has an IPaddr2 resource configured like that:
> >>
> >> # pcs resource show cluster_vip
> >> Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)
> >> Attributes: ip=192.168.1.3
> >> Operations: start interval=0s timeout=20s (cluster_vip-start-interval-0s)
> >>             stop interval=0s timeout=20s (cluster_vip-stop-interval-0s)
> >>             monitor interval=30s (cluster_vip-monitor-interval-30s)
> >>
> >> We are running tests and want to simulate a state when the network links are down.
> >> We are pulling both network cables from the server.
> >>
> >> The problem is that the resource is not marked as failed, and the faulted node keeps holding it and does not fail it over to the other node.
> >> I think that the problem is within the bond interface. The bond interface is marked as UP on the OS. It even can ping itself:
> >>
> >> # ip link show
> >> 2: eno3: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
> >>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> >> 3: eno4: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
> >>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> >> 9: bond1: mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
> >>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> >>
> >> As far as I understand, the IPaddr2 RA does not check the link state of the interface - what can be done?
>
> You are correct. The IP address itself *is* up, even if the link is down, and it can be used locally on that host.
>
> If you want to monitor connectivity to other hosts, you have to do that separately. The most common approach is to use the ocf:pacemaker:ping resource. See:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_moving_resources_due_to_connectivity_changes
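The interface-guessing change described above might, in spirit, look something like this (a sketch only, assuming iproute2 is available; this is not the actual ipspeed code):

# Sketch: derive the interface that would carry a given IP, so the RA
# does not need a per-host iface parameter.
ip=192.168.1.3
iface=$(ip -o route get "$ip" 2>/dev/null | sed -n 's/.* dev \([^ ]*\).*/\1/p')
echo "IP $ip maps to interface: ${iface:-unknown}"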
Re: [ClusterLabs] IPaddr2 RA and bonding
I don't want to use this approach since I don't want to depend on pinging another host or a couple of hosts.
Is there any other solution?
I'm thinking of writing a simple script that will take a bond down using the ifdown command when there are no slaves available, and put it in /sbin/ifdown-local

-----Original Message-----
From: Ken Gaillot [mailto:kgail...@redhat.com]
Sent: Monday, August 7, 2017 7:14 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] IPaddr2 RA and bonding

On Mon, 2017-08-07 at 10:02 +0000, Tomer Azran wrote:
> Hello All,
>
> We are using CentOS 7.3 with pacemaker in order to create a cluster.
> Each cluster node has a bonding interface consisting of two NICs.
> The cluster has an IPaddr2 resource configured like that:
>
> # pcs resource show cluster_vip
> Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=192.168.1.3
> Operations: start interval=0s timeout=20s (cluster_vip-start-interval-0s)
>             stop interval=0s timeout=20s (cluster_vip-stop-interval-0s)
>             monitor interval=30s (cluster_vip-monitor-interval-30s)
>
> We are running tests and want to simulate a state when the network links are down.
> We are pulling both network cables from the server.
>
> The problem is that the resource is not marked as failed, and the faulted node keeps holding it and does not fail it over to the other node.
> I think that the problem is within the bond interface. The bond interface is marked as UP on the OS. It even can ping itself:
>
> # ip link show
> 2: eno3: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> 3: eno4: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> 9: bond1: mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
>
> As far as I understand, the IPaddr2 RA does not check the link state of the interface - what can be done?

You are correct. The IP address itself *is* up, even if the link is down, and it can be used locally on that host.

If you want to monitor connectivity to other hosts, you have to do that separately. The most common approach is to use the ocf:pacemaker:ping resource. See:

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html-single/Pacemaker_Explained/index.html#_moving_resources_due_to_connectivity_changes

> BTW, I tried to find a solution in the bonding configuration which disables the bond when no link is up, but I didn't find any.
>
> Tomer.

--
Ken Gaillot
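The /sbin/ifdown-local idea mentioned above could be sketched roughly like this (hypothetical; the bond name and the hook semantics are assumptions, and as the later messages in this thread show, the ifspeed RA was the approach ultimately used):

#!/bin/sh
# Hypothetical sketch: bring bond1 down when none of its slaves has carrier.
# Assumes the Linux bonding driver's sysfs layout; not production code.
bond=bond1
for slave in $(cat /sys/class/net/$bond/bonding/slaves 2>/dev/null); do
    if [ "$(cat /sys/class/net/$slave/carrier 2>/dev/null)" = "1" ]; then
        exit 0   # at least one slave still has link; leave the bond alone
    fi
done
ip link set dev "$bond" down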
Re: [ClusterLabs] Two nodes cluster issue
I read the corosync-qdevice(8) man page a couple of times, and also the RH documentation at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-quorumdev-HAAR.html

I think it would be great if you could add some examples that demonstrate the difference between the two algorithms, and give some use cases that explain which algorithm is preferred in each case.

-----Original Message-----
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Monday, August 7, 2017 2:38 PM
To: Cluster Labs - All topics related to open-source clustering welcomed; kwenn...@redhat.com; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

Tomer Azran wrote:
> Just updating that I added another level of fencing, using watchdog fencing. Combined with the quorum device, this works in case of a power failure of both the server and the ipmi interface.
> An important note is that stonith-watchdog-timeout must be configured in order for this to work.
> After reading the following great post: http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog watchdog, since I don't think the ipmi watchdog would do any good when the ipmi interface is down (and if the interface is OK, it will be used as a fencing method anyway).
>
> Just for documenting the solution (in case someone else needs it), the configuration I added is:
> systemctl enable sbd
> pcs property set no-quorum-policy=suicide
> pcs property set stonith-watchdog-timeout=15
> pcs quorum device add model net host=qdevice algorithm=lms
>
> I just can't decide whether the qdevice algorithm should be lms or ffsplit. I couldn't determine the difference between them, and I'm not sure which one is best when using a two node cluster with qdevice and watchdog fencing.
>
> Can anyone advise on that?

I'm pretty sure you've read the corosync-qdevice(8) man page, where there is a quite detailed description of the algorithms, so if you were not able to determine the difference between them, something is wrong and the man page needs improvement. What exactly were you unable to understand?

Also, for your use case with 2 nodes, both algorithms behave the same way.

Honza

> -----Original Message-----
> From: Jan Friesse [mailto:jfrie...@redhat.com]
> Sent: Tuesday, July 25, 2017 11:59 AM
> To: Cluster Labs - All topics related to open-source clustering welcomed; kwenn...@redhat.com; Prasad, Shashank
> Subject: Re: [ClusterLabs] Two nodes cluster issue
>
>> Tomer Azran wrote:
>>> I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all.
>>> I think I will try to use an iSCSI target on my qdevice and set SBD to use it.
>>> I still don't understand why qdevice can't take the place of SBD with shared storage; correct me if I'm wrong, but it looks like both of them are there for the same reason.
>>
>> Qdevice is there to be a third-side arbiter who decides which partition is quorate. It can also be seen as a quorum-only node. So for a two node cluster it can be viewed as a third node (even though it is quite special because it cannot run resources). It is not doing fencing.
>>
>> SBD is a fencing device. It is using a disk as a third-side arbiter.
>
> I've talked with Klaus and he told me that 7.3 is not using the disk as a third-side arbiter, so sorry for the confusion.
>
> You should however still be able to use sbd for checking if pacemaker is alive and if the partition has quorum - otherwise the watchdog kills the node. So qdevice will give you a "3rd" node and sbd fences the unquorate partition.
>
> Or (as mentioned previously) you can use fabric fencing.
>
> Regards,
> Honza
>
>>> From: Klaus Wenninger [mailto:kwenn...@redhat.com]
>>> Sent: Monday, July 24, 2017 9:01 PM
>>> To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
>>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>
>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>> Sometimes IPMI fence devices use shared power of the node, and it cannot be avoided. In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device.
>>> The failure of IPMI based fencing can also exist due to other [...]
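For anyone landing here later: per Jan's note above, with two nodes both algorithms behave the same way, so choosing one is just a matter of the algorithm= option when adding the device. A sketch, reusing the qdevice host name from the earlier message in this thread:

# LMS ("last man standing"):
pcs quorum device add model net host=qdevice algorithm=lms
# or ffsplit ("fifty-fifty split"):
pcs quorum device add model net host=qdevice algorithm=ffsplit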
Re: [ClusterLabs] Antw: IPaddr2 RA and bonding
STONITH is enabled and working.

-----Original Message-----
From: Ulrich Windl [mailto:ulrich.wi...@rz.uni-regensburg.de]
Sent: Monday, August 7, 2017 2:52 PM
To: users@clusterlabs.org
Subject: [ClusterLabs] Antw: IPaddr2 RA and bonding

>>> Tomer Azran wrote on 07.08.2017 at 12:02 in message:
> Hello All,
>
> We are using CentOS 7.3 with pacemaker in order to create a cluster.
> Each cluster node has a bonding interface consisting of two NICs.
> The cluster has an IPaddr2 resource configured like that:
>
> # pcs resource show cluster_vip
> Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)
> Attributes: ip=192.168.1.3
> Operations: start interval=0s timeout=20s (cluster_vip-start-interval-0s)
>             stop interval=0s timeout=20s (cluster_vip-stop-interval-0s)
>             monitor interval=30s (cluster_vip-monitor-interval-30s)
>
> We are running tests and want to simulate a state when the network links are down.
> We are pulling both network cables from the server.
>
> The problem is that the resource is not marked as failed, and the faulted node keeps holding it and does not fail it over to the other node.
> I think that the problem is within the bond interface. The bond interface is marked as UP on the OS. It even can ping itself:
>
> # ip link show
> 2: eno3: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> 3: eno4: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
> 9: bond1: mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
>    link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
>
> As far as I understand, the IPaddr2 RA does not check the link state of the interface - what can be done?
>
> BTW, I tried to find a solution in the bonding configuration which disables the bond when no link is up, but I didn't find any.

Show the cluster status, not the network status. My guess is that you haven't activated stonith.

Regards,
Ulrich

> Tomer.
[ClusterLabs] IPaddr2 RA and bonding
Hello All,

We are using CentOS 7.3 with pacemaker in order to create a cluster. Each cluster node has a bonding interface consisting of two NICs. The cluster has an IPaddr2 resource configured like that:

# pcs resource show cluster_vip
Resource: cluster_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=192.168.1.3
Operations: start interval=0s timeout=20s (cluster_vip-start-interval-0s)
            stop interval=0s timeout=20s (cluster_vip-stop-interval-0s)
            monitor interval=30s (cluster_vip-monitor-interval-30s)

We are running tests and want to simulate a state when the network links are down. We are pulling both network cables from the server.

The problem is that the resource is not marked as failed, and the faulted node keeps holding it and does not fail it over to the other node. I think that the problem is within the bond interface. The bond interface is marked as UP on the OS. It even can ping itself:

# ip link show
2: eno3: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
   link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
3: eno4: mtu 1500 qdisc mq master bond1 state DOWN mode DEFAULT qlen 1000
   link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff
9: bond1: mtu 1500 qdisc noqueue state DOWN mode DEFAULT qlen 1000
   link/ether 00:1e:67:f6:5a:8a brd ff:ff:ff:ff:ff:ff

As far as I understand, the IPaddr2 RA does not check the link state of the interface - what can be done?

BTW, I tried to find a solution in the bonding configuration which disables the bond when no link is up, but I didn't find any.

Tomer.
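For reference, the ocf:pacemaker:ping approach recommended in the replies above might look like this sketch; the host_list address is a made-up gateway, and pingd is the agent's default attribute name:

pcs resource create connectivity ocf:pacemaker:ping host_list=192.168.1.1 op monitor interval=10s --clone
pcs constraint location cluster_vip rule score=-INFINITY pingd lt 1 or not_defined pingd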
Re: [ClusterLabs] Two nodes cluster issue
Just updating that I added another level of fencing, using watchdog fencing. Combined with the quorum device, this works in case of a power failure of both the server and the ipmi interface. An important note is that stonith-watchdog-timeout must be configured in order for this to work.

After reading the following great post: http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog watchdog, since I don't think the ipmi watchdog would do any good when the ipmi interface is down (and if the interface is OK, it will be used as a fencing method anyway).

Just for documenting the solution (in case someone else needs it), the configuration I added is:

systemctl enable sbd
pcs property set no-quorum-policy=suicide
pcs property set stonith-watchdog-timeout=15
pcs quorum device add model net host=qdevice algorithm=lms

I just can't decide whether the qdevice algorithm should be lms or ffsplit. I couldn't determine the difference between them, and I'm not sure which one is best when using a two node cluster with qdevice and watchdog fencing.

Can anyone advise on that?

-----Original Message-----
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Tuesday, July 25, 2017 11:59 AM
To: Cluster Labs - All topics related to open-source clustering welcomed; kwenn...@redhat.com; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

> Tomer Azran wrote:
>> I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all.
>> I think I will try to use an iSCSI target on my qdevice and set SBD to use it.
>> I still don't understand why qdevice can't take the place of SBD with shared storage; correct me if I'm wrong, but it looks like both of them are there for the same reason.
>
> Qdevice is there to be a third-side arbiter who decides which partition is quorate. It can also be seen as a quorum-only node. So for a two node cluster it can be viewed as a third node (even though it is quite special because it cannot run resources). It is not doing fencing.
>
> SBD is a fencing device. It is using a disk as a third-side arbiter.

I've talked with Klaus and he told me that 7.3 is not using the disk as a third-side arbiter, so sorry for the confusion.

You should however still be able to use sbd for checking if pacemaker is alive and if the partition has quorum - otherwise the watchdog kills the node. So qdevice will give you a "3rd" node and sbd fences the unquorate partition.

Or (as mentioned previously) you can use fabric fencing.

Regards,
Honza

>> From: Klaus Wenninger [mailto:kwenn...@redhat.com]
>> Sent: Monday, July 24, 2017 9:01 PM
>> To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>
>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>> Sometimes IPMI fence devices use shared power of the node, and it cannot be avoided. In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device. The failure of IPMI based fencing can also exist due to other reasons as well.
>>
>> A failure to fence the failed node will cause the cluster to be marked UNCLEAN. To get over it, the following command needs to be invoked on the surviving node:
>>
>> pcs stonith confirm --force
>>
>> This can be automated by hooking a recovery script to the Stonith resource 'Timed Out' event. To be more specific, the Pacemaker Alerts can be used to watch for Stonith timeouts and failures. In that script, all that essentially needs to be executed is the aforementioned command.
>>
>> If I get you right here, you can disable fencing then in the first place.
>> Actually quorum-based-watchdog-fencing is the way to do this in a safe manner. This of course assumes you have a proper source for quorum in your 2-node-setup with e.g. qdevice or using a shared disk with sbd (not directly pacemaker quorum here but a similar thing handled inside sbd).
>>
>> Since the alerts are issued from 'hacluster' login, sudo permissions for 'hacluster' need to be configured.
>>
>> Thanx.
>>
>> From: Klaus Wenninger [mailto:kwenn...@redhat.com]
>> Sent: Monday, July 24, 2017 9:24 PM
>> To: Kristián Feldsam; Cluster Labs - All topics related to open-source clustering welcomed
>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>
>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>> I personally think that powering off a node via a switched PDU is safer, or not? [...]
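A rough sketch of what enabling the softdog watchdog for sbd could look like (assumptions: the kernel's softdog module, a RHEL/CentOS-style /etc/sysconfig/sbd, and /dev/watchdog as the resulting device; file names are illustrative):

# Load the software watchdog now and on every boot (hypothetical file name).
echo softdog > /etc/modules-load.d/softdog.conf
modprobe softdog
# sbd then talks to the watchdog device; /dev/watchdog is the usual default.
grep SBD_WATCHDOG_DEV /etc/sysconfig/sbd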
Re: [ClusterLabs] Two nodes cluster issue
Just updating that I added another level of fencing, using watchdog fencing. Combined with the quorum device, this works in case of a power failure of both the server and the ipmi interface. An important note is that stonith-watchdog-timeout must be configured in order for this to work.

After reading the following great post: http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog watchdog, since I don't think the ipmi watchdog would do any good when the ipmi interface is down (and if the interface is OK, it will be used as a fencing method anyway).

Just for documenting the solution (in case someone else needs it), the configuration I added is:

systemctl enable sbd
pcs property set no-quorum-policy=suicide
pcs property set stonith-watchdog-timeout=15
pcs quorum device add model net host=qdevice algorithm=lms

I just can't decide whether the qdevice algorithm should be lms or ffsplit. I couldn't determine the difference between them, and I'm not sure which one is best when using a two node cluster with qdevice and watchdog fencing.

Can anyone advise on that?

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Tuesday, July 25, 2017 2:19 AM
To: Tomer Azran; Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 11:59 PM, Tomer Azran wrote:

There is a problem with that – it seems like SBD with shared disk is disabled on CentOS 7.3. When I run:

# sbd -d /dev/sbd create

I get:

Shared disk functionality not supported

Which is why I suggested to go for watchdog-fencing using your qdevice setup. As said, I haven't tried with qdevice-quorum - but I don't see a reason why that shouldn't work. no-quorum-policy has to be suicide of course.

So I might try the software watchdog (softdog or ipmi_watchdog)

A reliable watchdog is really crucial for sbd, so I would recommend going for ipmi or anything else that has hardware behind it.

Klaus

Tomer.

From: Tomer Azran [mailto:tomer.az...@edp.co.il]
Sent: Tuesday, July 25, 2017 12:30 AM
To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all.

That was of course said with a certain degree of hyperbole. Anything is of course better than not having fencing at all. I might be wrong, but what you were saying somehow was drawing a picture in my mind that you have your 2 nodes at 2 sites/rooms quite separated, and in that case ...

I think I will try to use an iSCSI target on my qdevice and set SBD to use it. I still don't understand why qdevice can't take the place of SBD with shared storage; correct me if I'm wrong, but it looks like both of them are there for the same reason.

sbd with watchdog + qdevice can take the place of sbd with shared storage. qdevice is there to decide which part of a cluster is quorate and which not - in cases where after a split this wouldn't be possible. sbd (with watchdog) is then there to reliably take down the non-quorate part within a well defined time.

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:

Sometimes IPMI fence devices use shared power of the node, and it cannot be avoided. In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device. The failure of IPMI based fencing can also exist due to other reasons as well.

A failure to fence the failed node will cause the cluster to be marked UNCLEAN. To get over it, the following command needs to be invoked on the surviving node:

pcs stonith confirm --force

This can be automated by hooking a recovery script to the Stonith resource 'Timed Out' event. To be more specific, the Pacemaker Alerts can be used to watch for Stonith timeouts and failures. In that script, all that essentially needs to be executed is the aforementioned command.

If I get you right here, you can disable fencing then in the first place. Actually quorum-based-watchdog-fencing is the way to do this in a safe manner. This of course assumes you have a proper source for quorum in your 2-node-setup with e.g. qdevice or using a shared disk with sbd (not directly pacemaker quorum here but a similar thing handled inside sbd).

Since the alerts are issued from 'hacluster' login, sudo permissions for 'hacluster' need to be configured.

Thanx. [...]
Re: [ClusterLabs] Two nodes cluster issue
There is a problem with that – it seems like SBD with shared disk is disabled on CentOS 7.3. When I run:

# sbd -d /dev/sbd create

I get:

Shared disk functionality not supported

So I might try the software watchdog (softdog or ipmi_watchdog).

Tomer.

From: Tomer Azran [mailto:tomer.az...@edp.co.il]
Sent: Tuesday, July 25, 2017 12:30 AM
To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all. I think I will try to use an iSCSI target on my qdevice and set SBD to use it. I still don't understand why qdevice can't take the place of SBD with shared storage; correct me if I'm wrong, but it looks like both of them are there for the same reason.

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:

Sometimes IPMI fence devices use shared power of the node, and it cannot be avoided. In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device. The failure of IPMI based fencing can also exist due to other reasons as well.

A failure to fence the failed node will cause the cluster to be marked UNCLEAN. To get over it, the following command needs to be invoked on the surviving node:

pcs stonith confirm --force

This can be automated by hooking a recovery script to the Stonith resource 'Timed Out' event. To be more specific, the Pacemaker Alerts can be used to watch for Stonith timeouts and failures. In that script, all that essentially needs to be executed is the aforementioned command.

If I get you right here, you can disable fencing then in the first place. Actually quorum-based-watchdog-fencing is the way to do this in a safe manner. This of course assumes you have a proper source for quorum in your 2-node-setup with e.g. qdevice or using a shared disk with sbd (not directly pacemaker quorum here but a similar thing handled inside sbd).

Since the alerts are issued from 'hacluster' login, sudo permissions for 'hacluster' need to be configured.

Thanx.

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:

I personally think that powering off a node via a switched PDU is safer, or not?

True if that is working in your environment. If you can't do a physical setup where you aren't simultaneously losing connection to both your node and the switch-device (or you just want to cover cases where that happens), you have to come up with something else.

Best regards,
Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: supp...@feldhost.cz
www.feldhost.cz

On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com> wrote:

On 07/24/2017 05:15 PM, Tomer Azran wrote:

I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect the quorum to declare it as dead. Why doesn't that happen?

That is not how quorum works. It just limits the decision-making to the quorate subset of the cluster. Still the unknown nodes are not sure to be down. That is why I suggested to have quorum-based watchdog-fencing with sbd. That would assure that within a certain time all nodes of the non-quorate part of the cluster are down.

On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maz...@gmail.com> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:
> We don't have the ability to use it.
> Is that the only solution?

No, but I'd recommend thinking about it first. Are you sure you will care about your cluster working when your server room is on fire? 'Cause unless you have halon suppression, your server room is a complete write-off anyway. (Think water from sprinklers hitting rich chunky volts in the servers.)

Dima
Re: [ClusterLabs] Two nodes cluster issue
I tend to agree with Klaus – I don't think that having a hook that bypasses stonith is the right way. It is better to not use stonith at all.

I think I will try to use an iSCSI target on my qdevice and set SBD to use it. I still don't understand why qdevice can't take the place of SBD with shared storage; correct me if I'm wrong, but it looks like both of them are there for the same reason.

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed; Prasad, Shashank
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:

Sometimes IPMI fence devices use shared power of the node, and it cannot be avoided. In such scenarios the HA cluster is NOT able to handle the power failure of a node, since the power is shared with its own fence device. The failure of IPMI based fencing can also exist due to other reasons as well.

A failure to fence the failed node will cause the cluster to be marked UNCLEAN. To get over it, the following command needs to be invoked on the surviving node:

pcs stonith confirm --force

This can be automated by hooking a recovery script to the Stonith resource 'Timed Out' event. To be more specific, the Pacemaker Alerts can be used to watch for Stonith timeouts and failures. In that script, all that essentially needs to be executed is the aforementioned command.

If I get you right here, you can disable fencing then in the first place. Actually quorum-based-watchdog-fencing is the way to do this in a safe manner. This of course assumes you have a proper source for quorum in your 2-node-setup with e.g. qdevice or using a shared disk with sbd (not directly pacemaker quorum here but a similar thing handled inside sbd).

Since the alerts are issued from 'hacluster' login, sudo permissions for 'hacluster' need to be configured.

Thanx.

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:

I personally think that powering off a node via a switched PDU is safer, or not?

True if that is working in your environment. If you can't do a physical setup where you aren't simultaneously losing connection to both your node and the switch-device (or you just want to cover cases where that happens), you have to come up with something else.

Best regards,
Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: supp...@feldhost.cz
www.feldhost.cz

On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com> wrote:

On 07/24/2017 05:15 PM, Tomer Azran wrote:

I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect the quorum to declare it as dead. Why doesn't that happen?

That is not how quorum works. It just limits the decision-making to the quorate subset of the cluster. Still the unknown nodes are not sure to be down. That is why I suggested to have quorum-based watchdog-fencing with sbd. That would assure that within a certain time all nodes of the non-quorate part of the cluster are down.

On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maz...@gmail.com> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:
> We don't have the ability to use it.
> Is that the only solution?

No, but I'd recommend thinking about it first. Are you sure you will care about your cluster working when your server room is on fire? 'Cause unless you have halon suppression, your server room is a complete write-off anyway. (Think water from sprinklers hitting rich chunky volts in the servers.)

Dima
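Prasad's automation idea above (which Klaus warns against, since it effectively bypasses fencing safety) could be sketched as a Pacemaker alert agent roughly like this; the CRM_alert_* variable names are the standard ones Pacemaker exports to alert agents, everything else is hypothetical:

#!/bin/sh
# Hypothetical alert agent: auto-confirm fencing when a fence action fails.
# DANGEROUS - see the discussion above; quorum-based watchdog fencing is safer.
if [ "$CRM_alert_kind" = "fencing" ] && [ "$CRM_alert_rc" != "0" ]; then
    sudo pcs stonith confirm "$CRM_alert_node" --force
fi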
Re: [ClusterLabs] Two nodes cluster issue
So your suggestion is to use sbd with or without qdevice? What is the point of having a qdevice in a two node cluster if it doesn't help in this situation?

From: Klaus Wenninger
Sent: Monday, July 24, 18:28
Subject: Re: [ClusterLabs] Two nodes cluster issue
To: Cluster Labs - All topics related to open-source clustering welcomed, Tomer Azran

On 07/24/2017 05:15 PM, Tomer Azran wrote:

I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect the quorum to declare it as dead. Why doesn't that happen?

That is not how quorum works. It just limits the decision-making to the quorate subset of the cluster. Still the unknown nodes are not sure to be down. That is why I suggested to have quorum-based watchdog-fencing with sbd. That would assure that within a certain time all nodes of the non-quorate part of the cluster are down.

On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maz...@gmail.com> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:
> We don't have the ability to use it.
> Is that the only solution?

No, but I'd recommend thinking about it first. Are you sure you will care about your cluster working when your server room is on fire? 'Cause unless you have halon suppression, your server room is a complete write-off anyway. (Think water from sprinklers hitting rich chunky volts in the servers.)

Dima

--
Klaus Wenninger
Senior Software Engineer, EMEA ENG Openstack Infrastructure
Red Hat
kwenn...@redhat.com
Re: [ClusterLabs] Two nodes cluster issue
I still don't understand why the qdevice concept doesn't help in this situation. Since the master node is down, I would expect the quorum to declare it as dead. Why doesn't that happen?

On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" <dmitri.maz...@gmail.com> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:
> We don't have the ability to use it.
> Is that the only solution?

No, but I'd recommend thinking about it first. Are you sure you will care about your cluster working when your server room is on fire? 'Cause unless you have halon suppression, your server room is a complete write-off anyway. (Think water from sprinklers hitting rich chunky volts in the servers.)

Dima
Re: [ClusterLabs] Two nodes cluster issue
We don't have the ability to use it. Is that the only solution? In addition, it will not cover a scenario where the server room goes down (for example - fire or earthquake): the switch will go down as well.

From: Klaus Wenninger
Sent: Monday, July 24, 15:31
Subject: Re: [ClusterLabs] Two nodes cluster issue
To: Cluster Labs - All topics related to open-source clustering welcomed, Kristián Feldsam

On 07/24/2017 02:05 PM, Kristián Feldsam wrote:

Hello, you have to use a second fencing device, for ex. an APC Switched PDU. https://wiki.clusterlabs.org/wiki/Configure_Multiple_Fencing_Devices_Using_pcs

The problem here seems to be that the fencing devices available are running from the same power supply as the node itself. So they are kind of useless for determining whether the partner node has no power or is simply not reachable via network.

Best regards,
Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail: supp...@feldhost.cz
www.feldhost.cz - FeldHost™ - professional hosting and server services at reasonable prices.

FELDSAM s.r.o.
V rohu 434/3, Praha 4 - Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350, registered at the Municipal Court in Prague
Bank: Fio banka a.s.
Account number: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010 0024 0033 0446

On 24 Jul 2017, at 13:51, Tomer Azran <tomer.az...@edp.co.il> wrote:

Hello,

We built a pacemaker cluster with 2 physical servers. We configured DRBD in a Master/Slave setup, a floating IP, and a file system mount in Active/Passive mode. We configured two STONITH devices (fence_ipmilan), one for each server.

We are trying to simulate a situation when the Master server crashes with no power. We pulled both of the PSU cables and the server becomes offline (UNCLEAN). The resources that the Master used to hold are now in Started (UNCLEAN) state. The state is unclean since the STONITH failed (the STONITH device is located on the server (Intel RMM4 - IPMI), which uses the same power supply).

The problem is that now the cluster does not release the resources that the Master holds, and the service goes down. Is there any way to overcome this situation? We tried to add a qdevice but got the same results.

If you have already set up qdevice (using an additional node or so) you could use quorum-based watchdog-fencing via SBD.

We are using pacemaker 1.1.15 on CentOS 7.3

Thanks,
Tomer.

--
Klaus Wenninger
Senior Software Engineer, EMEA ENG Openstack Infrastructure
Red Hat
kwenn...@redhat.com
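The multi-level fencing Kristián points to above (see the linked wiki page) is configured with stonith levels; a sketch with hypothetical device names, where the PDU is only tried if IPMI fails:

# Level 1: try the IPMI device first; level 2: fall back to the switched PDU.
pcs stonith level add 1 node1 fence_ipmi_node1
pcs stonith level add 2 node1 fence_pdu_node1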
[ClusterLabs] Two nodes cluster issue
Hello,

We built a pacemaker cluster with 2 physical servers. We configured DRBD in a Master/Slave setup, a floating IP, and a file system mount in Active/Passive mode. We configured two STONITH devices (fence_ipmilan), one for each server.

We are trying to simulate a situation when the Master server crashes with no power. We pulled both of the PSU cables and the server becomes offline (UNCLEAN). The resources that the Master used to hold are now in Started (UNCLEAN) state. The state is unclean since the STONITH failed (the STONITH device is located on the server (Intel RMM4 - IPMI), which uses the same power supply).

The problem is that now the cluster does not release the resources that the Master holds, and the service goes down. Is there any way to overcome this situation? We tried to add a qdevice but got the same results.

We are using pacemaker 1.1.15 on CentOS 7.3.

Thanks,
Tomer.
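For context, a per-node fence_ipmilan device like the ones described above is typically created along these lines; all values here are hypothetical, and parameter names vary somewhat between fence-agents versions (newer ones accept ip/username/password as aliases):

pcs stonith create fence_node1 fence_ipmilan ipaddr=10.0.0.101 login=admin passwd=secret lanplus=1 pcmk_host_list=node1 op monitor interval=60s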