I read the corosync-qdevice(8) man page a couple of times, and also the RH
documentation at
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-quorumdev-HAAR.html
It would be great if you could add some examples that demonstrate the
difference between the two, along with use cases explaining which
algorithm is preferred in each case.
It's really hard to say which algorithm suits a concrete situation
"better", but yes, I will try to add some examples.
Regards,
Honza
-----Original Message-----
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Monday, August 7, 2017 2:38 PM
To: Cluster Labs - All topics related to open-source clustering welcomed
<users@clusterlabs.org>; kwenn...@redhat.com; Prasad, Shashank
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue
Tomer Azran wrote:
Just updating that I added another level of fencing using watchdog fencing.
Combined with the quorum device, this works even in case of a power failure
of both the server and its IPMI interface.
An important note: the stonith-watchdog-timeout property must be configured
for this to work.
After reading the following great post:
http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog
watchdog, since I don't think the IPMI watchdog would do any good if the IPMI
interface is down (and if the interface is OK, it is already used as a fencing
method).
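For reference, a minimal way to get softdog loaded now and on every boot
(the modules-load.d path is the usual systemd convention; adjust as needed):

# load the watchdog module immediately
modprobe softdog
# and have it loaded automatically at boot
echo softdog > /etc/modules-load.d/softdog.conf

After that, /dev/watchdog should exist for sbd to bind to.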
Just for documenting the solution (in case someone else needs it), the
configuration I added is:
systemctl enable sbd
pcs property set no-quorum-policy=suicide
pcs property set stonith-watchdog-timeout=15
pcs quorum device add model net host=qdevice algorithm=lms
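A quick sanity check afterwards might look something like this (exact
output varies by pcs/sbd version):

# confirm the qdevice vote is being counted
pcs quorum status
# list the watchdog devices sbd can see
sbd query-watchdog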
I just can't decide if the qdevice algorithm should be lms or ffsplit. I
couldn't determine the difference between them, and I'm not sure which one is
best when using a two-node cluster with qdevice and watchdog fencing.
Can anyone advise on that?
I'm pretty sure you've read the corosync-qdevice(8) man page, where there is
a quite detailed description of the algorithms, so if you were not able to
determine the difference between them, something is wrong and the man page
needs improvement. What exactly were you unable to understand?
Also, for your use case with 2 nodes, both algorithms behave the same way.
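To illustrate why (vote counts as described in the man page): ffsplit always
casts exactly one qdevice vote, while lms casts (number_of_nodes - 1) votes.
With two nodes that is also one vote, so either way the cluster has three
votes total and a single surviving node plus the qdevice stays quorate. The
resulting corosync.conf section would look roughly like this (host name
matches the pcs command quoted above):

quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: qdevice
            algorithm: lms    # or: ffsplit
        }
    }
}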
Honza
-----Original Message-----
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Tuesday, July 25, 2017 11:59 AM
To: Cluster Labs - All topics related to open-source clustering
welcomed <users@clusterlabs.org>; kwenn...@redhat.com; Prasad,
Shashank <sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue
Tomer Azran wrote:
I tend to agree with Klaus – I don't think that having a hook that
bypasses stonith is the right way. It is better to not use stonith at all.
I think I will try to use an iSCSI target on my qdevice host and set SBD
to use it.
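If that works out, the disk side of sbd would presumably be initialized
along these lines (the device path is only an example):

# write sbd's message slots onto the shared iSCSI LUN
sbd -d /dev/disk/by-id/scsi-EXAMPLE create
# and point sbd at it in /etc/sysconfig/sbd
SBD_DEVICE=/dev/disk/by-id/scsi-EXAMPLE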
I still don't understand why qdevice can't take the place of SBD with
shared storage; correct me if I'm wrong, but it looks like both of
them are there for the same reason.
Qdevice is there to be a third-side arbiter that decides which partition
is quorate. It can also be seen as a quorum-only node, so for a two-node
cluster it can be viewed as a third node (even though it is quite special,
because it cannot run resources). It does not do fencing.
SBD is a fencing device; it uses a disk as a third-side arbiter.
I've talked with Klaus, and he told me that sbd in 7.3 does not use the disk
as a third-side arbiter, so sorry for the confusion.
You should, however, still be able to use sbd to check whether pacemaker is
alive and whether the partition has quorum - otherwise the watchdog kills the
node. So qdevice will give you the "3rd" node, and sbd fences the inquorate
partition.
Or (as mentioned previously) you can use fabric fencing.
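For the qdevice-plus-watchdog-sbd variant, the setup could look roughly
like this (the qnetd host name and timeout are illustrative, not
prescriptive):

# on the arbiter host: set up and start the qnetd daemon
pcs qdevice setup model net --enable --start
# on the cluster: add the qdevice as the quorum tie-breaker
pcs quorum device add model net host=qdevice algorithm=lms
# in /etc/sysconfig/sbd: no SBD_DEVICE means watchdog-only mode
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5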
Regards,
Honza
From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering
welcomed <users@clusterlabs.org>; Prasad, Shashank
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue
On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices share the power supply of the node, and that
cannot be avoided.
In such scenarios the HA cluster is NOT able to handle the power
failure of a node, since the power is shared with its own fence device.
IPMI-based fencing can also fail for other reasons.
A failure to fence the failed node will leave that node marked
UNCLEAN.
To get over it, the following command needs to be invoked on the
surviving node.
pcs stonith confirm <failed_node_name> --force
This can be automated by hooking in a recovery script on the
stonith resource 'Timed Out' event.
To be more specific, Pacemaker alerts can be used to watch for
stonith timeouts and failures.
In that script, all that essentially needs to be executed is the
aforementioned command.
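A rough sketch of such an alert agent, using the standard CRM_alert_*
environment variables pacemaker passes to alert agents (illustrative only:
the string match on the description text is an assumption, and see the
reply below for why this effectively bypasses fencing and is unsafe unless
the node is guaranteed to really be down):

#!/bin/sh
# fires for every alert; act only on failed/timed-out fencing events
if [ "$CRM_alert_kind" = "fencing" ]; then
    case "$CRM_alert_desc" in
        *failed*|*"Timed Out"*)
            # manually confirm the unseen node as down - DANGEROUS
            sudo pcs stonith confirm "$CRM_alert_node" --force
            ;;
    esac
fi

The script would then be registered with something like
'pcs alert create path=/path/to/script'.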
If I get you right here, you could then just disable fencing in the first place.
Actually, quorum-based watchdog fencing is the way to do this in a
safe manner. This of course assumes you have a proper source of
quorum in your 2-node setup, e.g. qdevice, or a shared disk
with sbd (not pacemaker quorum directly here, but a similar thing
handled inside sbd).
Since the alerts are issued from the 'hacluster' login, sudo permissions
for 'hacluster' need to be configured.
Thanx.
From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to
open-source clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue
On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
I personally think that powering off the node via a switched PDU is
safer, no?
True, if that works in your environment. If you can't do a
physical setup where you aren't simultaneously losing connection to
both your node and the switch device (or you just want to cover
cases where that happens), you have to come up with something else.
Regards,
Kristián Feldsam
FeldHost™ (http://www.feldhost.cz) - professional hosting and server services
On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com> wrote:
On 07/24/2017 05:15 PM, Tomer Azran wrote:
I still don't understand why the qdevice concept doesn't help in
this situation. Since the master node is down, I would expect
quorum to declare it dead.
Why doesn't that happen?
That is not how quorum works. It just limits the decision-making to
the quorate subset of the cluster.
The unknown nodes are still not known for sure to be down.
That is why I suggested quorum-based watchdog fencing with sbd.
That would assure that, within a certain time, all nodes of the
non-quorate part of the cluster are down.
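As a worked example of that timing: with, say, a watchdog timeout of 5
seconds and the stonith-watchdog-timeout=15 quoted earlier in this thread,
the quorate side waits out those 15 seconds and may then safely assume the
inquorate node's watchdog has already reset it.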
On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk"
<dmitri.maz...@gmail.com> wrote:
On 2017-07-24 07:51, Tomer Azran wrote:
We don't have the ability to use it.
Is that the only solution?
No, but I'd recommend thinking about it first. Are you sure you will
care about your cluster working when your server room is on fire?
'Cause unless you have halon suppression, your server room is a
complete write-off anyway. (Think water from sprinklers hitting rich
chunky volts in the servers.)
Dima
--
Klaus Wenninger
Senior Software Engineer, EMEA ENG Openstack Infrastructure
Red Hat
kwenn...@redhat.com
_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org