Re: [ClusterLabs] Two nodes cluster issue

2017-08-08 Thread Jan Friesse



I read the corosync-qdevice (8) man page a couple of times, and also the RH 
documentation at 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-quorumdev-HAAR.html
I think it would be great if you could add some examples that demonstrate the 
difference between the two, and give some use cases that explain which 
algorithm is preferred in each case.


It's really hard to say which algorithm suits a concrete situation 
"better", but yes, I will try to add some examples.


Regards,
  Honza




-Original Message-
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Monday, August 7, 2017 2:38 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>; kwenn...@redhat.com; Prasad, Shashank 
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

Tomer Azran wrote:

Just updating that I added another level of fencing using watchdog-fencing.
Combined with the quorum device, this works in case of a power failure of 
both the server and the IPMI interface.
An important note is that stonith-watchdog-timeout must be configured in 
order for this to work.
After reading the following great post: 
http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog 
watchdog, since I don't think the IPMI watchdog will do any good in case the 
IPMI interface is down (if it is OK, it will be used as a fencing method).

Just for documenting the solution (in case someone else needed that), the 
configuration I added is:
systemctl enable sbd
pcs property set no-quorum-policy=suicide
pcs property set stonith-watchdog-timeout=15
pcs quorum device add model net host=qdevice algorithm=lms

I just can't decide if the qdevice algorithm should be lms or ffsplit. I 
couldn't determine the difference between them, and I'm not sure which one is 
best when using a two-node cluster with qdevice and watchdog fencing.

Can anyone advise on that?


I'm pretty sure you've read the corosync-qdevice(8) man page, where there is a quite 
detailed description of the algorithms, so if you were not able to determine the 
difference between them, something is wrong and the man page needs improvement. What 
exactly were you unable to understand?

Also, for your use case with 2 nodes, both algorithms behave the same way.

Honza



-Original Message-
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Tuesday, July 25, 2017 11:59 AM
To: Cluster Labs - All topics related to open-source clustering
welcomed <mailto:users@clusterlabs.org>; mailto:kwenn...@redhat.com; Prasad,
Shashank <mailto:sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue


Tomer Azran wrote:

I tend to agree with Klaus – I don't think that having a hook that
bypasses stonith is the right way. It is better to not use stonith at all.
I think I will try to use an iSCSI target on my qdevice and set SBD
to use it.
I still don't understand why qdevice can't take the place of SBD with
shared storage; correct me if I'm wrong, but it looks like both of
them are there for the same reason.


Qdevice is there to be a third-side arbiter that decides which partition
is quorate. It can also be seen as a quorum-only node. So for a two-node
cluster it can be viewed as a third node (even though it is quite
special because it cannot run resources). It does not do fencing.

SBD is a fencing device. It uses a disk as a third-side arbiter.


I've talked with Klaus and he told me that 7.3 is not using a disk as a third 
side arbiter, so sorry for the confusion.

You should, however, still be able to use sbd to check whether pacemaker is alive and 
whether the partition has quorum - otherwise the watchdog kills the node. So qdevice 
will give you a "3rd" node and sbd fences the non-quorate partition.

Or (as mentioned previously) you can use fabric fencing.

Regards,
 Honza






From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering
welcomed <mailto:users@clusterlabs.org>; Prasad, Shashank
<mailto:sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices use shared power of the node, and it
cannot be avoided.
In such scenarios the HA cluster is NOT able to handle the power
failure of a node, since the power is shared with its own fence device.
The failure of IPMI-based fencing can also happen for other reasons.

A failure to fence the failed node will cause the cluster to be marked
UNCLEAN.
To get over it, the following command needs to be invoked on the
surviving node.

pcs stonith confirm <node> --force

This can be automated by hooking in a recovery script on the Stonith
resource's 'Timed Out' event.
To be more specific, the Pacemaker Alerts can be used to watch for
Stonith timeouts and failures.
In that script, all that essentially needs to be executed is the
aforementioned command.

Re: [ClusterLabs] Two nodes cluster issue

2017-08-07 Thread Tomer Azran
I read the corosync-qdevice (8) man page a couple of times, and also the RH 
documentation at 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-quorumdev-HAAR.html
 
I think it would be great if you could add some examples that demonstrate the 
difference between the two, and give some use cases that explain which 
algorithm is preferred in each case. 

-Original Message-
From: Jan Friesse [mailto:jfrie...@redhat.com] 
Sent: Monday, August 7, 2017 2:38 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>; kwenn...@redhat.com; Prasad, Shashank 
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

Tomer Azran wrote:
> Just updating that I added another level of fencing using watchdog-fencing.
> Combined with the quorum device, this works in case of a power failure of 
> both the server and the IPMI interface.
> An important note is that stonith-watchdog-timeout must be configured in 
> order for this to work.
> After reading the following great post: 
> http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the 
> softdog watchdog since I don't think the IPMI watchdog will do any good in 
> case the IPMI interface is down (if it is OK, it will be used as a fencing 
> method).
>
> Just for documenting the solution (in case someone else needed that), the 
> configuration I added is:
> systemctl enable sbd
> pcs property set no-quorum-policy=suicide
> pcs property set stonith-watchdog-timeout=15
> pcs quorum device add model net host=qdevice algorithm=lms
>
> I just can't decide if the qdevice algorithm should be lms or ffsplit. I 
> couldn't determine the difference between them, and I'm not sure which one is 
> best when using a two-node cluster with qdevice and watchdog fencing.
>
> Can anyone advise on that?

I'm pretty sure you've read the corosync-qdevice(8) man page, where there is a quite 
detailed description of the algorithms, so if you were not able to determine the 
difference between them, something is wrong and the man page needs improvement. What 
exactly were you unable to understand?

Also, for your use case with 2 nodes, both algorithms behave the same way.

Honza

>
> -Original Message-
> From: Jan Friesse [mailto:jfrie...@redhat.com]
> Sent: Tuesday, July 25, 2017 11:59 AM
> To: Cluster Labs - All topics related to open-source clustering 
> welcomed <mailto:users@clusterlabs.org>; mailto:kwenn...@redhat.com; Prasad, 
> Shashank <mailto:sspra...@vanu.com>
> Subject: Re: [ClusterLabs] Two nodes cluster issue
>
>> Tomer Azran wrote:
>>> I tend to agree with Klaus – I don't think that having a hook that 
>>> bypasses stonith is the right way. It is better to not use stonith at all.
>>> I think I will try to use an iSCSI target on my qdevice and set SBD 
>>> to use it.
>>> I still don't understand why qdevice can't take the place of SBD with 
>>> shared storage; correct me if I'm wrong, but it looks like both of 
>>> them are there for the same reason.
>>
>> Qdevice is there to be a third-side arbiter that decides which partition 
>> is quorate. It can also be seen as a quorum-only node. So for a two-node 
>> cluster it can be viewed as a third node (even though it is quite 
>> special because it cannot run resources). It does not do fencing.
>>
>> SBD is a fencing device. It uses a disk as a third-side arbiter.
>
> I've talked with Klaus and he told me that 7.3 is not using a disk as a third 
> side arbiter, so sorry for the confusion.
>
> You should, however, still be able to use sbd to check whether pacemaker is 
> alive and whether the partition has quorum - otherwise the watchdog kills the 
> node. So qdevice will give you a "3rd" node and sbd fences the non-quorate partition.
>
> Or (as mentioned previously) you can use fabric fencing.
>
> Regards,
> Honza
>
>>
>>
>>>
>>> From: Klaus Wenninger [mailto:kwenn...@redhat.com]
>>> Sent: Monday, July 24, 2017 9:01 PM
>>> To: Cluster Labs - All topics related to open-source clustering 
>>> welcomed <mailto:users@clusterlabs.org>; Prasad, Shashank 
>>> <mailto:sspra...@vanu.com>
>>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>
>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>> Sometimes IPMI fence devices use shared power of the node, and it 
>>> cannot be avoided.
>>> In such scenarios the HA cluster is NOT able to handle the power 
>>> failure of a node, since the power is shared with its own fence device.
>>> The failure of IPMI-based fencing can also happen for other reasons.

Re: [ClusterLabs] Two nodes cluster issue

2017-08-07 Thread Jan Friesse

Tomer Azran wrote:

Just updating that I added another level of fencing using watchdog-fencing.
Combined with the quorum device, this works in case of a power failure of 
both the server and the IPMI interface.
An important note is that stonith-watchdog-timeout must be configured in 
order for this to work.
After reading the following great post: 
http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog 
watchdog, since I don't think the IPMI watchdog will do any good in case the 
IPMI interface is down (if it is OK, it will be used as a fencing method).

Just for documenting the solution (in case someone else needed that), the 
configuration I added is:
systemctl enable sbd
pcs property set no-quorum-policy=suicide
pcs property set stonith-watchdog-timeout=15
pcs quorum device add model net host=qdevice algorithm=lms

I just can't decide if the qdevice algorithm should be lms or ffsplit. I 
couldn't determine the difference between them, and I'm not sure which one is 
best when using a two-node cluster with qdevice and watchdog fencing.

Can anyone advise on that?


I'm pretty sure you've read the corosync-qdevice(8) man page, where there is a quite 
detailed description of the algorithms, so if you were not able to determine 
the difference between them, something is wrong and the man page needs 
improvement. What exactly were you unable to understand?


Also, for your use case with 2 nodes, both algorithms behave the same way.

Honza



-Original Message-
From: Jan Friesse [mailto:jfrie...@redhat.com]
Sent: Tuesday, July 25, 2017 11:59 AM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>; kwenn...@redhat.com; Prasad, Shashank 
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue


Tomer Azran wrote:

I tend to agree with Klaus – I don't think that having a hook that
bypasses stonith is the right way. It is better to not use stonith at all.
I think I will try to use an iSCSI target on my qdevice and set SBD
to use it.
I still don't understand why qdevice can't take the place of SBD with
shared storage; correct me if I'm wrong, but it looks like both of
them are there for the same reason.


Qdevice is there to be a third-side arbiter that decides which partition
is quorate. It can also be seen as a quorum-only node. So for a two-node
cluster it can be viewed as a third node (even though it is quite special
because it cannot run resources). It does not do fencing.

SBD is a fencing device. It uses a disk as a third-side arbiter.


I've talked with Klaus and he told me that 7.3 is not using a disk as a third 
side arbiter, so sorry for the confusion.

You should, however, still be able to use sbd to check whether pacemaker is alive and 
whether the partition has quorum - otherwise the watchdog kills the node. So qdevice 
will give you a "3rd" node and sbd fences the non-quorate partition.

Or (as mentioned previously) you can use fabric fencing.

Regards,
Honza






From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering
welcomed <users@clusterlabs.org>; Prasad, Shashank
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices use shared power of the node, and it
cannot be avoided.
In such scenarios the HA cluster is NOT able to handle the power
failure of a node, since the power is shared with its own fence device.
The failure of IPMI-based fencing can also happen for other reasons.

A failure to fence the failed node will cause the cluster to be marked
UNCLEAN.
To get over it, the following command needs to be invoked on the
surviving node.

pcs stonith confirm <node> --force

This can be automated by hooking in a recovery script on the Stonith
resource's 'Timed Out' event.
To be more specific, the Pacemaker Alerts can be used to watch for
Stonith timeouts and failures.
In that script, all that essentially needs to be executed is the
aforementioned command.

If I get you right here, you could just as well disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared disk
with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).


Since the alerts are issued from the ‘hacluster’ login, sudo permissions
for ‘hacluster’ need to be configured.

Thanx.


From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to
open-source clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
I personally think that powering off the node via a switched PDU is safer,
or not?

True if that is working in your environment. If you can't do a
physical setup where

Re: [ClusterLabs] Two nodes cluster issue

2017-07-31 Thread Ken Gaillot
Please ignore my re-reply to the original message, I'm in the middle of
a move and am getting by on little sleep at the moment :-)

On Mon, 2017-07-31 at 09:26 -0500, Ken Gaillot wrote:
> On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> > Hello,
> > 
> >  
> > 
> > We built a pacemaker cluster with 2 physical servers.
> > 
> > We configured DRBD in Master\Slave setup, a floating IP and file
> > system mount in Active\Passive mode.
> > 
> > We configured two STONITH devices (fence_ipmilan), one for each
> > server.
> > 
> >  
> > 
> > We are trying to simulate a situation where the Master server crashes
> > with no power. 
> > 
> > We pulled both of the PSU cables and the server becomes offline
> > (UNCLEAN).
> > 
> > The resources that the Master used to hold are now in Started (UNCLEAN)
> > state.
> > 
> > The state is unclean since the STONITH failed (the STONITH device is
> > located on the server (Intel RMM4 - IPMI) – which uses the same power
> > supply). 
> > 
> >  
> > 
> > The problem is that now the cluster does not release the resources
> > that the Master holds, and the service goes down.
> > 
> >  
> > 
> > Is there any way to overcome this situation? 
> > 
> > We tried to add a qdevice but got the same results.
> > 
> >  
> > 
> > We are using pacemaker 1.1.15 on CentOS 7.3
> > 
> >  
> > 
> > Thanks,
> > 
> > Tomer.
> 
> This is a limitation of using IPMI as the only fence device, when the
> IPMI shares power with the main system. The way around it is to use a
> fallback fence device, for example a switched power unit or sbd
> (watchdog). Pacemaker lets you specify a fencing "topology" with
> multiple devices -- level 1 would be the IPMI, and level 2 would be the
> fallback device.
> 
> qdevice helps with quorum, which would let one side attempt to fence the
> other, but it doesn't affect whether the fencing succeeds. With a
> two-node cluster, you can use qdevice to get quorum, or you can use
> corosync's two_node option.
> 

-- 
Ken Gaillot 





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-31 Thread Ken Gaillot
On Mon, 2017-07-24 at 11:51 +, Tomer Azran wrote:
> Hello,
> 
>  
> 
> We built a pacemaker cluster with 2 physical servers.
> 
> We configured DRBD in Master\Slave setup, a floating IP and file
> system mount in Active\Passive mode.
> 
> We configured two STONITH devices (fence_ipmilan), one for each
> server.
> 
>  
> 
> We are trying to simulate a situation where the Master server crashes
> with no power. 
> 
> We pulled both of the PSU cables and the server becomes offline
> (UNCLEAN).
> 
> The resources that the Master used to hold are now in Started (UNCLEAN)
> state.
> 
> The state is unclean since the STONITH failed (the STONITH device is
> located on the server (Intel RMM4 - IPMI) – which uses the same power
> supply). 
> 
>  
> 
> The problem is that now the cluster does not release the resources
> that the Master holds, and the service goes down.
> 
>  
> 
> Is there any way to overcome this situation? 
> 
> We tried to add a qdevice but got the same results.
> 
>  
> 
> We are using pacemaker 1.1.15 on CentOS 7.3
> 
>  
> 
> Thanks,
> 
> Tomer.

This is a limitation of using IPMI as the only fence device, when the
IPMI shares power with the main system. The way around it is to use a
fallback fence device, for example a switched power unit or sbd
(watchdog). Pacemaker lets you specify a fencing "topology" with
multiple devices -- level 1 would be the IPMI, and level 2 would be the
fallback device.
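
As a rough sketch of such a topology (the stonith device names below are made up; they 
would be whatever the fence_ipmilan and fallback resources are called in the cluster):

pcs stonith level add 1 node1 fence-node1-ipmi
pcs stonith level add 2 node1 fence-node1-pdu
pcs stonith level add 1 node2 fence-node2-ipmi
pcs stonith level add 2 node2 fence-node2-pdu

Fencing is tried level by level, and a higher level is only attempted if the devices of 
the lower level failed.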

qdevice helps with quorum, which would let one side attempt to fence the
other, but it doesn't affect whether the fencing succeeds. With a
two-node cluster, you can use qdevice to get quorum, or you can use
corosync's two_node option.
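
For reference, two_node is set in the quorum section of corosync.conf (a minimal sketch; 
the provider line is normally already there):

quorum {
    provider: corosync_votequorum
    two_node: 1
}

Note that two_node: 1 also implies wait_for_all, so both nodes have to be seen once at 
startup before the cluster considers itself quorate.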

-- 
Ken Gaillot 





___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-30 Thread Tomer Azran
Just updating that I added another level of fencing using watchdog-fencing.
Combined with the quorum device, this works in case of a power failure of 
both the server and the IPMI interface.
An important note is that stonith-watchdog-timeout must be configured in 
order for this to work.
After reading the following great post: 
http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit , I chose the softdog 
watchdog, since I don't think the IPMI watchdog will do any good in case the 
IPMI interface is down (if it is OK, it will be used as a fencing method).

Just for documenting the solution (in case someone else needed that), the 
configuration I added is:
systemctl enable sbd
pcs property set no-quorum-policy=suicide
pcs property set stonith-watchdog-timeout=15
pcs quorum device add model net host=qdevice algorithm=lms
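
For completeness, the watchdog side of this is roughly the following sketch (softdog as 
chosen above; the file locations are the RHEL/CentOS defaults and may differ elsewhere):

echo softdog > /etc/modules-load.d/watchdog.conf
modprobe softdog

and in /etc/sysconfig/sbd:

SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5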

I just can't decide if the qdevice algorithm should be lms or ffsplit. I 
couldn't determine the difference between them, and I'm not sure which one is 
best when using a two-node cluster with qdevice and watchdog fencing.

Can anyone advise on that?


From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Tuesday, July 25, 2017 2:19 AM
To: Tomer Azran <tomer.az...@edp.co.il>; Cluster Labs - All topics related to 
open-source clustering welcomed <users@clusterlabs.org>; Prasad, Shashank 
<sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 11:59 PM, Tomer Azran wrote:
There is a problem with that – it seems like SBD with shared disk is disabled 
on CentOS 7.3:

When I run:
# sbd -d /dev/sbd create

I get:
Shared disk functionality not supported

Which is why I suggested going for watchdog-fencing using
your qdevice setup.
As said, I haven't tried it with qdevice-quorum - but I don't
see a reason why that shouldn't work.
no-quorum-policy has to be suicide, of course.



So I might try the software watchdog (softdog or ipmi_watchdog)

A reliable watchdog is really crucial for sbd, so I would
recommend going for ipmi or anything else that has
hardware behind it.

Klaus


Tomer.

From: Tomer Azran [mailto:tomer.az...@edp.co.il]
Sent: Tuesday, July 25, 2017 12:30 AM
To: kwenn...@redhat.com<mailto:kwenn...@redhat.com>; Cluster Labs - All topics 
related to open-source clustering welcomed 
<users@clusterlabs.org><mailto:users@clusterlabs.org>; Prasad, Shashank 
<sspra...@vanu.com><mailto:sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

I tend to agree with Klaus – I don't think that having a hook that bypasses 
stonith is the right way. It is better to not use stonith at all.

That was of course said with a certain degree of hyperbole. Anything is of course 
better than not having
fencing at all.
I might be wrong, but what you were saying somehow drew a picture in my 
mind of you
having your 2 nodes at 2 quite separated sites/rooms, and in that case ...


I think I will try to use an iSCSI target on my qdevice and set SBD to use it.
I still don't understand why qdevice can't take the place of SBD with shared 
storage; correct me if I'm wrong, but it looks like both of them are there for 
the same reason.

sbd with watchdog + qdevice can take the place of sbd with shared storage.
qdevice is there to decide which part of a cluster is quorate and which is not - 
in cases
where, after a split, this wouldn't be possible otherwise.
sbd (with watchdog) is then there to reliably take down the non-quorate part
within a well-defined time.
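
That "well defined time" is what the stonith-watchdog-timeout property expresses. A 
sketch of how the two timeouts relate (the values are simply the ones used earlier in 
this thread, not a recommendation):

SBD_WATCHDOG_TIMEOUT=5                        # watchdog resets the node ~5s after sbd stops feeding it
pcs property set stonith-watchdog-timeout=15  # survivors assume self-fencing completed after 15s

The cluster property has to be comfortably larger than the watchdog timeout, so the 
quorate partition only takes over after the non-quorate node has really been reset.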



From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org<mailto:users@clusterlabs.org>>; Prasad, Shashank 
<sspra...@vanu.com<mailto:sspra...@vanu.com>>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices use shared power of the node, and it cannot be 
avoided.
In such scenarios the HA cluster is NOT able to handle the power failure of a 
node, since the power is shared with its own fence device.
The failure of IPMI-based fencing can also happen for other reasons.

A failure to fence the failed node will cause the cluster to be marked UNCLEAN.
To get over it, the following command needs to be invoked on the surviving node.

pcs stonith confirm <node> --force

This can be automated by hooking in a recovery script on the Stonith 
resource's 'Timed Out' event.
To be more specific, the Pacemaker Alerts can be used to watch for Stonith 
timeouts and failures.
In that script, all that essentially needs to be executed is the aforementioned 
command.

If I get you right here, you could just as well disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled

Re: [ClusterLabs] Two nodes cluster issue

2017-07-25 Thread Jan Friesse

Tomer Azran wrote:

I tend to agree with Klaus – I don't think that having a hook that
bypasses stonith is the right way. It is better to not use stonith at all.
I think I will try to use an iSCSI target on my qdevice and set SBD to
use it.
I still don't understand why qdevice can't take the place of SBD with
shared storage; correct me if I'm wrong, but it looks like both of
them are there for the same reason.


Qdevice is there to be a third-side arbiter that decides which partition is
quorate. It can also be seen as a quorum-only node. So for a two-node
cluster it can be viewed as a third node (even though it is quite special
because it cannot run resources). It does not do fencing.

SBD is a fencing device. It uses a disk as a third-side arbiter.


I've talked with Klaus and he told me that 7.3 is not using a disk as a 
third side arbiter, so sorry for the confusion.


You should, however, still be able to use sbd to check whether pacemaker is 
alive and whether the partition has quorum - otherwise the watchdog kills the 
node. So qdevice will give you a "3rd" node and sbd fences the non-quorate 
partition.


Or (as mentioned previously) you can use fabric fencing.

Regards,
  Honza






From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering
welcomed <users@clusterlabs.org>; Prasad, Shashank <sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices use shared power of the node, and it
cannot be avoided.
In such scenarios the HA cluster is NOT able to handle the power
failure of a node, since the power is shared with its own fence device.
The failure of IPMI-based fencing can also happen for other reasons.

A failure to fence the failed node will cause the cluster to be marked
UNCLEAN.
To get over it, the following command needs to be invoked on the
surviving node.

pcs stonith confirm <node> --force

This can be automated by hooking in a recovery script on the Stonith
resource's 'Timed Out' event.
To be more specific, the Pacemaker Alerts can be used to watch for
Stonith timeouts and failures.
In that script, all that essentially needs to be executed is the
aforementioned command.

If I get you right here, you could just as well disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).


Since the alerts are issued from the ‘hacluster’ login, sudo permissions
for ‘hacluster’ need to be configured.

Thanx.


From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source
clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
I personally think that powering off the node via a switched PDU is safer,
or not?

True if that is working in your environment. If you can't do a physical
setup
where you aren't simultaneously losing connection to both your node and
the switch-device (or you just want to cover cases where that happens),
you have to come up with something else.




S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz<mailto:supp...@feldhost.cz>

www.feldhost.cz<http://www.feldhost.cz> - FeldHost™ – profesionální
hostingové a serverové služby za adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

On 24 Jul 2017, at 17:27, Klaus Wenninger
<kwenn...@redhat.com<mailto:kwenn...@redhat.com>> wrote:

On 07/24/2017 05:15 PM, Tomer Azran wrote:
I still don't understand why the qdevice concept doesn't help in this
situation. Since the master node is down, I would expect the quorum to
declare it as dead.
Why doesn't that happen?

That is not how quorum works. It just limits the decision-making to
the quorate subset of the cluster.
Still, the unknown nodes are not guaranteed to be down.
That is why I suggested having quorum-based watchdog-fencing with sbd.
That would assure that within a certain time all nodes of the
non-quorate part of the cluster are down.






On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk"
<dmitri.maz...@gmail.com<mailto:dmitri.maz...@gmail.com>> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:


We don't have the ability to use it.



Is that the only solution?




No, but I'd recommend thinking about it first. Are you sure you will

care about your cluster working when your server room is on fire? 'Cause

unless you have halon suppression, your server room is a complete write-off anyway.

Re: [ClusterLabs] Two nodes cluster issue

2017-07-25 Thread Jan Friesse

Tomer Azran wrote:

I tend to agree with Klaus – I don't think that having a hook that bypasses 
stonith is the right way. It is better to not use stonith at all.
I think I will try to use an iSCSI target on my qdevice and set SBD to use it.
I still don't understand why qdevice can't take the place of SBD with shared 
storage; correct me if I'm wrong, but it looks like both of them are there for 
the same reason.


Qdevice is there to be a third-side arbiter that decides which partition is 
quorate. It can also be seen as a quorum-only node. So for a two-node 
cluster it can be viewed as a third node (even though it is quite special 
because it cannot run resources). It does not do fencing.


SBD is a fencing device. It uses a disk as a third-side arbiter.




From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>; Prasad, Shashank <sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices use shared power of the node, and it cannot be 
avoided.
In such scenarios the HA cluster is NOT able to handle the power failure of a 
node, since the power is shared with its own fence device.
The failure of IPMI-based fencing can also happen for other reasons.

A failure to fence the failed node will cause the cluster to be marked UNCLEAN.
To get over it, the following command needs to be invoked on the surviving node.

pcs stonith confirm <node> --force

This can be automated by hooking in a recovery script on the Stonith 
resource's 'Timed Out' event.
To be more specific, the Pacemaker Alerts can be used to watch for Stonith 
timeouts and failures.
In that script, all that essentially needs to be executed is the aforementioned 
command.

If I get you right here, you could just as well disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).


Since the alerts are issued from the ‘hacluster’ login, sudo permissions for 
‘hacluster’ need to be configured.

Thanx.


From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source 
clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
I personally think that powering off the node via a switched PDU is safer, or not?

True if that is working in your environment. If you can't do a physical setup
where you aren't simultaneously losing connection to both your node and
the switch-device (or you just want to cover cases where that happens),
you have to come up with something else.




S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz<mailto:supp...@feldhost.cz>

www.feldhost.cz<http://www.feldhost.cz> - FeldHost™ – profesionální hostingové 
a serverové služby za adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

On 24 Jul 2017, at 17:27, Klaus Wenninger 
<kwenn...@redhat.com<mailto:kwenn...@redhat.com>> wrote:

On 07/24/2017 05:15 PM, Tomer Azran wrote:
I still don't understand why the qdevice concept doesn't help in this 
situation. Since the master node is down, I would expect the quorum to declare 
it as dead.
Why doesn't that happen?

That is not how quorum works. It just limits the decision-making to the quorate 
subset of the cluster.
Still, the unknown nodes are not guaranteed to be down.
That is why I suggested having quorum-based watchdog-fencing with sbd.
That would assure that within a certain time all nodes of the non-quorate part
of the cluster are down.






On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" 
<dmitri.maz...@gmail.com<mailto:dmitri.maz...@gmail.com>> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:


We don't have the ability to use it.



Is that the only solution?




No, but I'd recommend thinking about it first. Are you sure you will

care about your cluster working when your server room is on fire? 'Cause

unless you have halon suppression, your server room is a complete

write-off anyway. (Think water from sprinklers hitting rich chunky volts

in the servers.)



Dima



___

Users mailing list: Users@clusterlabs.org

http://lists.clusterlabs.org/mailman/listinfo/users



Project Home: http://www.clusterlabs.org

Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Prasad, Shashank
> I don't think that having a hook that bypasses stonith is the right way….

 

The intention is NOT to bypass STONITH. STONITH shall always remain active, and 
an integral part of the cluster. The discussion is about bailing out of 
situations when the STONITH itself fails due to fencing agent failures, and how 
one can automate the process of bailing out.

 

All that the surviving nodes in the cluster need to be informed of is that the 
failed node has indeed failed, hence the suggestion for a hook.

 

The hook (let's say: STONITH-Failure-Recovery-Hook) under discussion will only 
be fired when the fencing agent fails. STONITH-Failure-Recovery-Hook is realized 
via a script. The "${CRM_alert_rsc}", "${CRM_alert_task}", "${CRM_alert_desc}" 
and "${CRM_alert_node}" variables in the Pacemaker Alert can be used to match up 
with the STONITH resource and its failures, and to invoke the 
STONITH-Failure-Recovery-Hook as appropriate.
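
A very rough sketch of such a hook, for illustration only (it relies on the 
CRM_alert_kind and CRM_alert_rc variables in addition to the ones named above; the 
exact values delivered for a fencing timeout should be checked against the Pacemaker 
alerts documentation, and - as discussed elsewhere in this thread - confirming a node 
that is not actually dead is dangerous):

#!/bin/sh
# Illustrative STONITH-Failure-Recovery-Hook, installed as a Pacemaker alert agent.
# Alerts run as the hacluster user, hence the sudo mentioned below.
if [ "${CRM_alert_kind}" = "fencing" ] && [ "${CRM_alert_rc}" != "0" ]; then
    # Fencing of ${CRM_alert_node} did not succeed: tell the cluster the node is
    # down anyway. Only safe if the node really is powered off.
    sudo pcs stonith confirm "${CRM_alert_node}" --force
fi

It would be registered with something like "pcs alert create path=/usr/local/bin/stonith-recovery-hook".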

 

I also agree with Klaus that a quorum device is a good strategy.

That needs a 3rd node in the cluster. If such an option can be exercised, it 
should be.

 

Thanx.

 

 

 

From: Tomer Azran [mailto:tomer.az...@edp.co.il] 
Sent: Tuesday, July 25, 2017 3:00 AM
To: kwenn...@redhat.com; Cluster Labs - All topics related to open-source 
clustering welcomed; Prasad, Shashank
Subject: RE: [ClusterLabs] Two nodes cluster issue

 

I tend to agree with Klaus – I don't think that having a hook that bypasses 
stonith is the right way. It is better to not use stonith at all.

I think I will try to use an iSCSI target on my qdevice and set SBD to use it.

I still don't understand why qdevice can't take the place of SBD with shared 
storage; correct me if I'm wrong, but it looks like both of them are there for 
the same reason.

 

From: Klaus Wenninger [mailto:kwenn...@redhat.com] 
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>; Prasad, Shashank <sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

 

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:

Sometimes IPMI fence devices use shared power of the node, and it 
cannot be avoided.

In such scenarios the HA cluster is NOT able to handle the power 
failure of a node, since the power is shared with its own fence device.

The failure of IPMI-based fencing can also happen for other reasons.

 

A failure to fence the failed node will cause the cluster to be marked 
UNCLEAN.

To get over it, the following command needs to be invoked on the 
surviving node.

 

pcs stonith confirm <node> --force

 

This can be automated by hooking in a recovery script on the 
Stonith resource's 'Timed Out' event.

To be more specific, the Pacemaker Alerts can be used to watch for 
Stonith timeouts and failures.

In that script, all that essentially needs to be executed is the 
aforementioned command.


If I get you right here, you could just as well disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).



Since the alerts are issued from the ‘hacluster’ login, sudo permissions 
for ‘hacluster’ need to be configured.

 

Thanx.

 

 

From: Klaus Wenninger [mailto:kwenn...@redhat.com 
<mailto:kwenn...@redhat.com> ] 
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source 
clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

 

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:

I personally think that powering off the node via a switched PDU is 
safer, or not?


True if that is working in your environment. If you can't do a physical 
setup
where you aren't simultaneously losing connection to both your node and
the switch-device (or you just want to cover cases where that happens),
you have to come up with something else.





S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz <mailto:supp...@feldhost.cz> 

www.feldhost.cz <http://www.feldhost.cz>  - FeldHost™ – profesionální 
hostingové a serverové služby za adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446 

  

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Ken Gaillot
On Mon, 2017-07-24 at 21:29 +, Tomer Azran wrote:
> I tend to agree with Klaus – I don't think that having a hook that
> bypasses stonith is the right way. It is better to not use stonith at
> all.
> 
> I think I will try to use an iScsi target on my qdevice and set SBD to
> use it.

Certainly, two levels of real stonith is best -- but (at the risk of
committing heresy) I can see Shashank's point.

If you've got extensive redundancy everywhere else, then the chances of
something taking out both the node and its IPMI, yet still allowing it
to interfere with shared resources, is very small. It comes down to
whether you're willing to accept that small risk. Such a setup is
definitely better than disabling fencing altogether, because the IPMI
fence level safely handles all node failure scenarios that don't also
take out the IPMI (and the bypass actually handles a complete power cut
safely).

If you can do a second level of real fencing, that is of course
preferred.

> I still don't understand why qdevice can't take the place of SBD with
> shared storage; correct me if I'm wrong, but it looks like both of
> them are there for the same reason.
> 
>  
> 
> From: Klaus Wenninger [mailto:kwenn...@redhat.com] 
> Sent: Monday, July 24, 2017 9:01 PM
> To: Cluster Labs - All topics related to open-source clustering
> welcomed <users@clusterlabs.org>; Prasad, Shashank <sspra...@vanu.com>
> Subject: Re: [ClusterLabs] Two nodes cluster issue
> 
> 
>  
> 
> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
> 
> 
> Sometimes IPMI fence devices use shared power of the node, and
> it cannot be avoided.
> 
> In such scenarios the HA cluster is NOT able to handle the
> power failure of a node, since the power is shared with its
> own fence device.
> 
> The failure of IPMI-based fencing can also happen for other
> reasons.
> 
>  
> 
> A failure to fence the failed node will cause the cluster to be
> marked UNCLEAN.
> 
> To get over it, the following command needs to be invoked on
> the surviving node.
> 
>  
> 
> pcs stonith confirm <node> --force
> 
>  
> 
> This can be automated by hooking in a recovery script on the
> Stonith resource's 'Timed Out' event.
> 
> To be more specific, the Pacemaker Alerts can be used to
> watch for Stonith timeouts and failures.
> 
> In that script, all that essentially needs to be executed is the
> aforementioned command.
> 
> 
> 
> If I get you right here, you could just as well disable fencing in the
> first place.
> Actually quorum-based-watchdog-fencing is the way to do this in a
> safe manner. This of course assumes you have a proper source for
> quorum in your 2-node-setup with e.g. qdevice or using a shared
> disk with sbd (not directly pacemaker quorum here but similar thing
> handled inside sbd).
> 
> 
> 
> 
> Since the alerts are issued from the ‘hacluster’ login, sudo
> permissions for ‘hacluster’ need to be configured.
> 
>  
> 
> Thanx.
> 
>  
> 
>  
> 
>     From: Klaus Wenninger [mailto:kwenn...@redhat.com] 
> Sent: Monday, July 24, 2017 9:24 PM
> To: Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> Subject: Re: [ClusterLabs] Two nodes cluster issue
> 
> 
>  
> 
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
> 
> 
> I personally think that powering off the node via a switched PDU
> is safer, or not?
> 
> 
> 
> True if that is working in your environment. If you can't do a
> physical setup
> where you aren't simultaneously losing connection to both
> your node and
> the switch-device (or you just want to cover cases where that
> happens)
> you have to come up with something else.
> 
> 
> 
> 
> 
> 
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz
> 
> www.feldhost.cz - FeldHost™ – profesionální hostingové a
> serverové služby za adekvátní ceny.
> 
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 9

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Tomer Azran
I tend to agree with Klaus – I don't think that having a hook that bypasses 
stonith is the right way. It is better to not use stonith at all.
I think I will try to use an iSCSI target on my qdevice and set SBD to use it.
I still don't understand why qdevice can't take the place of SBD with shared 
storage; correct me if I'm wrong, but it looks like both of them are there for 
the same reason.

From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:01 PM
To: Cluster Labs - All topics related to open-source clustering welcomed 
<users@clusterlabs.org>; Prasad, Shashank <sspra...@vanu.com>
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
Sometimes IPMI fence devices use shared power of the node, and it cannot be 
avoided.
In such scenarios the HA cluster is NOT able to handle the power failure of a 
node, since the power is shared with its own fence device.
The failure of IPMI-based fencing can also happen for other reasons.

A failure to fence the failed node will cause the cluster to be marked UNCLEAN.
To get over it, the following command needs to be invoked on the surviving node.

pcs stonith confirm <node> --force

This can be automated by hooking in a recovery script on the Stonith 
resource's 'Timed Out' event.
To be more specific, the Pacemaker Alerts can be used to watch for Stonith 
timeouts and failures.
In that script, all that essentially needs to be executed is the aforementioned 
command.

If I get you right here, you could just as well disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).


Since the alerts are issued from the ‘hacluster’ login, sudo permissions for 
‘hacluster’ need to be configured.

Thanx.


From: Klaus Wenninger [mailto:kwenn...@redhat.com]
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source 
clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
I personally think that powering off the node via a switched PDU is safer, or not?

True if that is working in your environment. If you can't do a physical setup
where you aren't simultaneously losing connection to both your node and
the switch-device (or you just want to cover cases where that happens),
you have to come up with something else.




S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz<mailto:supp...@feldhost.cz>

www.feldhost.cz<http://www.feldhost.cz> - FeldHost™ – profesionální hostingové 
a serverové služby za adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

On 24 Jul 2017, at 17:27, Klaus Wenninger 
<kwenn...@redhat.com<mailto:kwenn...@redhat.com>> wrote:

On 07/24/2017 05:15 PM, Tomer Azran wrote:
I still don't understand why the qdevice concept doesn't help in this 
situation. Since the master node is down, I would expect the quorum to declare 
it as dead.
Why doesn't that happen?

That is not how quorum works. It just limits the decision-making to the quorate 
subset of the cluster.
Still, the unknown nodes are not guaranteed to be down.
That is why I suggested having quorum-based watchdog-fencing with sbd.
That would assure that within a certain time all nodes of the non-quorate part
of the cluster are down.






On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" 
<dmitri.maz...@gmail.com<mailto:dmitri.maz...@gmail.com>> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:

> We don't have the ability to use it.

> Is that the only solution?



No, but I'd recommend thinking about it first. Are you sure you will

care about your cluster working when your server room is on fire? 'Cause

unless you have halon suppression, your server room is a complete

write-off anyway. (Think water from sprinklers hitting rich chunky volts

in the servers.)



Dima



___

Users mailing list: Users@clusterlabs.org<mailto:Users@clusterlabs.org>

http://lists.clusterlabs.org/mailman/listinfo/users



Project Home: http://www.clusterlabs.org<http://www.clusterlabs.org/>

Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf

Bugs: http://bugs.clusterlabs.org<http://bugs.clusterlabs.org/>





___

Users mailing list: Users@clusterlabs.org<mailto:Users@clusterlabs.org>

http://lists.clusterlabs.org/mailman/listinfo/users



Project Home: http://www.clusterlabs.org

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Kristián Feldsam
yes, I just had an idea - he probably has a managed switch or fabric...

S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové služby za 
adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 24 Jul 2017, at 22:18, Klaus Wenninger <kwenn...@redhat.com> wrote:
> 
> On 07/24/2017 09:46 PM, Kristián Feldsam wrote:
>> so why not use some other fencing method like disabling the port on the switch, 
>> so nobody can access the faulty node and write data to it. It is common practice 
>> too.
> 
> Well, don't get me wrong here. I don't want to hard-sell sbd.
> I just thought that, very likely, requirements that prevent usage
> of a remote-controlled power-switch will make access
> to a switch to disable the ports unusable as well.
> And if a working qdevice setup is there already, the gap between
> what he thought he would get from qdevice and what he actually
> had matches exactly quorum-based-watchdog-fencing.
> 
> But you are of course right.
> I don't really know the scenario.
> Maybe fabric fencing is the perfect match - good to mention it
> here as a possibility.
> 
> Regards,
> Klaus
>   
>> 
>> S pozdravem Kristián Feldsam
>> Tel.: +420 773 303 353, +421 944 137 535
>> E-mail.: supp...@feldhost.cz <mailto:supp...@feldhost.cz>
>> 
>> www.feldhost.cz <http://www.feldhost.cz/> - FeldHost™ – profesionální 
>> hostingové a serverové služby za adekvátní ceny.
>> 
>> FELDSAM s.r.o.
>> V rohu 434/3
>> Praha 4 – Libuš, PSČ 142 00
>> IČ: 290 60 958, DIČ: CZ290 60 958
>> C 200350 vedená u Městského soudu v Praze
>> 
>> Banka: Fio banka a.s.
>> Číslo účtu: 2400330446/2010
>> BIC: FIOBCZPPXX
>> IBAN: CZ82 2010  0024 0033 0446
>> 
>>> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenn...@redhat.com 
>>> <mailto:kwenn...@redhat.com>> wrote:
>>> 
>>> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>>>> My understanding is that  SBD will need a shared storage between clustered 
>>>> nodes.
>>>> And that, SBD will need at least 3 nodes in a cluster, if using w/o shared 
>>>> storage.
>>> 
>>> Haven't tried it, to be honest, but the reason for 3 nodes is that without a
>>> shared disk you need a real quorum source and not something
>>> 'faked' as with the 2-node feature in corosync.
>>> But I don't see anything speaking against getting the proper
>>> quorum via qdevice instead of a third full cluster-node.
>>> 
>>>>  
>>>> Therefore, for systems which do NOT use shared storage between 1+1 HA 
>>>> clustered nodes, SBD may NOT be an option.
>>>> Correct me, if I am wrong.
>>>>  
>>>> For cluster systems using the likes of iDRAC/IMM2 fencing agents, which 
>>>> have redundant but shared power supply units with the nodes, the normal 
>>>> fencing mechanisms should work for all resiliency scenarios, except when 
>>>> the IMM2/iDRAC is NOT reachable for whatsoever reason. And, to bail 
>>>> out of those situations in the absence of SBD, I believe using 
>>>> user-defined failover hooks (via scripts) into Pacemaker Alerts, with sudo 
>>>> permissions for ‘hacluster’, should help.
>>> 
>>> If you don't see your fencing device, assuming after some time
>>> that the corresponding node will probably be down is quite risky
>>> in my opinion.
>>> But why not assure that it is down using a watchdog?
>>> 
>>>>  
>>>> Thanx.
>>>>  
>>>>  
>>>> From: Klaus Wenninger [mailto:kwenn...@redhat.com 
>>>> <mailto:kwenn...@redhat.com>] 
>>>> Sent: Monday, July 24, 2017 11:31 PM
>>>> To: Cluster Labs - All topics related to open-source clustering welcomed; 
>>>> Prasad, Shashank
>>>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>>>  
>>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>>> Sometimes IPMI fence devices use shared power of the node, and it cannot 
>>>> be avoided.
>>>> In such scenarios the HA cluster is NOT able to handle the power failure 
>>>> of a node, since the power is shared with its own fence device.
>>>> The failure of IPMI-based fencing can also happen for other reasons.

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 09:46 PM, Kristián Feldsam wrote:
> so why not use some other fencing method like disabling the port on the switch,
> so nobody can access the faulty node and write data to it. It is common
> practice too.

Well, don't get me wrong here. I don't want to hard-sell sbd.
I just thought that, very likely, requirements that prevent usage
of a remote-controlled power-switch will make access
to a switch to disable the ports unusable as well.
And if a working qdevice setup is there already, the gap between
what he thought he would get from qdevice and what he actually
had matches exactly quorum-based-watchdog-fencing.

But you are of course right.
I don't really know the scenario.
Maybe fabric fencing is the perfect match - good to mention it
here as a possibility.
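
If a managed switch is available, one agent intended for exactly that kind of fabric 
fencing is fence_ifmib, which disables switch ports over SNMP. A sketch (the switch 
address, SNMP community and port mapping below are made up, and the exact parameter 
names should be verified with "pcs stonith describe fence_ifmib"):

pcs stonith create fence-fabric fence_ifmib ipaddr=10.0.0.250 community=private \
    pcmk_host_map="node1:Gi1/0/1;node2:Gi1/0/2"

Keep in mind that a fabric-fenced node keeps running, so its ports must not be 
re-enabled until the node has been cleaned up.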

Regards,
Klaus
 
>
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz <mailto:supp...@feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - FeldHost™ – profesionální
> hostingové a serverové služby za adekvátní ceny.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 958, DIČ: CZ290 60 958
> C 200350 vedená u Městského soudu v Praze
>
> Banka: Fio banka a.s.
> Číslo účtu: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010  0024 0033 0446
>
>> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenn...@redhat.com
>> <mailto:kwenn...@redhat.com>> wrote:
>>
>> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>>> My understanding is that  SBD will need a shared storage between
>>> clustered nodes.
>>> And that, SBD will need at least 3 nodes in a cluster, if using w/o
>>> shared storage.
>>
>> Haven't tried it, to be honest, but the reason for 3 nodes is that without a
>> shared disk you need a real quorum source and not something
>> 'faked' as with the 2-node feature in corosync.
>> But I don't see anything speaking against getting the proper
>> quorum via qdevice instead of a third full cluster-node.
>>
>>>  
>>> Therefore, for systems which do NOT use shared storage between 1+1
>>> HA clustered nodes, SBD may NOT be an option.
>>> Correct me, if I am wrong.
>>>  
>>> For cluster systems using the likes of iDRAC/IMM2 fencing agents,
>>> which have redundant but shared power supply units with the nodes,
>>> the normal fencing mechanisms should work for all resiliency
>>> scenarios, except when the IMM2/iDRAC is NOT reachable for whatsoever
>>> reason. And, to bail out of those situations in the absence of SBD,
>>> I believe using user-defined failover hooks (via scripts) into
>>> Pacemaker Alerts, with sudo permissions for ‘hacluster’, should help.
>>
>> If you don't see your fencing device, assuming after some time
>> that the corresponding node will probably be down is quite risky
>> in my opinion.
>> But why not assure that it is down using a watchdog?
>>
>>>  
>>> Thanx.
>>>  
>>>  
>>> *From:* Klaus Wenninger [mailto:kwenn...@redhat.com] 
>>> *Sent:* Monday, July 24, 2017 11:31 PM
>>> *To:* Cluster Labs - All topics related to open-source clustering
>>> welcomed; Prasad, Shashank
>>> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>>>  
>>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>>>
>>> Sometimes IPMI fence devices use shared power of the node, and
>>> it cannot be avoided.
>>> In such scenarios the HA cluster is NOT able to handle the power
>>> failure of a node, since the power is shared with its own fence
>>> device.
>>> The failure of IPMI-based fencing can also happen for other
>>> reasons.
>>>  
>>> A failure to fence the failed node will cause the cluster to be
>>> marked UNCLEAN.
>>> To get over it, the following command needs to be invoked on the
>>> surviving node.
>>>  
>>> pcs stonith confirm <node> --force
>>>  
>>> This can be automated by hooking in a recovery script on the
>>> Stonith resource's 'Timed Out' event.
>>> To be more specific, the Pacemaker Alerts can be used to watch
>>> for Stonith timeouts and failures.
>>> In that script, all that essentially needs to be executed is the
>>> aforementioned command.
>>>
>>>
>>> If I get you right here, you could just as well disable fencing in the first place.
>>> Actually quorum-based-watchdog-fencing is the way to do this in a
>>> safe manner. This of course assumes

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Kristián Feldsam
so why not use some other fencing method like disabling the port on the switch, so 
nobody can access the faulty node and write data to it. It is common practice too.

S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové služby za 
adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 24 Jul 2017, at 21:16, Klaus Wenninger <kwenn...@redhat.com> wrote:
> 
> On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>> My understanding is that  SBD will need a shared storage between clustered 
>> nodes.
>> And that, SBD will need at least 3 nodes in a cluster, if using w/o shared 
>> storage.
> 
> I haven't tried it, to be honest, but the reason for 3 nodes is that without a
> shared disk you need a real quorum source and not something
> 'faked' as with the 2-node feature in corosync.
> But I don't see anything speaking against getting the proper
> quorum via qdevice instead with a third full cluster-node.
> 
>>  
>> Therefore, for systems which do NOT use shared storage between 1+1 HA 
>> clustered nodes, SBD may NOT be an option.
>> Correct me, if I am wrong.
>>  
>> For cluster systems using the likes of iDRAC/IMM2 fencing agents, which have 
>> redundant but shared power supply units with the nodes, the normal fencing 
>> mechanisms should work for all resiliency scenarios, except when the 
>> IMM2/iDRAC is NOT reachable for whatever reason. And, to bail out of those 
>> situations in the absence of SBD, I believe using user-defined failover 
>> hooks (via scripts) into Pacemaker Alerts, with sudo permissions for 
>> ‘hacluster’, should help.
> 
> If you don't see your fencing device, assuming after some time
> that the corresponding node will probably be down is quite risky
> in my opinion.
> But why not assure it to be down using a watchdog?
> 
>>  
>> Thanx.
>>  
>>  
>> From: Klaus Wenninger [mailto:kwenn...@redhat.com 
>> <mailto:kwenn...@redhat.com>] 
>> Sent: Monday, July 24, 2017 11:31 PM
>> To: Cluster Labs - All topics related to open-source clustering welcomed; 
>> Prasad, Shashank
>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>  
>> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>> Sometimes IPMI fence devices use shared power of the node, and it cannot be 
>> avoided.
>> In such scenarios the HA cluster is NOT able to handle the power failure of 
>> a node, since the power is shared with its own fence device.
>> The failure of IPMI based fencing can also occur for other reasons.
>>  
>> A failure to fence the failed node will cause the cluster to be marked UNCLEAN.
>> To get over it, the following command needs to be invoked on the surviving 
>> node.
>>  
>> pcs stonith confirm <node> --force
>>  
>> This can be automated by hooking in a recovery script when the Stonith 
>> resource reports a ‘Timed Out’ event.
>> To be more specific, Pacemaker Alerts can be used to watch for Stonith 
>> timeouts and failures.
>> In that script, all that essentially needs to be executed is the aforementioned 
>> command.
>> 
>> If I get you right here, you could then just disable fencing in the first place.
>> Actually quorum-based-watchdog-fencing is the way to do this in a
>> safe manner. This of course assumes you have a proper source for
>> quorum in your 2-node-setup with e.g. qdevice or using a shared
>> disk with sbd (not directly pacemaker quorum here but similar thing
>> handled inside sbd).
>> 
>> 
>> Since the alerts are issued from the ‘hacluster’ login, sudo permissions for 
>> ‘hacluster’ need to be configured.
>>  
>> Thanx.
>>  
>>  
>> From: Klaus Wenninger [mailto:kwenn...@redhat.com 
>> <mailto:kwenn...@redhat.com>] 
>> Sent: Monday, July 24, 2017 9:24 PM
>> To: Kristián Feldsam; Cluster Labs - All topics related to open-source 
>> clustering welcomed
>> Subject: Re: [ClusterLabs] Two nodes cluster issue
>>  
>> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>> I personally think that powering off the node by a switched PDU is safer, or not?
>> 
>> True if that is working in your environment. If you can't do a physical setup
>> where you aren't simultaneously losing connection to both your node and
>> the switch-device (or you just want to cover cases where that happens)
>> you have to come up with something else.
>

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 08:27 PM, Prasad, Shashank wrote:
>
> My understanding is that  SBD will need a shared storage between
> clustered nodes.
>
> And that, SBD will need at least 3 nodes in a cluster, if using w/o
> shared storage.
>

I haven't tried it, to be honest, but the reason for 3 nodes is that without a
shared disk you need a real quorum source and not something
'faked' as with the 2-node feature in corosync.
But I don't see anything speaking against getting the proper
quorum via qdevice instead with a third full cluster-node.
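
As a rough sketch (the host name is a placeholder and the algorithm just an
example; note that the two_node flag is normally dropped once the quorum device
supplies the extra vote), the corosync.conf quorum section would then look
something like:

quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            host: qnetd-host.example.com
            algorithm: ffsplit
        }
    }
}

The corosync-qnetd daemon then runs on that third machine, which only arbitrates
quorum and never runs any resources.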

>  
>
> Therefore, for systems which do NOT use shared storage between 1+1 HA
> clustered nodes, SBD may NOT be an option.
>
> Correct me, if I am wrong.
>
>  
>
> For cluster systems using the likes of iDRAC/IMM2 fencing agents,
> which have redundant but shared power supply units with the nodes, the
> normal fencing mechanisms should work for all resiliency scenarios,
> except when the IMM2/iDRAC is NOT reachable for whatever reason.
> And, to bail out of those situations in the absence of SBD, I believe
> using user-defined failover hooks (via scripts) into Pacemaker Alerts,
> with sudo permissions for ‘hacluster’, should help.
>

If you don't see your fencing device, assuming after some time
that the corresponding node will probably be down is quite risky
in my opinion.
But why not assure it to be down using a watchdog?

>  
>
> Thanx.
>
>  
>
>  
>
> *From:*Klaus Wenninger [mailto:kwenn...@redhat.com]
> *Sent:* Monday, July 24, 2017 11:31 PM
> *To:* Cluster Labs - All topics related to open-source clustering
> welcomed; Prasad, Shashank
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
> Sometimes IPMI fence devices use shared power of the node, and it
> cannot be avoided.
>
> In such scenarios the HA cluster is NOT able to handle the power
> failure of a node, since the power is shared with its own fence
> device.
>
> The failure of IPMI based fencing can also occur for other
> reasons.
>
>  
>
> A failure to fence the failed node will cause the cluster to be marked
> UNCLEAN.
>
> To get over it, the following command needs to be invoked on the
> surviving node.
>
>  
>
> pcs stonith confirm <node> --force
>
>  
>
> This can be automated by hooking in a recovery script when the
> Stonith resource reports a ‘Timed Out’ event.
>
> To be more specific, Pacemaker Alerts can be used to watch
> for Stonith timeouts and failures.
>
> In that script, all that essentially needs to be executed is the
> aforementioned command.
>
>
> If I get you right here, you could then just disable fencing in the first place.
> Actually quorum-based-watchdog-fencing is the way to do this in a
> safe manner. This of course assumes you have a proper source for
> quorum in your 2-node-setup with e.g. qdevice or using a shared
> disk with sbd (not directly pacemaker quorum here but similar thing
> handled inside sbd).
>
>
> Since the alerts are issued from the ‘hacluster’ login, sudo permissions
> for ‘hacluster’ need to be configured.
>
>  
>
> Thanx.
>
>  
>
>  
>
> *From:*Klaus Wenninger [mailto:kwenn...@redhat.com]
> *Sent:* Monday, July 24, 2017 9:24 PM
> *To:* Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
> I personally think that powering off the node by a switched PDU is safer,
> or not?
>
>
> True if that is working in your environment. If you can't do a physical
> setup
> where you aren't simultaneously losing connection to both your node and
> the switch-device (or you just want to cover cases where that happens)
> you have to come up with something else.
>
>
>
>
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz <mailto:supp...@feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - *Feld*Host™ – profesionální
> hostingové a serverové služby za adekvátní ceny.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 958, DIČ: CZ290 60 958
> C 200350 vedená u Městského soudu v Praze
>
> Banka: Fio banka a.s.
> Číslo účtu: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010  0024 0033 0446
>
>  
>
> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com
> <mailto:kwenn...@redhat.com>> wrote:
>
>  
>
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
> I still don

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 07:32 PM, Prasad, Shashank wrote:
>
> Sometimes IPMI fence devices use shared power of the node, and it
> cannot be avoided.
>
> In such scenarios the HA cluster is NOT able to handle the power
> failure of a node, since the power is shared with its own fence device.
>
> The failure of IPMI based fencing can also occur for other
> reasons.
>
>  
>
> A failure to fence the failed node will cause the cluster to be marked
> UNCLEAN.
>
> To get over it, the following command needs to be invoked on the
> surviving node.
>
>  
>
> pcs stonith confirm <node> --force
>
>  
>
> This can be automated by hooking in a recovery script when the
> Stonith resource reports a ‘Timed Out’ event.
>
> To be more specific, Pacemaker Alerts can be used to watch for
> Stonith timeouts and failures.
>
> In that script, all that essentially needs to be executed is the
> aforementioned command.
>

If I get you right here, you could then just disable fencing in the first place.
Actually quorum-based-watchdog-fencing is the way to do this in a
safe manner. This of course assumes you have a proper source for
quorum in your 2-node-setup with e.g. qdevice or using a shared
disk with sbd (not directly pacemaker quorum here but similar thing
handled inside sbd).

> Since the alerts are issued from the ‘hacluster’ login, sudo permissions
> for ‘hacluster’ need to be configured.
>
>  
>
> Thanx.
>
>  
>
>  
>
> *From:*Klaus Wenninger [mailto:kwenn...@redhat.com]
> *Sent:* Monday, July 24, 2017 9:24 PM
> *To:* Kristián Feldsam; Cluster Labs - All topics related to
> open-source clustering welcomed
> *Subject:* Re: [ClusterLabs] Two nodes cluster issue
>
>  
>
> On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
>
> I personally think that powering off the node by a switched PDU is safer,
> or not?
>
>
> True if that is working in your environment. If you can't do a physical
> setup
> where you aren't simultaneously losing connection to both your node and
> the switch-device (or you just want to cover cases where that happens)
> you have to come up with something else.
>
>
>
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz <mailto:supp...@feldhost.cz>
>
> www.feldhost.cz <http://www.feldhost.cz> - *Feld*Host™ – profesionální
> hostingové a serverové služby za adekvátní ceny.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 958, DIČ: CZ290 60 958
> C 200350 vedená u Městského soudu v Praze
>
> Banka: Fio banka a.s.
> Číslo účtu: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010  0024 0033 0446
>
>  
>
> On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com
> <mailto:kwenn...@redhat.com>> wrote:
>
>  
>
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>
> I still don't understand why the qdevice concept doesn't help
> in this situation. Since the master node is down, I would
> expect the quorum to declare it as dead.
>
> Why doesn't it happen?
>
>
> That is not how quorum works. It just limits the decision-making
> to the quorate subset of the cluster.
> Still the unknown nodes are not sure to be down.
> That is why I suggested to have quorum-based watchdog-fencing with
> sbd.
> That would assure that within a certain time all nodes of the
> non-quorate part
> of the cluster are down.
>
>
>
>
> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
> Maziuk" <dmitri.maz...@gmail.com
> <mailto:dmitri.maz...@gmail.com>> wrote:
>
> On 2017-07-24 07:51, Tomer Azran wrote:
>
> > We don't have the ability to use it.
>
> > Is that the only solution?
>
>  
>
> No, but I'd recommend thinking about it first. Are you sure you will 
>
> care about your cluster working when your server room is on fire? 'Cause 
>
> unless you have halon suppression, your server room is a complete 
>
> write-off anyway. (Think water from sprinklers hitting rich chunky volts 
>
> in the servers.)
>
>  
>
> Dima
>
>  
>
> ___
>
> Users mailing list: Users@clusterlabs.org <mailto:Users@clusterlabs.org>
>
> http://lists.clusterlabs.org/mailman/listinfo/users
>
>  
>
> Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/>
>
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>
> Bugs: http://bugs.clusterlabs.org <http://bugs.clusterl

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Prasad, Shashank
Sometimes IPMI fence devices use shared power of the node, and it cannot be 
avoided.

In such scenarios the HA cluster is NOT able to handle the power failure of a 
node, since the power is shared with its own fence device.

The failure of IPMI based fencing can also occur for other reasons.

 

A failure to fence the failed node will cause the cluster to be marked UNCLEAN.

To get over it, the following command needs to be invoked on the surviving node.

 

pcs stonith confirm <node> --force

 

This can be automated by hooking in a recovery script when the Stonith 
resource reports a ‘Timed Out’ event.

To be more specific, Pacemaker Alerts can be used to watch for Stonith 
timeouts and failures.

In that script, all that essentially needs to be executed is the aforementioned 
command.
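
A minimal sketch of such an alert agent (the path and logging are only
illustrative; the CRM_alert_* environment variables are what Pacemaker 1.1.15
hands to alert agents, and the confirm is of course only safe if the node is
really down):

#!/bin/sh
# /usr/local/bin/stonith_failover_alert.sh - illustrative only
# React only to fencing events that did not succeed.
if [ "$CRM_alert_kind" = "fencing" ] && [ "$CRM_alert_rc" != "0" ]; then
    logger -t stonith-alert "fencing of $CRM_alert_node failed (rc=$CRM_alert_rc): $CRM_alert_desc"
    # Shortcut discussed in this thread: tell the cluster the node is down.
    sudo pcs stonith confirm "$CRM_alert_node" --force
fi
exit 0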

Since the alerts are issued from the ‘hacluster’ login, sudo permissions for 
‘hacluster’ need to be configured.
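
Roughly, hooking the script in and granting a narrow sudo right could look like
this (the alert id, script path and sudoers file name are examples):

pcs alert create path=/usr/local/bin/stonith_failover_alert.sh id=stonith-failover

# /etc/sudoers.d/hacluster (edit with visudo -f /etc/sudoers.d/hacluster)
hacluster ALL=(root) NOPASSWD: /usr/sbin/pcs stonith confirm *
# depending on the distribution defaults you may also need:
# Defaults:hacluster !requiretty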

 

Thanx.

 

 

From: Klaus Wenninger [mailto:kwenn...@redhat.com] 
Sent: Monday, July 24, 2017 9:24 PM
To: Kristián Feldsam; Cluster Labs - All topics related to open-source 
clustering welcomed
Subject: Re: [ClusterLabs] Two nodes cluster issue

 

On 07/24/2017 05:37 PM, Kristián Feldsam wrote:

I personally think that powering off the node by a switched PDU is safer, or 
not?


True if that is working in your environment. If you can't do a physical setup
where you aren't simultaneously losing connection to both your node and
the switch-device (or you just want to cover cases where that happens)
you have to come up with something else.





S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové služby za 
adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446 

 

On 24 Jul 2017, at 17:27, Klaus Wenninger <kwenn...@redhat.com> wrote:

 

On 07/24/2017 05:15 PM, Tomer Azran wrote:

I still don't understand why the qdevice concept doesn't help 
in this situation. Since the master node is down, I would expect the quorum to 
declare it as dead.

Why doesn't it happen?


That is not how quorum works. It just limits the decision-making to the 
quorate subset of the cluster.
Still the unknown nodes are not sure to be down.
That is why I suggested to have quorum-based watchdog-fencing with sbd.
That would assure that within a certain time all nodes of the 
non-quorate part
of the cluster are down.








On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" 
<dmitri.maz...@gmail.com> wrote:

On 2017-07-24 07:51, Tomer Azran wrote:
> We don't have the ability to use it.
> Is that the only solution?
 
No, but I'd recommend thinking about it first. Are you sure you will 
care about your cluster working when your server room is on fire? 
'Cause 
unless you have halon suppression, your server room is a complete 
write-off anyway. (Think water from sprinklers hitting rich chunky 
volts 
in the servers.)
 
Dima
 
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
 
Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/> 
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/> 






___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users
 
Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/> 
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org <http://bugs.clusterlabs.org/> 

 

-- 
Klaus Wenninger
 
Senior Software Engineer, EMEA ENG Openstack Infrastructure
 
Red Hat
 
kwenn...@redhat.com   

___
Users mailing list: Users@clusterlabs.org 
<mailto:Users@clusterlabs.org> 
http://lists.clusterlabs.org/mailman/listinfo/users 
<http://lists.clusterlabs.org/mailman/listinfo/users> 

Project Home: http://www.clusterlabs.org <http://www.clusterlabs.org/> 
Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
<http://www.clusterlabs.org/doc/C

Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 05:37 PM, Kristián Feldsam wrote:
> I personally think that powering off the node by a switched PDU is safer,
> or not?

True if that is working in your environment. If you can't do a physical setup
where you aren't simultaneously losing connection to both your node and
the switch-device (or you just want to cover cases where that happens)
you have to come up with something else.

>
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz 
>
> www.feldhost.cz  - FeldHost™ – profesionální
> hostingové a serverové služby za adekvátní ceny.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 958, DIČ: CZ290 60 958
> C 200350 vedená u Městského soudu v Praze
>
> Banka: Fio banka a.s.
> Číslo účtu: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010  0024 0033 0446
>
>> On 24 Jul 2017, at 17:27, Klaus Wenninger wrote:
>>
>> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>>> I still don't understand why the qdevice concept doesn't help in
>>> this situation. Since the master node is down, I would expect the
>>> quorum to declare it as dead.
>>> Why doesn't it happen?
>>
>> That is not how quorum works. It just limits the decision-making to
>> the quorate subset of the cluster.
>> Still the unknown nodes are not sure to be down.
>> That is why I suggested to have quorum-based watchdog-fencing with sbd.
>> That would assure that within a certain time all nodes of the
>> non-quorate part
>> of the cluster are down.
>>
>>>
>>>
>>>
>>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri
>>> Maziuk" wrote:
>>>
>>> On 2017-07-24 07:51, Tomer Azran wrote:
>>> > We don't have the ability to use it.
>>> > Is that the only solution?
>>>
>>> No, but I'd recommend thinking about it first. Are you sure you will 
>>> care about your cluster working when your server room is on fire? 
>>> 'Cause 
>>> unless you have halon suppression, your server room is a complete 
>>> write-off anyway. (Think water from sprinklers hitting rich chunky 
>>> volts 
>>> in the servers.)
>>>
>>> Dima
>>>
>>> ___
>>> Users mailing list: Users@clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>>
>>> ___
>>> Users mailing list: Users@clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>>
>> -- 
>> Klaus Wenninger
>>
>> Senior Software Engineer, EMEA ENG Openstack Infrastructure
>>
>> Red Hat
>>
>> kwenn...@redhat.com   
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org 
>
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 05:32 PM, Tomer Azran wrote:
> So your suggestion is to use sbd with or without qdevice? What is the
> point of having a qdevice in a two-node cluster if it doesn't help in
> this situation?

If you have a qdevice setup that is already working (meaning that one
of your nodes is quorate and the other is not if they are split), I would
use that.
And if you use sbd with just a watchdog (no shared disk) - which should be
supported in 
CentOS 7.3 (you said you are there somewhere down below, iirc) - it is
assured that the node that is not quorate goes down reliably and that
the other
node assumes it to be down after a timeout you configured using the cluster
property stonith-watchdog-timeout.
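
For reference, the watchdog-only sbd side of that is mostly /etc/sysconfig/sbd;
a sketch under the assumption of the softdog or an IPMI watchdog providing
/dev/watchdog (timeout values are examples, SBD_DEVICE stays unset because
there is no shared disk):

# /etc/sysconfig/sbd (watchdog-only mode)
SBD_WATCHDOG_DEV=/dev/watchdog
SBD_WATCHDOG_TIMEOUT=5
SBD_PACEMAKER=yes

stonith-watchdog-timeout on the cluster side should then be comfortably larger
than SBD_WATCHDOG_TIMEOUT (twice the value is the usual rule of thumb), and the
sbd service has to be enabled on all nodes before relying on it.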

>
>
> From: Klaus Wenninger
> Sent: Monday, July 24, 18:28
> Subject: Re: [ClusterLabs] Two nodes cluster issue
> To: Cluster Labs - All topics related to open-source clustering
> welcomed, Tomer Azran
>
>
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>> I still don't understand why the qdevice concept doesn't help in this
>> situation. Since the master node is down, I would expect the quorum
>> to declare it as dead.
>> Why doesn't it happen?
>
> That is not how quorum works. It just limits the decision-making to
> the quorate subset of the cluster.
> Still the unknown nodes are not sure to be down.
> That is why I suggested to have quorum-based watchdog-fencing with sbd.
> That would assure that within a certain time all nodes of the
> non-quorate part
> of the cluster are down.
>
>>
>>
>>
>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk"
>> <dmitri.maz...@gmail.com <mailto:dmitri.maz...@gmail.com>> wrote:
>>
>>> On 2017-07-24 07:51, Tomer Azran wrote: > We don't have the ability
>>> to use it. > Is that the only solution? No, but I'd recommend
>>> thinking about it first. Are you sure you will care about your
>>> cluster working when your server room is on fire? 'Cause unless you
>>> have halon suppression, your server room is a complete write-off
>>> anyway. (Think water from sprinklers hitting rich chunky volts in
>>> the servers.) Dima ___
>>> Users mailing list: Users@clusterlabs.org
>>> http://lists.clusterlabs.org/mailman/listinfo/users
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
>>
>>
>> ___ Users mailing list:
>> Users@clusterlabs.org
>> http://lists.clusterlabs.org/mailman/listinfo/users
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Kristián Feldsam
I personally think that powering off the node by a switched PDU is safer, or not?

S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové služby za 
adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 24 Jul 2017, at 17:27, Klaus Wenninger  wrote:
> 
> On 07/24/2017 05:15 PM, Tomer Azran wrote:
>> I still don't understand why the qdevice concept doesn't help in this 
>> situation. Since the master node is down, I would expect the quorum to 
>> declare it as dead.
>> Why doesn't it happen?
> 
> That is not how quorum works. It just limits the decision-making to the 
> quorate subset of the cluster.
> Still the unknown nodes are not sure to be down.
> That is why I suggested to have quorum-based watchdog-fencing with sbd.
> That would assure that within a certain time all nodes of the non-quorate part
> of the cluster are down.
> 
>> 
>> 
>> 
>> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" 
>> wrote:
>> 
>> On 2017-07-24 07:51, Tomer Azran wrote:
>> > We don't have the ability to use it.
>> > Is that the only solution?
>> 
>> No, but I'd recommend thinking about it first. Are you sure you will 
>> care about your cluster working when your server room is on fire? 'Cause 
>> unless you have halon suppression, your server room is a complete 
>> write-off anyway. (Think water from sprinklers hitting rich chunky volts 
>> in the servers.)
>> 
>> Dima
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> 
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> 
>> Bugs: http://bugs.clusterlabs.org 
>> 
>> 
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users 
>> 
>> 
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
>> 
>> Bugs: http://bugs.clusterlabs.org 
> 
> -- 
> Klaus Wenninger
> 
> Senior Software Engineer, EMEA ENG Openstack Infrastructure
> 
> Red Hat
> 
> kwenn...@redhat.com    
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> 
> Bugs: http://bugs.clusterlabs.org 

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Tomer Azran
So your suggestion is to use sbd with or without qdevice? What is the point of 
having a qdevice in a two-node cluster if it doesn't help in this situation?


From: Klaus Wenninger
Sent: Monday, July 24, 18:28
Subject: Re: [ClusterLabs] Two nodes cluster issue
To: Cluster Labs - All topics related to open-source clustering welcomed, Tomer 
Azran


On 07/24/2017 05:15 PM, Tomer Azran wrote:
I still don't understand why the qdevice concept doesn't help in this 
situation. Since the master node is down, I would expect the quorum to declare 
it as dead.
Why doesn't it happen?

That is not how quorum works. It just limits the decision-making to the quorate 
subset of the cluster.
Still the unknown nodes are not sure to be down.
That is why I suggested to have quorum-based watchdog-fencing with sbd.
That would assure that within a certain time all nodes of the non-quorate part
of the cluster are down.




On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" 
<dmitri.maz...@gmail.com<mailto:dmitri.maz...@gmail.com>> wrote:

On 2017-07-24 07:51, Tomer Azran wrote: > We don't have the ability to use it. 
> Is that the only solution? No, but I'd recommend thinking about it first. Are 
you sure you will care about your cluster working when your server room is on 
fire? 'Cause unless you have halon suppression, your server room is a complete 
write-off anyway. (Think water from sprinklers hitting rich chunky volts in the 
servers.) Dima ___ Users mailing 
list: Users@clusterlabs.org 
http://lists.clusterlabs.org/mailman/listinfo/users
 Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: 
http://bugs.clusterlabs.org


___ Users mailing list: 
Users@clusterlabs.org 
http://lists.clusterlabs.org/mailman/listinfo/users
 Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: 
http://bugs.clusterlabs.org

-- Klaus Wenninger Senior Software Engineer, EMEA ENG Openstack Infrastructure 
Red Hat 
kwenn...@redhat.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 05:15 PM, Tomer Azran wrote:
> I still don't understand why the qdevice concept doesn't help in this
> situation. Since the master node is down, I would expect the quorum to
> declare it as dead.
> Why doesn't it happen?

That is not how quorum works. It just limits the decision-making to the
quorate subset of the cluster.
Still the unknown nodes are not sure to be down.
That is why I suggested to have quorum-based watchdog-fencing with sbd.
That would assure that within a certain time all nodes of the
non-quorate part
of the cluster are down.

>
>
>
> On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk"
> wrote:
>
> On 2017-07-24 07:51, Tomer Azran wrote:
> > We don't have the ability to use it.
> > Is that the only solution?
>
> No, but I'd recommend thinking about it first. Are you sure you will 
> care about your cluster working when your server room is on fire? 'Cause 
> unless you have halon suppression, your server room is a complete 
> write-off anyway. (Think water from sprinklers hitting rich chunky volts 
> in the servers.)
>
> Dima
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


-- 
Klaus Wenninger

Senior Software Engineer, EMEA ENG Openstack Infrastructure

Red Hat

kwenn...@redhat.com   

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Tomer Azran
I still don't understand why the qdevice concept doesn't help in this 
situation. Since the master node is down, I would expect the quorum to declare 
it as dead.
Why doesn't it happen?



On Mon, Jul 24, 2017 at 4:15 PM +0300, "Dmitri Maziuk" 
> wrote:


On 2017-07-24 07:51, Tomer Azran wrote:
> We don't have the ability to use it.
> Is that the only solution?

No, but I'd recommend thinking about it first. Are you sure you will
care about your cluster working when your server room is on fire? 'Cause
unless you have halon suppression, your server room is a complete
write-off anyway. (Think water from sprinklers hitting rich chunky volts
in the servers.)

Dima

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Kristián Feldsam
APC AP7921 is just for 200€ on ebay.

S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové služby za 
adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 24 Jul 2017, at 15:12, Dmitri Maziuk  wrote:
> 
> On 2017-07-24 07:51, Tomer Azran wrote:
>> We don't have the ability to use it.
>> Is that the only solution?
> 
> No, but I'd recommend thinking about it first. Are you sure you will care 
> about your cluster working when your server room is on fire? 'Cause unless 
> you have halon suppression, your server room is a complete write-off anyway. 
> (Think water from sprinklers hitting rich chunky volts in the servers.)
> 
> Dima
> 
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Dmitri Maziuk

On 2017-07-24 07:51, Tomer Azran wrote:

We don't have the ability to use it.
Is that the only solution?


No, but I'd recommend thinking about it first. Are you sure you will 
care about your cluster working when your server room is on fire? 'Cause 
unless you have halon suppression, your server room is a complete 
write-off anyway. (Think water from sprinklers hitting rich chunky volts 
in the servers.)


Dima

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Tomer Azran
We don't have the ability to use it.
Is that the only solution?

In addition, it will not cover a scenario where the whole server room is down (for 
example - fire or earthquake), since the switch will go down as well.

From: Klaus Wenninger
Sent: Monday, July 24, 15:31
Subject: Re: [ClusterLabs] Two nodes cluster issue
To: Cluster Labs - All topics related to open-source clustering welcomed, 
Kristián Feldsam


On 07/24/2017 02:05 PM, Kristián Feldsam wrote:
Hello, you have to use second fencing device, for ex. APC Switched PDU.

https://wiki.clusterlabs.org/wiki/Configure_Multiple_Fencing_Devices_Using_pcs

The problem here seems to be that the fencing devices available are running from
the same power-supply as the node itself. So they are kind of useless to 
determine
whether the partner-node has no power or simply is not reachable via network.


S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz<mailto:supp...@feldhost.cz>

www.feldhost.cz<http://www.feldhost.cz> - FeldHost™ – profesionální hostingové 
a serverové služby za adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

On 24 Jul 2017, at 13:51, Tomer Azran 
<tomer.az...@edp.co.il><mailto:tomer.az...@edp.co.il> wrote:

Hello,

We built a pacemaker cluster with 2 physical servers.
We configured DRBD in Master\Slave setup, a floating IP and file system mount 
in Active\Passive mode.
We configured two STONITH devices (fence_ipmilan), one for each server.

We are trying to simulate a situation where the Master server crashes with no 
power.
We pulled both of the PSU cables and the server became offline (UNCLEAN).
The resources that the Master used to hold are now in Started (UNCLEAN) state.
The state is unclean since the STONITH failed (the STONITH device is located on 
the server (Intel RMM4 - IPMI) – which uses the same power supply).

The problem is that now the cluster does not release the resources that the 
Master holds, and the service goes down.

Is there any way to overcome this situation?
We tried to add a qdevice but got the same results.

If you have already setup qdevice (using an additional node or so) you could use
quorum-based watchdog-fencing via SBD.


We are using pacemaker 1.1.15 on CentOS 7.3

Thanks,
Tomer.
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org<http://www.clusterlabs.org/>
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org<http://bugs.clusterlabs.org/>



___ Users mailing list: 
Users@clusterlabs.org 
http://lists.clusterlabs.org/mailman/listinfo/users
 Project Home: http://www.clusterlabs.org Getting started: 
http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: 
http://bugs.clusterlabs.org

-- Klaus Wenninger Senior Software Engineer, EMEA ENG Openstack Infrastructure 
Red Hat 
kwenn...@redhat.com

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Klaus Wenninger
On 07/24/2017 02:05 PM, Kristián Feldsam wrote:
> Hello, you have to use second fencing device, for ex. APC Switched PDU.
>
> https://wiki.clusterlabs.org/wiki/Configure_Multiple_Fencing_Devices_Using_pcs

The problem here seems to be that the fencing devices available are running from
the same power-supply as the node itself. So they are kind of useless to
determine
whether the partner-node has no power or simply is not reachable via network.
 
>
> S pozdravem Kristián Feldsam
> Tel.: +420 773 303 353, +421 944 137 535
> E-mail.: supp...@feldhost.cz
>
> www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové
> služby za adekvátní ceny.
>
> FELDSAM s.r.o.
> V rohu 434/3
> Praha 4 – Libuš, PSČ 142 00
> IČ: 290 60 958, DIČ: CZ290 60 958
> C 200350 vedená u Městského soudu v Praze
>
> Banka: Fio banka a.s.
> Číslo účtu: 2400330446/2010
> BIC: FIOBCZPPXX
> IBAN: CZ82 2010  0024 0033 0446
>
>> On 24 Jul 2017, at 13:51, Tomer Azran  wrote:
>>
>> Hello,
>>  
>> We built a pacemaker cluster with 2 physical servers.
>> We configured DRBD in Master\Slave setup, a floating IP and file
>> system mount in Active\Passive mode.
>> We configured two STONITH devices (fence_ipmilan), one for each server.
>>  
>> We are trying to simulate a situation where the Master server crashes
>> with no power.
>> We pulled both of the PSU cables and the server became offline
>> (UNCLEAN).
>> The resources that the Master used to hold are now in Started
>> (UNCLEAN) state.
>> The state is unclean since the STONITH failed (the STONITH device is
>> located on the server (Intel RMM4 - IPMI) – which uses the same power
>> supply).
>>  
>> The problem is that now the cluster does not release the resources
>> that the Master holds, and the service goes down.
>>  
>> Is there any way to overcome this situation?
>> We tried to add a qdevice but got the same results.

If you have already setup qdevice (using an additional node or so) you
could use
quorum-based watchdog-fencing via SBD.

>>  
>> We are using pacemaker 1.1.15 on CentOS 7.3
>>  
>> Thanks,
>> Tomer.
>> ___
>> Users mailing list: Users@clusterlabs.org 
>> http://lists.clusterlabs.org/mailman/listinfo/users
>>
>> Project Home: http://www.clusterlabs.org 
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org 
>
>
>
> ___
> Users mailing list: Users@clusterlabs.org
> http://lists.clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


-- 
Klaus Wenninger

Senior Software Engineer, EMEA ENG Openstack Infrastructure

Red Hat

kwenn...@redhat.com   

___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


Re: [ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Kristián Feldsam
Hello, you have to use second fencing device, for ex. APC Switched PDU.

https://wiki.clusterlabs.org/wiki/Configure_Multiple_Fencing_Devices_Using_pcs
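
As a rough sketch along the lines of that wiki page (device names, addresses,
credentials and outlet numbers are placeholders), the PDU becomes a second
fencing level that is only tried when IPMI fencing fails:

pcs stonith create fence-node1-ipmi fence_ipmilan \
    ipaddr=node1-ipmi.example.com login=admin passwd=secret lanplus=1 \
    pcmk_host_list=node1
pcs stonith create fence-node1-pdu fence_apc_snmp \
    ipaddr=pdu.example.com port=3 pcmk_host_list=node1
pcs stonith level add 1 node1 fence-node1-ipmi
pcs stonith level add 2 node1 fence-node1-pdu

The same pair of devices and levels is then repeated for node2 with its own
outlet.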

S pozdravem Kristián Feldsam
Tel.: +420 773 303 353, +421 944 137 535
E-mail.: supp...@feldhost.cz

www.feldhost.cz - FeldHost™ – profesionální hostingové a serverové služby za 
adekvátní ceny.

FELDSAM s.r.o.
V rohu 434/3
Praha 4 – Libuš, PSČ 142 00
IČ: 290 60 958, DIČ: CZ290 60 958
C 200350 vedená u Městského soudu v Praze

Banka: Fio banka a.s.
Číslo účtu: 2400330446/2010
BIC: FIOBCZPPXX
IBAN: CZ82 2010  0024 0033 0446

> On 24 Jul 2017, at 13:51, Tomer Azran  wrote:
> 
> Hello,
>  
> We built a pacemaker cluster with 2 physical servers.
> We configured DRBD in Master\Slave setup, a floating IP and file system mount 
> in Active\Passive mode.
> We configured two STONITH devices (fence_ipmilan), one for each server.
>  
> We are trying to simulate a situation where the Master server crashes with no 
> power.
> We pulled both of the PSU cables and the server became offline (UNCLEAN).
> The resources that the Master used to hold are now in Started (UNCLEAN) state.
> The state is unclean since the STONITH failed (the STONITH device is located 
> on the server (Intel RMM4 - IPMI) – which uses the same power supply).
>  
> The problem is that now the cluster does not release the resources that 
> the Master holds, and the service goes down.
>  
> Is there any way to overcome this situation?
> We tried to add a qdevice but got the same results.
>  
> We are using pacemaker 1.1.15 on CentOS 7.3
>  
> Thanks,
> Tomer.
> ___
> Users mailing list: Users@clusterlabs.org 
> http://lists.clusterlabs.org/mailman/listinfo/users 
> 
> 
> Project Home: http://www.clusterlabs.org 
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf 
> 
> Bugs: http://bugs.clusterlabs.org 
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] Two nodes cluster issue

2017-07-24 Thread Tomer Azran
Hello,

We built a pacemaker cluster with 2 physical servers.
We configured DRBD in Master\Slave setup, a floating IP and file system mount 
in Active\Passive mode.
We configured two STONITH devices (fence_ipmilan), one for each server.

We are trying to simulate a situation where the Master server crashes with no 
power.
We pulled both of the PSU cables and the server became offline (UNCLEAN).
The resources that the Master used to hold are now in Started (UNCLEAN) state.
The state is unclean since the STONITH failed (the STONITH device is located on 
the server (Intel RMM4 - IPMI) - which uses the same power supply).

The problem is that now the cluster does not release the resources that the 
Master holds, and the service goes down.
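
For what it is worth, the stuck state can be inspected with something like the
following (the node name is just an example):

crm_mon -1                     # shows the node as UNCLEAN and the blocked resources
stonith_admin --history node1  # shows the failed/pending fencing actions for that node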

Is there any way to overcome this situation?
We tried to add a qdevice but got the same results.

We are using pacemaker 1.1.15 on CentOS 7.3

Thanks,
Tomer.
___
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org