On 06/29/2016 04:54 AM, Klaus Wenninger wrote:
> On 06/29/2016 11:00 AM, Pavlov, Vladimir wrote:
>> Thanks a lot.
>> We also thought about using fencing (stonith).
>> But the production cluster runs in the cloud; node1 and node2 are
>> virtual machines without any hardware fencing devices.
> But there are fence agents that do fencing via the hypervisor (e.g.
> fence_xvm).
>> We also looked at SBD, but as far as we understand, its use is not
>> justified without shared storage in a two-node cluster:
>> http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
> Using SBD with a watchdog (provided your virtual environment provides a
> watchdog device inside VMs) for self-fencing is probably better than no
> fencing at all.
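Klaus's fence_xvm route might look roughly like this. This is only a
sketch: it assumes fence_virtd is already configured and listening on the
hypervisor, and "node1"/"node2" are placeholder host and domain names.

```shell
# Sketch only: assumes fence_virtd runs on the hypervisor and the
# guests can reach its (multicast, by default) listener.
# "node1"/"node2" are placeholder host/domain names.

# Define one stonith resource per VM:
pcs stonith create fence-node1 fence_xvm port="node1" pcmk_host_list="node1"
pcs stonith create fence-node2 fence_xvm port="node2" pcmk_host_list="node2"

# Turn fencing back on once the devices exist:
pcs property set stonith-enabled=true

# Quick check from inside a guest that the hypervisor answers:
fence_xvm -o list
```

The key point is that the hypervisor, not hardware, provides the power
switch, so "no hardware fencing devices" does not rule out real fencing.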
You can also ask your cloud provider whether they offer an API for
hard-rebooting instances. If so, there are some fence agents in the wild
for common cloud provider APIs, or you could write your own.

> Regards,
> Klaus
>> Are there any ways to do fencing?
>> Specifically for our situation, we have found another workaround: use
>> DR instead of NAT in IPVS.
>> With DR, even if both servers are active at the same time, it does not
>> matter which of them serves the connection from the client; the web
>> servers respond to the client directly.
>> Does this workaround have a right to life?

I forget exactly what happens if both ldirectord instances are up and
can't communicate, but it's not that simple.

>> Kind regards,
>>
>> Vladimir Pavlov
>>
>> Message: 2
>> Date: Tue, 28 Jun 2016 18:53:38 +0300
>> From: "Pavlov, Vladimir" <vladimir.pav...@tns-global.ru>
>> To: "'Users@clusterlabs.org'" <Users@clusterlabs.org>
>> Subject: [ClusterLabs] Default Behavior
>>
>> Hello!
>> We have a two-node active/backup Pacemaker cluster (OS CentOS 6.7) with
>> the resources IPaddr2 and ldirectord.
>> Cluster properties:
>> cluster-infrastructure: cman
>> dc-version: 1.1.11-97629de
>> no-quorum-policy: ignore
>> stonith-enabled: false
>> The cluster was configured following this documentation:
>> http://clusterlabs.org/quickstart-redhat-6.html
>> Recently there was a communication failure between the cluster nodes,
>> and the behavior was like this:
>>
>> - During the network failure, each server became the master.
>>
>> - After the network was restored, one node killed the Pacemaker
>> services on the second node.
>>
>> - The second node was no longer available to the cluster, but all its
>> resources remained active (ldirectord, ipvs, IP address). That is,
>> both nodes continued to be active.
>> We decided to set up a test stand and replay the situation, but with
>> the current version of Pacemaker in the CentOS repos, the cluster
>> behaves differently:
>>
>> - During the network failure, each server became the master.
>>
>> - After the network was restored, all resources were stopped.
>>
>> - Then the resources were started on only one node. This behavior
>> seems more logical.
>>
>> Current cluster properties on the test stand:
>> cluster-infrastructure: cman
>> dc-version: 1.1.14-8.el6-70404b0
>> have-watchdog: false
>> no-quorum-policy: ignore
>> stonith-enabled: false
>>
>> Did the behavior of the cluster change in the new version, or was the
>> accident not fully emulated?
>> Thank you.
>>
>> Kind regards,
>>
>> Vladimir Pavlov
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 28 Jun 2016 12:07:36 -0500
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Default Behavior
>>
>> On 06/28/2016 10:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> [...]
>>>
>>> - After the restoration of the network, one node killed the Pacemaker
>>> services on the second node.
>>> - The second node was not available to the cluster, but all its
>>> resources remained active (ldirectord, ipvs, IP address). That is,
>>> both nodes continued to be active.
>>>
>>> [...]
>>>
>>> Did the behavior of the cluster change in the new version, or was the
>>> accident not fully emulated?
>> If I understand your description correctly, the situation was not
>> identical. The difference I see is that, in the original case, the
>> second node was not responding to the cluster even after the network
>> was restored, so the cluster could not communicate to carry out the
>> behavior observed in the test situation.
>>
>> Fencing (stonith) is the cluster's only recovery mechanism in such a
>> case. When the network splits or a node becomes unresponsive, the
>> cluster can only safely recover resources if it can ensure that the
>> other node is powered off. Pacemaker supports both physical fencing
>> devices, such as an intelligent power switch, and hardware watchdog
>> devices for self-fencing using sbd.
>>
>>> Thank you.
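Ken's watchdog/sbd option, in very rough outline. This is a sketch only:
it assumes the VM exposes a watchdog device such as /dev/watchdog, a
Pacemaker version new enough to support stonith-watchdog-timeout, and the
timeout values and config path are illustrative.

```shell
# Sketch: watchdog-only self-fencing with sbd (no shared storage).
# Assumes the hypervisor exposes a watchdog device to the guest,
# e.g. /dev/watchdog from a virtual i6300esb device.

# In /etc/sysconfig/sbd (path varies by distribution):
#   SBD_WATCHDOG_DEV=/dev/watchdog
#   SBD_WATCHDOG_TIMEOUT=5

# Then let Pacemaker rely on the watchdog and re-enable fencing:
pcs property set stonith-watchdog-timeout=10s
pcs property set stonith-enabled=true
```

With this in place, a node that loses contact with its peer resets itself
via the watchdog, so the survivor can safely take over resources.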
>>>
>>> Kind regards,
>>>
>>> *Vladimir Pavlov*
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Tue, 28 Jun 2016 16:51:50 -0400
>> From: Digimer <li...@alteeve.ca>
>> To: Cluster Labs - All topics related to open-source clustering
>> welcomed <users@clusterlabs.org>
>> Subject: Re: [ClusterLabs] Default Behavior
>>
>> On 28/06/16 11:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> We have a two-node active/backup Pacemaker cluster (OS CentOS 6.7)
>>> with the resources IPaddr2 and ldirectord.
>>>
>>> Cluster properties:
>>> cluster-infrastructure: cman
>>> dc-version: 1.1.11-97629de
>>> no-quorum-policy: ignore
>>> stonith-enabled: false
>> You need fencing to be enabled and configured. This is always true, but
>> particularly so on RHEL 6 because it uses the cman plugin. Please
>> configure and test stonith, and then repeat your tests to see whether
>> the behavior is more predictable.
>>
>>> [...]
>>>
>>> - Then the resources were started on only one node.
>>> This behavior seems more logical.
>>>
>>> [...]
>>>
>>> Thank you.
>>>
>>> Kind regards,
>>>
>>> *Vladimir Pavlov*

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
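Returning to the earlier suggestion of writing your own fence agent
against a cloud provider's hard-reboot API: the agent interface itself is
small, since the cluster passes parameters as key=value lines on the
agent's stdin and reads back its exit code. Below is a minimal sketch of
that stdin handling; the API call, URL, and function names are
hypothetical placeholders, not any provider's real endpoint.

```shell
# Minimal sketch of a custom fence agent for a hypothetical cloud API.
# The cluster writes key=value pairs to stdin; the exit code reports
# success. hard_reboot() is a placeholder for the real API call.

hard_reboot() {
    # Placeholder: substitute your provider's hard-reboot call, e.g.
    #   curl -s -X POST "https://api.example-cloud/instances/$1/reboot"
    echo "would hard-reboot instance $1"
}

fence_request() {
    action="reboot"
    port=""
    # Parse the key=value pairs arriving on stdin.
    while IFS='=' read -r key val; do
        case "$key" in
            action) action=$val ;;
            port)   port=$val ;;   # "port" conventionally names the target
        esac
    done
    case "$action" in
        reboot|off)     hard_reboot "$port" ;;
        status|monitor) echo "agent reachable" ;;
        *)              echo "unsupported action: $action" >&2; return 1 ;;
    esac
}

# Simulate the cluster invoking the agent:
printf 'action=reboot\nport=node2\n' | fence_request
# prints "would hard-reboot instance node2"
```

A real agent would also need to verify, not just request, that the
instance actually went down before reporting success, since that
guarantee is the whole point of fencing.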