On 06/29/2016 04:54 AM, Klaus Wenninger wrote:
> On 06/29/2016 11:00 AM, Pavlov, Vladimir wrote:
>> Thanks a lot.
>> We also thought about using fencing (stonith).
>> But the production cluster runs in the cloud; node1 and node2 are
>> virtual machines without any hardware fencing devices.
> But there are fence agents that do fencing via the hypervisor (e.g.
> fence_xvm).
>> We also looked at SBD, but as far as we understand, its use is not
>> justified without shared storage in a two-node cluster:
>> http://blog.clusterlabs.org/blog/2015/sbd-fun-and-profit
> Using SBD with a watchdog (provided your virtual environment provides a
> watchdog device inside VMs) for self-fencing is probably better than no
> fencing at all.
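Klaus's fence_xvm route might look roughly like this. This is only a
sketch: it assumes fence_virtd is already configured and listening on the
hypervisor, and "node1"/"node2" are placeholder host and domain names.

```shell
# Sketch only: assumes fence_virtd runs on the hypervisor and the
# guests can reach its (multicast, by default) listener.
# "node1"/"node2" are placeholder host/domain names.

# Define one stonith resource per VM:
pcs stonith create fence-node1 fence_xvm port="node1" pcmk_host_list="node1"
pcs stonith create fence-node2 fence_xvm port="node2" pcmk_host_list="node2"

# Turn fencing back on once the devices exist:
pcs property set stonith-enabled=true

# Quick check from inside a guest that the hypervisor answers:
fence_xvm -o list
```

The key point is that the hypervisor, not hardware, provides the power
switch, so "no hardware fencing devices" does not rule out real fencing.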
You can also ask your cloud provider whether they offer an API for
hard-rebooting instances. If so, there are some fence agents in the wild
for common cloud provider APIs, or you could write your own.

> Regards,
> Klaus
>> Are there any ways to do fencing?
>> Specifically for our situation, we have found another workaround: use
>> DR instead of NAT in IPVS.
>> With DR, even if both servers are active at the same time, it does not
>> matter which of them serves the connection from the client; the web
>> servers respond to the client directly.
>> Does this workaround have a right to life?

I forget exactly what happens if both ldirectord instances are up and
can't communicate, but it's not that simple.

>> Kind regards,
>>
>> Vladimir Pavlov
>>
>> Message: 2
>> Date: Tue, 28 Jun 2016 18:53:38 +0300
>> From: "Pavlov, Vladimir" <vladimir.pav...@tns-global.ru>
>> To: "'Users@clusterlabs.org'" <Users@clusterlabs.org>
>> Subject: [ClusterLabs] Default Behavior
>>
>> Hello!
>> We have a two-node active/backup Pacemaker cluster (OS CentOS 6.7) with
>> the resources IPaddr2 and ldirectord.
>> Cluster properties:
>> cluster-infrastructure: cman
>> dc-version: 1.1.11-97629de
>> no-quorum-policy: ignore
>> stonith-enabled: false
>> The cluster was configured following this documentation:
>> http://clusterlabs.org/quickstart-redhat-6.html
>> Recently there was a communication failure between the cluster nodes,
>> and the behavior was like this:
>>
>> - During the network failure, each server became the master.
>>
>> - After the network was restored, one node killed the Pacemaker
>> services on the second node.
>>
>> - The second node was no longer available to the cluster, but all its
>> resources remained active (ldirectord, ipvs, IP address). That is,
>> both nodes continued to be active.
>> We decided to set up a test stand and replay the situation, but with
>> the current version of Pacemaker in the CentOS repos, the cluster
>> behaves differently:
>>
>> - During the network failure, each server became the master.
>>
>> - After the network was restored, all resources were stopped.
>>
>> - Then the resources were started on only one node. This behavior
>> seems more logical.
>>
>> Current cluster properties on the test stand:
>> cluster-infrastructure: cman
>> dc-version: 1.1.14-8.el6-70404b0
>> have-watchdog: false
>> no-quorum-policy: ignore
>> stonith-enabled: false
>>
>> Did the behavior of the cluster change in the new version, or was the
>> accident not fully emulated?
>> Thank you.
>>
>> Kind regards,
>>
>> Vladimir Pavlov
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Tue, 28 Jun 2016 12:07:36 -0500
>> From: Ken Gaillot <kgail...@redhat.com>
>> To: users@clusterlabs.org
>> Subject: Re: [ClusterLabs] Default Behavior
>>
>> On 06/28/2016 10:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> [...]
>>>
>>> - After the restoration of the network, one node killed the Pacemaker
>>> services on the second node.
>>> - The second node was not available to the cluster, but all its
>>> resources remained active (ldirectord, ipvs, IP address). That is,
>>> both nodes continued to be active.
>>>
>>> [...]
>>>
>>> Did the behavior of the cluster change in the new version, or was the
>>> accident not fully emulated?
>> If I understand your description correctly, the situation was not
>> identical. The difference I see is that, in the original case, the
>> second node was not responding to the cluster even after the network
>> was restored, so the cluster could not communicate to carry out the
>> behavior observed in the test situation.
>>
>> Fencing (stonith) is the cluster's only recovery mechanism in such a
>> case. When the network splits or a node becomes unresponsive, the
>> cluster can only safely recover resources if it can ensure that the
>> other node is powered off. Pacemaker supports both physical fencing
>> devices, such as an intelligent power switch, and hardware watchdog
>> devices for self-fencing using sbd.
>>
>>> Thank you.
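Ken's watchdog/sbd option, in very rough outline. This is a sketch only:
it assumes the VM exposes a watchdog device such as /dev/watchdog, a
Pacemaker version new enough to support stonith-watchdog-timeout, and the
timeout values and config path are illustrative.

```shell
# Sketch: watchdog-only self-fencing with sbd (no shared storage).
# Assumes the hypervisor exposes a watchdog device to the guest,
# e.g. /dev/watchdog from a virtual i6300esb device.

# In /etc/sysconfig/sbd (path varies by distribution):
#   SBD_WATCHDOG_DEV=/dev/watchdog
#   SBD_WATCHDOG_TIMEOUT=5

# Then let Pacemaker rely on the watchdog and re-enable fencing:
pcs property set stonith-watchdog-timeout=10s
pcs property set stonith-enabled=true
```

With this in place, a node that loses contact with its peer resets itself
via the watchdog, so the survivor can safely take over resources.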
>>>
>>> Kind regards,
>>>
>>> *Vladimir Pavlov*
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Tue, 28 Jun 2016 16:51:50 -0400
>> From: Digimer <li...@alteeve.ca>
>> To: Cluster Labs - All topics related to open-source clustering
>> welcomed <users@clusterlabs.org>
>> Subject: Re: [ClusterLabs] Default Behavior
>>
>> On 28/06/16 11:53 AM, Pavlov, Vladimir wrote:
>>> Hello!
>>>
>>> We have a two-node active/backup Pacemaker cluster (OS CentOS 6.7)
>>> with the resources IPaddr2 and ldirectord.
>>>
>>> Cluster properties:
>>> cluster-infrastructure: cman
>>> dc-version: 1.1.11-97629de
>>> no-quorum-policy: ignore
>>> stonith-enabled: false
>> You need fencing to be enabled and configured. This is always true, but
>> particularly so on RHEL 6 because it uses the cman plugin. Please
>> configure and test stonith, and then repeat your tests to see whether
>> the behavior is more predictable.
>>
>>> [...]
>>>
>>> - Then the resources were started on only one node.
>>> This behavior seems more logical.
>>>
>>> [...]
>>>
>>> Thank you.
>>>
>>> Kind regards,
>>>
>>> *Vladimir Pavlov*

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
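Returning to the earlier suggestion of writing your own fence agent
against a cloud provider's hard-reboot API: the agent interface itself is
small, since the cluster passes parameters as key=value lines on the
agent's stdin and reads back its exit code. Below is a minimal sketch of
that stdin handling; the API call, URL, and function names are
hypothetical placeholders, not any provider's real endpoint.

```shell
# Minimal sketch of a custom fence agent for a hypothetical cloud API.
# The cluster writes key=value pairs to stdin; the exit code reports
# success. hard_reboot() is a placeholder for the real API call.

hard_reboot() {
    # Placeholder: substitute your provider's hard-reboot call, e.g.
    #   curl -s -X POST "https://api.example-cloud/instances/$1/reboot"
    echo "would hard-reboot instance $1"
}

fence_request() {
    action="reboot"
    port=""
    # Parse the key=value pairs arriving on stdin.
    while IFS='=' read -r key val; do
        case "$key" in
            action) action=$val ;;
            port)   port=$val ;;   # "port" conventionally names the target
        esac
    done
    case "$action" in
        reboot|off)     hard_reboot "$port" ;;
        status|monitor) echo "agent reachable" ;;
        *)              echo "unsupported action: $action" >&2; return 1 ;;
    esac
}

# Simulate the cluster invoking the agent:
printf 'action=reboot\nport=node2\n' | fence_request
# prints "would hard-reboot instance node2"
```

A real agent would also need to verify, not just request, that the
instance actually went down before reporting success, since that
guarantee is the whole point of fencing.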