Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Andrew Beekhof
On Mon, Oct 10, 2016 at 8:04 AM, Digimer wrote: > > The only geo-located/stretch cluster approach that I've seen that makes > any sense and seems genuinely safe is SUSE's 'pacemaker booth' project. Also arriving in RHEL 7.3 Might be tech preview though.
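Booth's safety comes from granting a ticket only to a site that can gather a majority of the configured booth voters (sites plus arbitrators). The toy sketch below is not booth's implementation. It only shows the majority arithmetic: a bare 2-node split cannot produce a winner, while a third voter such as an arbitrator breaks the tie. The function names are hypothetical.

```python
def has_majority(votes_seen: int, total_voters: int) -> bool:
    """Strict majority: strictly more than half of all configured voters."""
    return votes_seen > total_voters // 2


def partition_outcome(partitions: list[list[str]], total_voters: int) -> list[list[str]]:
    """Return the partitions (lists of voter names) that hold a majority."""
    return [p for p in partitions if has_majority(len(p), total_voters)]


# Two-node cluster split down the middle: neither side has a majority.
print(partition_outcome([["node1"], ["node2"]], 2))
# Two sites plus an arbitrator that can still reach site A.
print(partition_outcome([["siteA", "arbitrator"], ["siteB"]], 3))
```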

[ClusterLabs] Easy Linux Bonding Question?

2016-10-10 Thread Eric Robinson
Short version: How many missed arp_intervals does the bonding driver wait before removing the PASSIVE slave from the bond? Long version: I'm confused about this because I know the passive slave watches for the active slave's arp broadcast as a way of knowing that the passive slave's link is
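The snippet does not say how many missed `arp_interval` periods the driver tolerates, so the sketch below treats that count as an input rather than guessing it. It only shows the arithmetic for a detection window, assuming a slave is declared down after `missed_intervals` checks without ARP traffic and that the failure can land anywhere inside an interval (adding up to one more interval in the worst case). The function and parameter names are hypothetical.

```python
def arp_detection_window_ms(arp_interval_ms: int, missed_intervals: int) -> tuple[int, int]:
    """Return (best_case, worst_case) link-failure detection time in ms.

    Assumption: the slave is marked down once `missed_intervals` consecutive
    checks see no ARP traffic; because the failure may happen just after a
    check, the worst case adds one extra interval.
    """
    if arp_interval_ms <= 0 or missed_intervals <= 0:
        raise ValueError("interval and miss count must be positive")
    best = arp_interval_ms * missed_intervals
    worst = arp_interval_ms * (missed_intervals + 1)
    return best, worst


# Example with a 1000 ms interval and an assumed tolerance of 2 misses.
print(arp_detection_window_ms(1000, 2))
```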

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Pavel Levshin
Thanks for all the suggestions. It is really odd to me that this use case, which is very basic for a simple virtualization cluster, is not described in every FAQ out there... It appears that my setup is working correctly with non-symmetrical ordering constraints: Ordering Constraints: start
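Pavel's constraint listing is cut off here. As an illustration only, the sketch below uses Python's standard `xml.etree.ElementTree` to generate Pacemaker `rsc_order` elements of the kind discussed in this thread: libvirtd starts before each VM, the ordering is optional (`kind="Optional"`), and it is non-symmetrical (`symmetrical="false"`, so no reverse stop ordering is implied). The resource IDs `libvirtd-clone`, `vm1`, `vm2` and the constraint ID scheme are hypothetical.

```python
import xml.etree.ElementTree as ET


def optional_orders(first: str, vms: list[str]) -> ET.Element:
    """Build a <constraints> element with one optional, non-symmetrical
    rsc_order per VM: start `first` before each VM, without forcing a VM
    restart when `first` restarts and without a reverse stop ordering."""
    constraints = ET.Element("constraints")
    for vm in vms:
        ET.SubElement(
            constraints,
            "rsc_order",
            id=f"order-{first}-{vm}",  # hypothetical ID scheme
            first=first,
            then=vm,
            kind="Optional",
            symmetrical="false",
        )
    return constraints


xml_text = ET.tostring(optional_orders("libvirtd-clone", ["vm1", "vm2"]), encoding="unicode")
print(xml_text)
```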

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 08:35 PM, Eric Robinson wrote: > Basically, when we turn off a switch, I want to keep the cluster from failing > over before Linux bonding has had a chance to recover. > > I'm mostly interested in preventing false-positive cluster failovers that > might occur during manual network

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
Basically, when we turn off a switch, I want to keep the cluster from failing over before Linux bonding has had a chance to recover. I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Eric Robinson
I'm mostly interested in preventing false-positive cluster failovers that might occur during manual network maintenance (for example, testing switch and link outages). >> Thanks for the clarification. So what's the easiest way to ensure that the >> cluster waits a >> desired timeout before

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 06:58 PM, Eric Robinson wrote: > Thanks for the clarification. So what's the easiest way to ensure that the > cluster waits a desired timeout before deciding that a re-convergence is > necessary? By raising the token (lost) timeout I would say. Please correct me (Chrissie), but I
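Klaus's suggestion is to raise the token timeout so that bonding can fail over before corosync declares a node lost. A minimal sketch of that sizing check follows. The bond failover estimate is an input you would measure on your own hardware, and the 1000 ms safety margin is a hypothetical choice, not a corosync recommendation. The model is simplified: it treats the token timeout alone as the point where membership changes, ignoring consensus and retransmit timing.

```python
def min_token_ms(bond_worst_case_ms: int, margin_ms: int = 1000) -> int:
    """Smallest token timeout that outlasts bonding failover by margin_ms.

    margin_ms is a hypothetical safety margin, not a corosync default.
    """
    return bond_worst_case_ms + margin_ms


def cluster_waits_for_bond(token_ms: int, bond_worst_case_ms: int, margin_ms: int = 1000) -> bool:
    """True if the configured token timeout is at least min_token_ms()."""
    return token_ms >= min_token_ms(bond_worst_case_ms, margin_ms)


print(cluster_waits_for_bond(5000, 3000))  # token: 5000 vs. a measured 3 s bond failover
print(cluster_waits_for_bond(1000, 3000))
print(min_token_ms(3000))
```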

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 06:56 PM, Ken Gaillot wrote: > On 10/10/2016 10:21 AM, Klaus Wenninger wrote: >> On 10/10/2016 04:54 PM, Ken Gaillot wrote: >>> On 10/10/2016 07:36 AM, Pavel Levshin wrote: 10.10.2016 15:11, Klaus Wenninger: > On 10/10/2016 02:00 PM, Pavel Levshin wrote: >> 10.10.2016

Re: [ClusterLabs] Pacemaker and Oracle ASM

2016-10-10 Thread Chad Cravens
The client has specifically said that they have purposefully chosen not to implement Oracle RAC (I'm not sure why). It's a large program and the budget and technology platforms were already put in place before I was involved. In other words, I'm not sure why, not sure there's a reason, but Red Hat HA

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Ken Gaillot
On 10/10/2016 10:21 AM, Klaus Wenninger wrote: > On 10/10/2016 04:54 PM, Ken Gaillot wrote: >> On 10/10/2016 07:36 AM, Pavel Levshin wrote: >>> 10.10.2016 15:11, Klaus Wenninger: On 10/10/2016 02:00 PM, Pavel Levshin wrote: > 10.10.2016 14:32, Klaus Wenninger: >> Why are the

Re: [ClusterLabs] Pacemaker and Oracle ASM

2016-10-10 Thread emmanuel segura
Why don't you use Oracle RAC with ASM? 2016-10-07 18:46 GMT+02:00 Chad Cravens : > Hello: > > I'm working on a project where the client is using Oracle ASM (volume > manager) for database storage. I have implemented a cluster before using > LVM with ext4 and understand

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Ken Gaillot
On 10/10/2016 07:36 AM, Pavel Levshin wrote: > 10.10.2016 15:11, Klaus Wenninger: >> On 10/10/2016 02:00 PM, Pavel Levshin wrote: >>> 10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? >>> If they were mandatory, then all the virtual machines

[ClusterLabs] OCFS2 on cLVM with node waiting for fencing timeout

2016-10-10 Thread Ulrich Windl
Hi! I observed an interesting thing: In a three-node cluster (SLES11 SP4) with cLVM and OCFS2 on top, one node was fenced as the OCFS2 filesystem was somehow busy on unmount. We have (mainly for paranoid reasons) an excessively long fencing timeout for SBD: 180 seconds. While one node was
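A long SBD timeout matters here because cluster filesystems such as OCFS2 on cLVM generally keep I/O blocked on the surviving nodes until the fence is confirmed, and SBD reports a fence as complete only after `msgwait` has elapsed. The sketch below, under those assumptions, estimates the stall and checks two relationships: the sbd man page advises a `msgwait` of twice the watchdog timeout, and Pacemaker's `stonith-timeout` needs to exceed `msgwait` or the fence operation times out first. Treating the 180 seconds as `msgwait`, the example values, and the `detection_s` parameter are all hypothetical.

```python
def sbd_timeout_warnings(watchdog_s: int, msgwait_s: int, stonith_timeout_s: int) -> list[str]:
    """Return human-readable problems with an SBD timeout combination."""
    problems = []
    if msgwait_s < 2 * watchdog_s:
        problems.append("msgwait is less than twice the watchdog timeout")
    if stonith_timeout_s <= msgwait_s:
        problems.append("stonith-timeout does not exceed msgwait")
    return problems


def estimated_io_stall_s(detection_s: int, msgwait_s: int) -> int:
    """Rough lower bound on blocked I/O: failure detection plus msgwait.

    Assumption: DLM-based I/O stays blocked until the fence is confirmed.
    """
    return detection_s + msgwait_s


print(sbd_timeout_warnings(watchdog_s=90, msgwait_s=180, stonith_timeout_s=200))
print(estimated_io_stall_s(detection_s=5, msgwait_s=180))
```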

Re: [ClusterLabs] I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 04:25 PM, Ken Gaillot wrote: > On 10/09/2016 11:02 PM, Digimer wrote: >> On 09/10/16 11:58 PM, Andrei Borzenkov wrote: >>> 10.10.2016 00:42, Eric Robinson wrote: Digimer, thanks for your thoughts. Booth is one of the solutions I looked at, but I don't like it because it is

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Pavel Levshin
10.10.2016 15:11, Klaus Wenninger: On 10/10/2016 02:00 PM, Pavel Levshin wrote: 10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? If they were mandatory, then all the virtual machines would be restarted when libvirtd restarts. This is neither desired nor needed

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 02:00 PM, Pavel Levshin wrote: > > 10.10.2016 14:32, Klaus Wenninger: >> Why are the order-constraints between libvirt & vms optional? > > If they were mandatory, then all the virtual machines would be > restarted when libvirtd restarts. This is neither desired nor needed. When > this

Re: [ClusterLabs] Antw: Re: Antw: Re: Antw: Re: When the DC crmd is frozen, cluster decisions are delayed infinitely

2016-10-10 Thread Klaus Wenninger
On 10/07/2016 11:10 PM, renayama19661...@ybb.ne.jp wrote: > Hi All, > > Our user may not necessarily use sbd. > > I confirmed that there was a method using WD service of corosync as one > method not to use sbd. > Pacemaker watches the process of pacemaker by WD service using CMAP and can > carry
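The idea behind both sbd and corosync's WD service is a watchdog: a supervised process must keep "petting" it, and if it stops (for example because the DC's crmd is frozen), the node is treated as hung and self-fences. The toy model below only illustrates that timing rule with explicit timestamps so it is deterministic. It is not corosync's WD service API or CMAP interface, and the class name is hypothetical.

```python
class ToyWatchdog:
    """Toy software watchdog: the monitored process must pet it at least
    every `timeout_s` seconds, otherwise it counts as hung."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_pet = now

    def pet(self, now: float) -> None:
        """Record that the monitored process showed it is still alive."""
        self.last_pet = now

    def expired(self, now: float) -> bool:
        """True once more than timeout_s has passed since the last pet."""
        return now - self.last_pet > self.timeout_s


wd = ToyWatchdog(timeout_s=10.0)
wd.pet(now=5.0)          # the monitored daemon is still responsive
print(wd.expired(12.0))  # 7 s since the last pet
print(wd.expired(16.0))  # 11 s without a pet, e.g. a frozen crmd
```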

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Pavel Levshin
10.10.2016 14:32, Klaus Wenninger: Why are the order-constraints between libvirt & vms optional? If they were mandatory, then all the virtual machines would be restarted when libvirtd restarts. This is neither desired nor needed. When this happens, the node is fenced because it is unable to
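Pavel's reasoning is that a mandatory ordering propagates a restart of libvirtd to every dependent VM, while an optional one does not. The sketch below is a deliberately simplified model of that rule, not Pacemaker's scheduler: only `Mandatory` orderings propagate a restart (transitively), and `Optional` orderings are ignored because they only matter when both resources happen to be starting or stopping in the same transition. The class, function, and resource names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    first: str
    then: str
    kind: str  # "Mandatory" or "Optional"


def forced_restarts(restarted: str, orders: list[Order]) -> set[str]:
    """Resources that must also restart because `restarted` restarted.

    Simplified model: only Mandatory orderings propagate, transitively.
    """
    result: set[str] = set()
    pending = [restarted]
    while pending:
        current = pending.pop()
        for o in orders:
            if o.first == current and o.kind == "Mandatory" and o.then not in result:
                result.add(o.then)
                pending.append(o.then)
    return result


mandatory = [Order("libvirtd", vm, "Mandatory") for vm in ("vm1", "vm2")]
optional = [Order("libvirtd", vm, "Optional") for vm in ("vm1", "vm2")]
print(sorted(forced_restarts("libvirtd", mandatory)))
print(sorted(forced_restarts("libvirtd", optional)))
```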

Re: [ClusterLabs] Establishing Timeouts

2016-10-10 Thread Christine Caulfield
On 10/10/16 05:51, Eric Robinson wrote: > I have about a dozen corosync+pacemaker clusters and I am just now getting > around to understanding timeouts. > > Most of my corosync.conf files look something like this: > > version:2 > token: 5000 >
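To see what a `totem` block like Eric's actually yields, the sketch below parses simple `key: value` lines and applies two rules documented in corosync.conf(5) for corosync 2.x: with 3 or more nodes in the nodelist, the real token timeout is `token + (nodes - 2) * token_coefficient` (default coefficient 650 ms), and `consensus` defaults to 1.2 × token. Two assumptions to label loudly: consensus is computed here from the effective (coefficient-adjusted) token, and the fallback token of 1000 ms is the 2.x default, so check your version's man page. The parser is a minimal, hypothetical helper that ignores braces, nesting, and section names.

```python
import re

KEY_VALUE = re.compile(r"^\s*([A-Za-z_]+)\s*:\s*(\S+)\s*$")


def parse_totem(text: str) -> dict[str, str]:
    """Minimal parser for 'key: value' lines; ignores braces and # comments."""
    options: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]
        match = KEY_VALUE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def effective_timeouts(options: dict[str, str], node_count: int) -> dict[str, int]:
    """Effective token and consensus in ms (assumptions noted above)."""
    token = int(options.get("token", 1000))  # corosync 2.x default; verify for your version
    coefficient = int(options.get("token_coefficient", 650))
    if node_count >= 3:
        token += (node_count - 2) * coefficient
    if "consensus" in options:
        consensus = int(options["consensus"])
    else:
        consensus = token * 6 // 5  # 1.2 x token, integer arithmetic
    return {"token_ms": token, "consensus_ms": consensus}


conf = """
totem {
    version: 2
    token: 5000
}
"""
opts = parse_totem(conf)
print(effective_timeouts(opts, node_count=2))
print(effective_timeouts(opts, node_count=3))
```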

Re: [ClusterLabs] Colocation and ordering with live migration

2016-10-10 Thread Klaus Wenninger
On 10/10/2016 10:17 AM, Pavel Levshin wrote: > Hello. > > We are trying to migrate our services to relatively fresh version of > cluster software. It is RHEL 7 with pacemaker 1.1.13-10. I’ve faced a > problem when live migration of virtual machines is allowed. In short, > I need to manage

Re: [ClusterLabs] Antw: Establishing Timeouts

2016-10-10 Thread Eric Robinson
> AFAIK, if _all_ ARP targets did not respond _once_, the link will be > considered down It would be great if someone could confirm that. > after "Down Delay". I guess you want to use multiple (and the correct ones) > ARP IP targets... Yes, I use multiple targets, and arp_all_targets=any.
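The bonding driver's `arp_all_targets` option (which applies to active-backup mode with `arp_validate` enabled) decides how many ARP targets must be reachable for a slave to count as up: with `any`, one reply is enough, so a slave is only declared down when every target stops answering, which matches Ulrich's description. The sketch below models just that single-check rule; it does not model `arp_interval` timing or missed-reply counting, and the target addresses and function name are hypothetical.

```python
def slave_considered_up(target_replies: dict[str, bool], arp_all_targets: str = "any") -> bool:
    """Single-check model of the arp_all_targets rule.

    "any": the slave counts as up if at least one ARP target replied.
    "all": the slave counts as up only if every ARP target replied.
    """
    if not target_replies:
        return False
    if arp_all_targets == "any":
        return any(target_replies.values())
    if arp_all_targets == "all":
        return all(target_replies.values())
    raise ValueError("arp_all_targets must be 'any' or 'all'")


replies = {"10.0.0.1": False, "10.0.0.2": True}  # hypothetical targets
print(slave_considered_up(replies, "any"))
print(slave_considered_up(replies, "all"))
```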

[ClusterLabs] Antw: I've been working on a split-brain prevention strategy for 2-node clusters.

2016-10-10 Thread Ulrich Windl
>>> Eric Robinson wrote on 09.10.2016 at 22:33 in message > I've been working on a script for preventing split-brain in 2-node clusters > and > I would appreciate comments from

[ClusterLabs] Antw: 2-Node Cluster, 2 Corosync Rings, Why Failover?

2016-10-10 Thread Ulrich Windl
>>> Eric Robinson wrote on 09.10.2016 at 05:25 in message > In a 2-node cluster where each node has two NICs connected to disjoint > networks, and thus 2 corosync rings, why would loss