Re: [ClusterLabs] reproducible split brain

2016-03-19 Thread Ken Gaillot
On 03/16/2016 03:04 PM, Christopher Harvey wrote:
> On Wed, Mar 16, 2016, at 04:00 PM, Digimer wrote:
>> On 16/03/16 03:59 PM, Christopher Harvey wrote:
>>> I am able to create a split brain situation in corosync 1.1.13 using
>>> iptables in a 3 node cluster.
>>>
>>> I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5
>>>
>>> All nodes are operational and form a 3 node cluster, with all nodes
>>> members of that ring.
>>> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-4 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-5 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> so far so good.
>>>
>>> running the following on vmr-132-4 drops all incoming (but not outgoing)
>>> packets from vmr-132-3:
>>> # iptables -I INPUT -s 192.168.132.3 -j DROP
>>> # iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target prot opt source         destination
>>> DROP   all  --  192.168.132.3  anywhere
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target prot opt source   destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target prot opt source   destination
>>>
>>> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-4 ---> Online: [ vmr-132-4 vmr-132-5 ]
>>> vmr-132-5 ---> Online: [ vmr-132-4 vmr-132-5 ]
>>>
>>> vmr-132-3 thinks everything is normal and continues to provide service,
>>> vmr-132-4 and 5 form a new ring, achieve quorum and provide the same
>>> service. Splitting the link between 3 and 4 in both directions isolates
>>> vmr 3 from the rest of the cluster and everything fails over normally,
>>> so only a unidirectional failure causes problems.
>>>
>>> I don't have stonith enabled right now. I looked over the
>>> pacemaker.log file closely to see if 4 and 5 would normally have fenced
>>> 3, but I didn't see any fencing or stonith logs.
>>>
>>> Would stonith solve this problem, or does this look like a bug?
>>
>> It should, that is its job.
> 
> is there some log I can enable that would say
> "ERROR: hey, I would use stonith here, but you have it disabled! your
> warranty is void past this point! do not pass go, do not file a bug"?

Enable fencing, and create a fence device with a static host list that
doesn't match any of your nodes. Pacemaker will think fencing is
configured, but when it tries to actually fence a node, no devices will
be capable of it, and there will be errors to that effect (including "No
such device"). The cluster will block at that point. You can use
stonith_admin --confirm to manually indicate the node is down and
unblock the cluster (but be absolutely sure the node really is down!).
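
A minimal sketch of the steps above, assuming the cluster is managed with pcs. The device name fence-nobody, the host name no-such-node, and the choice of fence_dummy as the agent are placeholders rather than anything from this thread; any fence agent installed on the nodes should serve the same purpose. The helper only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set APPLY=1 in the environment to actually run the commands.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Assumption: fence_dummy (or whatever FENCE_AGENT names) is installed.
FENCE_AGENT=${FENCE_AGENT:-fence_dummy}

setup_unusable_fencing() {
    # Static host list that matches none of the real nodes, so no device
    # is ever able to fence vmr-132-3, vmr-132-4 or vmr-132-5.
    run pcs stonith create fence-nobody "$FENCE_AGENT" pcmk_host_list=no-such-node
    # Tell Pacemaker that fencing is configured.
    run pcs property set stonith-enabled=true
}

confirm_node_down() {
    # Manual acknowledgement that node $1 is down. Only use this once the
    # node has been verified to be powered off.
    run stonith_admin --confirm "$1"
}

setup_unusable_fencing
```

Once the cluster blocks on a failed fencing attempt, confirm_node_down vmr-132-3 would print (or, with APPLY=1, run) the command that unblocks it.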

>> -- 
>> Digimer
>> Papers and Projects: https://alteeve.ca/w/
>> What if the cure for cancer is trapped in the mind of a person without
>> access to education?


___
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


[ClusterLabs] reproducible split brain

2016-03-19 Thread Christopher Harvey
I am able to create a split brain situation in corosync 1.1.13 using
iptables in a 3 node cluster.

I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5

All nodes are operational and form a 3 node cluster, with all nodes
members of that ring.
vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
vmr-132-4 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
vmr-132-5 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
so far so good.

running the following on vmr-132-4 drops all incoming (but not outgoing)
packets from vmr-132-3:
# iptables -I INPUT -s 192.168.132.3 -j DROP
# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source         destination
DROP   all  --  192.168.132.3  anywhere

Chain FORWARD (policy ACCEPT)
target prot opt source   destination

Chain OUTPUT (policy ACCEPT)
target prot opt source   destination
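
The rule above touches only the INPUT chain, so vmr-132-4 stops hearing vmr-132-3 but can still send to it. As a hedged sketch for comparison, a block in both directions could add a matching OUTPUT rule. The OUTPUT rule and the helper names are assumptions, not commands taken from this post. The script prints the commands unless APPLY=1 is set, and the -D variants remove the rules again.

```shell
#!/bin/sh
PEER=192.168.132.3   # address of vmr-132-3, taken from the rule above

# Dry-run by default: print each command instead of executing it.
# Set APPLY=1 (as root) to actually change the firewall.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Unidirectional failure: drop only what arrives from the peer.
block_inbound()    { run iptables -I INPUT -s "$PEER" -j DROP; }

# Added for a bidirectional failure: also drop what is sent to the peer.
block_outbound()   { run iptables -I OUTPUT -d "$PEER" -j DROP; }

# Cleanup: delete the rules inserted above.
unblock_inbound()  { run iptables -D INPUT -s "$PEER" -j DROP; }
unblock_outbound() { run iptables -D OUTPUT -d "$PEER" -j DROP; }

block_inbound
block_outbound
```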

vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
vmr-132-4 ---> Online: [ vmr-132-4 vmr-132-5 ]
vmr-132-5 ---> Online: [ vmr-132-4 vmr-132-5 ]

vmr-132-3 thinks everything is normal and continues to provide service,
vmr-132-4 and 5 form a new ring, achieve quorum and provide the same
service. Splitting the link between 3 and 4 in both directions isolates
vmr 3 from the rest of the cluster and everything fails over normally,
so only a unidirectional failure causes problems.
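
The quorum arithmetic behind this can be sketched as follows, assuming one vote per node and the usual majority rule of floor(n/2) + 1. The function names are illustrative only. The point is that both sides pass the check: the 4/5 partition has 2 of 3 votes, and vmr-132-3 still counts all 3 members in its own view.

```shell
#!/bin/sh
# Votes needed for quorum in a cluster of $1 nodes, one vote per node.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds when a node that sees $2 members out of $1 total is quorate.
has_quorum() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}

# vmr-132-4 and vmr-132-5 see each other: 2 of 3 votes.
has_quorum 3 2 && echo "vmr-132-4 + vmr-132-5: quorate"

# vmr-132-3 still lists all three nodes as online: 3 of 3 votes.
has_quorum 3 3 && echo "vmr-132-3 (stale view): quorate"

# A node that correctly saw itself as isolated would have 1 of 3 votes.
has_quorum 3 1 || echo "isolated single node: not quorate"
```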

I don't have stonith enabled right now. I looked over the
pacemaker.log file closely to see if 4 and 5 would normally have fenced
3, but I didn't see any fencing or stonith logs.

Would stonith solve this problem, or does this look like a bug?

Thanks,
Chris


Re: [ClusterLabs] reproducible split brain

2016-03-19 Thread Digimer
On 16/03/16 03:59 PM, Christopher Harvey wrote:
> I am able to create a split brain situation in corosync 1.1.13 using
> iptables in a 3 node cluster.
> 
> I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5
> 
> All nodes are operational and form a 3 node cluster, with all nodes
> members of that ring.
> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> vmr-132-4 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> vmr-132-5 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> so far so good.
> 
> running the following on vmr-132-4 drops all incoming (but not outgoing)
> packets from vmr-132-3:
> # iptables -I INPUT -s 192.168.132.3 -j DROP
> # iptables -L
> Chain INPUT (policy ACCEPT)
> target prot opt source         destination
> DROP   all  --  192.168.132.3  anywhere
> 
> Chain FORWARD (policy ACCEPT)
> target prot opt source   destination
> 
> Chain OUTPUT (policy ACCEPT)
> target prot opt source   destination
> 
> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> vmr-132-4 ---> Online: [ vmr-132-4 vmr-132-5 ]
> vmr-132-5 ---> Online: [ vmr-132-4 vmr-132-5 ]
> 
> vmr-132-3 thinks everything is normal and continues to provide service,
> vmr-132-4 and 5 form a new ring, achieve quorum and provide the same
> service. Splitting the link between 3 and 4 in both directions isolates
> vmr 3 from the rest of the cluster and everything fails over normally,
> so only a unidirectional failure causes problems.
> 
> I don't have stonith enabled right now. I looked over the
> pacemaker.log file closely to see if 4 and 5 would normally have fenced
> 3, but I didn't see any fencing or stonith logs.
> 
> Would stonith solve this problem, or does this look like a bug?

It should, that is its job.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


Re: [ClusterLabs] reproducible split brain

2016-03-18 Thread Christopher Harvey
On Wed, Mar 16, 2016, at 04:00 PM, Digimer wrote:
> On 16/03/16 03:59 PM, Christopher Harvey wrote:
> > I am able to create a split brain situation in corosync 1.1.13 using
> > iptables in a 3 node cluster.
> > 
> > I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5
> > 
> > All nodes are operational and form a 3 node cluster, with all nodes
> > members of that ring.
> > vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> > vmr-132-4 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> > vmr-132-5 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> > so far so good.
> > 
> > running the following on vmr-132-4 drops all incoming (but not outgoing)
> > packets from vmr-132-3:
> > # iptables -I INPUT -s 192.168.132.3 -j DROP
> > # iptables -L
> > Chain INPUT (policy ACCEPT)
> > target prot opt source         destination
> > DROP   all  --  192.168.132.3  anywhere
> > 
> > Chain FORWARD (policy ACCEPT)
> > target prot opt source   destination
> > 
> > Chain OUTPUT (policy ACCEPT)
> > target prot opt source   destination
> > 
> > vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
> > vmr-132-4 ---> Online: [ vmr-132-4 vmr-132-5 ]
> > vmr-132-5 ---> Online: [ vmr-132-4 vmr-132-5 ]
> > 
> > vmr-132-3 thinks everything is normal and continues to provide service,
> > vmr-132-4 and 5 form a new ring, achieve quorum and provide the same
> > service. Splitting the link between 3 and 4 in both directions isolates
> > vmr 3 from the rest of the cluster and everything fails over normally,
> > so only a unidirectional failure causes problems.
> > 
> > I don't have stonith enabled right now. I looked over the
> > pacemaker.log file closely to see if 4 and 5 would normally have fenced
> > 3, but I didn't see any fencing or stonith logs.
> > 
> > Would stonith solve this problem, or does this look like a bug?
> 
> It should, that is its job.

is there some log I can enable that would say
"ERROR: hey, I would use stonith here, but you have it disabled! your
warranty is void past this point! do not pass go, do not file a bug"?

> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> 


Re: [ClusterLabs] reproducible split brain

2016-03-18 Thread Digimer
On 16/03/16 04:04 PM, Christopher Harvey wrote:
> On Wed, Mar 16, 2016, at 04:00 PM, Digimer wrote:
>> On 16/03/16 03:59 PM, Christopher Harvey wrote:
>>> I am able to create a split brain situation in corosync 1.1.13 using
>>> iptables in a 3 node cluster.
>>>
>>> I have 3 nodes, vmr-132-3, vmr-132-4, and vmr-132-5
>>>
>>> All nodes are operational and form a 3 node cluster, with all nodes
>>> members of that ring.
>>> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-4 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-5 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> so far so good.
>>>
>>> running the following on vmr-132-4 drops all incoming (but not outgoing)
>>> packets from vmr-132-3:
>>> # iptables -I INPUT -s 192.168.132.3 -j DROP
>>> # iptables -L
>>> Chain INPUT (policy ACCEPT)
>>> target prot opt source         destination
>>> DROP   all  --  192.168.132.3  anywhere
>>>
>>> Chain FORWARD (policy ACCEPT)
>>> target prot opt source   destination
>>>
>>> Chain OUTPUT (policy ACCEPT)
>>> target prot opt source   destination
>>>
>>> vmr-132-3 ---> Online: [ vmr-132-3 vmr-132-4 vmr-132-5 ]
>>> vmr-132-4 ---> Online: [ vmr-132-4 vmr-132-5 ]
>>> vmr-132-5 ---> Online: [ vmr-132-4 vmr-132-5 ]
>>>
>>> vmr-132-3 thinks everything is normal and continues to provide service,
>>> vmr-132-4 and 5 form a new ring, achieve quorum and provide the same
>>> service. Splitting the link between 3 and 4 in both directions isolates
>>> vmr 3 from the rest of the cluster and everything fails over normally,
>>> so only a unidirectional failure causes problems.
>>>
>>> I don't have stonith enabled right now. I looked over the
>>> pacemaker.log file closely to see if 4 and 5 would normally have fenced
>>> 3, but I didn't see any fencing or stonith logs.
>>>
>>> Would stonith solve this problem, or does this look like a bug?
>>
>> It should, that is its job.
> 
> is there some log I can enable that would say
> "ERROR: hey, I would use stonith here, but you have it disabled! your
> warranty is void past this point! do not pass go, do not file a bug"?

If I had it my way, that would be printed to STDOUT when you start
pacemaker without stonith...

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
