Hi,

Thanks for your response.

I'll try the booth plugin; it's in the Debian repository.

If it doesn't work, I'll try writing a fake fencing agent that only restarts
corosync (that seems to fix the problem).


Thanks
________________________________________
From: linux-ha-boun...@lists.linux-ha.org [linux-ha-boun...@lists.linux-ha.org]
on behalf of Digimer [li...@alteeve.ca]
Sent: Thursday, 27 February 2014 17:05
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] 2 Nodes split brain, distant sites

On 27/02/14 09:42 AM, TRIBOLET Thomas wrote:
> 2) My problem:
>
> When there is a network problem:
>
> Example:
> a) The first node's site loses its internet connection (and, at the same
> time, communication with the second node, since the VPN runs over it).
> b) The cluster stops openvpn on the first node and starts it on the second,
> because of the p_ping primitive in the config.
> c) The connection comes back at the first node's site.
> d) Problem: the two nodes don't re-form the cluster. They don't see each
> other, and each one creates its own cluster -> split brain, I think.
> e) Each node has openvpn running, which shouldn't happen.
>
>
> I don't have stonith running because I think it will be problematic
> without quorum.
> Is there a way to tell corosync to recreate the ring?
>
> Or does someone have another solution?
>
> Thanks

Hello,

   This is the fundamental problem of "stretch" clusters (or
geo-clusters). There is no way to tell the difference between a site
failure and a network failure. In either case, the link is down, so
fencing can't be used. Without fencing, there is no way to avoid a
split-brain.

   As for quorum: when quorum isn't used, fencing becomes *more*
important. Even then, quorum and fencing solve different problems.
Quorum is useful when nodes are acting in a defined manner. Fencing is
needed when a node is in an unknown state (and thus acting in an
undefined manner).
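   To make the two-node quorum problem concrete, here is a minimal sketch
of the plain majority-vote rule. It assumes one vote per node and that a
partition is quorate only with strictly more than half of all votes; it is
an illustration of the arithmetic, not corosync's actual votequorum code,
and the function names are hypothetical.

```python
# Sketch of the simple majority rule, assuming one vote per node.
# Illustrative only; this is not corosync's votequorum implementation.

def has_quorum(votes_present: int, total_votes: int) -> bool:
    """A partition is quorate only if it holds a strict majority of votes."""
    return votes_present > total_votes // 2


def partitions_with_quorum(partition_sizes: list[int]) -> list[bool]:
    """For a cluster split into partitions, report which ones keep quorum."""
    total = sum(partition_sizes)
    return [has_quorum(size, total) for size in partition_sizes]


if __name__ == "__main__":
    # Two nodes split 1/1 (the stretch-cluster case): nobody is quorate.
    print(partitions_with_quorum([1, 1]))  # [False, False]
    # Three nodes split 2/1: the larger side keeps quorum.
    print(partitions_with_quorum([2, 1]))  # [True, False]
```

   Since neither half of a two-node split can ever win the vote, two-node
setups typically have to disable or special-case quorum, which leaves
fencing as the only thing standing between you and a split-brain.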

   So regardless of quorum, fencing is required. It is the only way to
reliably avoid split-brains. Unfortunately, fencing doesn't work on
stretch clusters.

   The pacemaker project is working on something called "booth" which is
designed to deal with this problem, but I don't know much about it, or
whether it's out of testing/dev yet.

   So in short, if you must have a stretch cluster, I recommend manual
failover only.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
