Hi, Thanks for your response.
I'll try the booth plugin. It's in the Debian repository. If it doesn't work, I'll try making a fake fencing agent that only restarts corosync (that seems to fix the problem).

Thanks
________________________________________
From: linux-ha-boun...@lists.linux-ha.org [linux-ha-boun...@lists.linux-ha.org] on behalf of Digimer [li...@alteeve.ca]
Sent: Thursday, 27 February 2014 17:05
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] 2 Nodes split brain, distant sites

On 27/02/14 09:42 AM, TRIBOLET Thomas wrote:
> 2) My problem:
>
> When there is a network problem:
>
> Ex:
> a) The first-node site loses its internet connection (and, at the same
>    time, communication with second-node, because the VPN runs over that
>    internet connection).
> b) The cluster stops openvpn on the first node and launches it on the
>    second, due to the p_ping primitive in the config.
> c) The connection comes back on the first-node site.
> d) Problem: first-node and second-node don't re-form the cluster; they
>    don't see each other, and each node creates its own cluster -> split
>    brain, I think.
> e) Each node has openvpn running, which shouldn't happen.
>
>
> I don't have stonith running because I think it will be problematic
> without quorum.
> Is there a way to tell corosync to recreate a ring?
>
> Or does someone have another solution?
>
> Thanks

Hello,

This is the fundamental problem of "stretch" clusters (or geo-clusters). There is no way to tell the difference between a site failure and a network failure. In either case, the link is down, so fencing can't be used. Without fencing, there is no way to avoid a split-brain.

As for quorum: when quorum isn't used, fencing becomes *more* important. Even then, quorum and fencing solve different problems. Quorum is useful when nodes are acting in a defined manner. Fencing is needed when a node is in an unknown state (and thus acting in an undefined manner).

So regardless of quorum, fencing is required. It is the only way to reliably avoid split-brains. Unfortunately, fencing doesn't work on stretch clusters.

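To make the quorum point concrete, here is a minimal sketch, assuming plain majority voting with one vote per node. The function names and the "third vote" are hypothetical and only for illustration; this is not how corosync or booth is actually implemented.

```python
def has_quorum(votes_seen: int, total_votes: int) -> bool:
    """Majority rule: a partition is quorate only with more than half of all votes."""
    return votes_seen > total_votes // 2


def partition_report(partitions: dict, total_votes: int) -> dict:
    """Map each partition name to whether it would be quorate (hypothetical helper)."""
    return {name: has_quorum(votes, total_votes) for name, votes in partitions.items()}


# Two sites, one vote each, link between them is down:
# neither side holds a majority, so quorum cannot pick a survivor.
two_node = partition_report({"first-node": 1, "second-node": 1}, total_votes=2)

# Assumption: a third vote lives at another location and is still
# reachable from second-node only. Now exactly one side is quorate.
with_third = partition_report(
    {"first-node": 1, "second-node + third vote": 2}, total_votes=3
)

print(two_node)
print(with_third)
```

Even in the three-vote case, quorum only says which side is allowed to keep running. It does not prove the other side has actually stopped, which is the part fencing is for.
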
The pacemaker project is working on something called "booth", which is designed to deal with this problem, but I don't know much about it or whether it's out of testing/dev yet.

So in short, if you must have a stretch cluster, I recommend manual failover only.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems