Hi Alex,

Thanks for the explanation.

Then I have another question: suppose my current ZooKeeper cluster has 7
machines and two of them have failed. How can I reconfigure ZooKeeper to
work with 5 machines, i.e. so that the leader can commit a transaction once
it has replies from 3 machines? On the other hand, if I add 2 machines to
make a 9-node ZooKeeper cluster, how can I configure it to take advantage
of all 9 machines?

This is more of a question for the user mailing list, so I am cc'ing it.

Thanks,
Peter

On Tue, Aug 30, 2011 at 12:21 PM, Alexander Shraer <shra...@yahoo-inc.com> wrote:

> Hi Peter,
>
> It's the second option. The servers don't know if the leader failed or
> was partitioned from them. So each group of 3 servers in your scenario
> can't distinguish the situation from another scenario where none of the
> servers failed but these 3 servers are partitioned from the other 4.
> To prevent a split brain in an asynchronous network, a leader must have
> the support of a quorum.
>
> Alex
>
> > -----Original Message-----
> > From: cheetah [mailto:xuw...@gmail.com]
> > Sent: Tuesday, August 30, 2011 12:23 AM
> > To: dev@zookeeper.apache.org
> > Subject: How zab avoid split-brain problem?
> >
> > Hi folks,
> >
> > I am reading the Zab paper, but I am a bit confused about how Zab
> > handles the split-brain problem.
> >
> > Suppose there are seven servers, A, B, C, D, E, F and G, and A is
> > currently the leader. A dies and, at the same time, B, C and D are
> > isolated from E, F and G.
> >
> > In this case, will Zab continue working like this: if B>C>D and
> > E>F>G, the two groups both vote and elect B and E as their leaders
> > separately? Then there is a split-brain problem.
> >
> > Or does ZooKeeper just stop working, because there were originally 7
> > servers and, after 1 failure, a new leader still expects to have a
> > quorum of 3 servers voting for it as the leader? Because the two
> > groups are separated from each other, no leader can be elected.
> >
> > If it is the first case, ZooKeeper will have a split-brain problem,
> > which is probably not the case. But in the second case, a 7-node
> > ZooKeeper service can only handle a node failure and a network
> > partition failure.
> >
> > Am I misunderstanding this? Looking forward to your insights.
> >
> > Thanks,
> > Peter
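
For reference, the quorum arithmetic discussed in this thread can be sketched in a few lines of Python. This is only an illustration, not ZooKeeper code: the function names quorum_size, max_failures and can_elect_leader are hypothetical, and the sketch assumes the plain majority rule (more than half of the configured servers), which is the rule the thread is reasoning about.

```python
"""Majority-quorum arithmetic for a replicated ensemble (illustrative only)."""


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the *configured* ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def max_failures(ensemble_size: int) -> int:
    """Number of servers that can be lost while a quorum still exists."""
    return ensemble_size - quorum_size(ensemble_size)


def can_elect_leader(reachable: int, ensemble_size: int) -> bool:
    """True if a group of mutually reachable servers forms a quorum."""
    return reachable >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # Quorum sizes for the three ensemble sizes mentioned in the thread.
    for n in (5, 7, 9):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {max_failures(n)} failures")

    # Scenario from the original question: 7 configured servers, A is dead,
    # {B, C, D} and {E, F, G} cannot reach each other.
    print("B,C,D can elect a leader:", can_elect_leader(3, 7))
    print("E,F,G can elect a leader:", can_elect_leader(3, 7))

    # Scenario from the follow-up: 7 configured servers, 2 failed, no partition.
    print("5 live of 7 can elect a leader:", can_elect_leader(5, 7))
```

Under this rule, a 7-server ensemble with two failed machines still has 5 live servers against a quorum of 4, so it keeps working. The quorum drops to 3 only once the configured membership itself is changed to 5 servers, and rises to 5 once it is changed to 9.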