Forwarding it to the dev list.

Thanks
mahadev


On 1/11/11 11:34 PM, "Vishal Kher" <[email protected]> wrote:

> Hi,
> 
> Scenario:
> 1. 2 of the 3 ZK nodes are online
> 2. Third node is attempting to join
> 3. The third node unnecessarily goes into the LEADING state
> 4. The third node then goes back to LOOKING (no majority of followers) and
> finally goes to the FOLLOWING state.
> 
> While going through the logs I noticed that a peer C that is trying to join
> an already formed cluster goes into the LEADING state. This is because the
> QuorumCnxManager instances on A and B send the entire history of
> notification messages to C.
> C receives the notification messages that were exchanged between A and B
> when they were forming the cluster.
> 
> In FastLeaderElection.lookForLeader(), due to the following piece of code, C
> quits lookForLeader assuming that it is supposed to lead.
> 
>     // If have received from all nodes, then terminate
>     if ((self.getVotingView().size() == recvset.size()) &&
>             (self.getQuorumVerifier().getWeight(proposedLeader) != 0)) {
>         self.setPeerState((proposedLeader == self.getId()) ?
>                 ServerState.LEADING : learningState());
>         leaveInstance();
>         return new Vote(proposedLeader, proposedZxid);
>
>     } else if (termPredicate(recvset,
> 
> 
> In general, this does not affect the correctness of FLE, since C will
> eventually end up in the FOLLOWING state (A and B won't vote for C).
> However, this delays C from joining the cluster, which can in turn affect
> the recovery time of an application.
> 
> I think A and B should send only the most recent notification instead of
> the entire history. Does this sound reasonable?
> 
> Thanks.
> -Vishal
> 
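To make the quoted "received from all nodes" check concrete, here is a minimal Java sketch of how stale votes could trip it. It is a simplified model, not ZooKeeper code: the class, enum, and method names are hypothetical, the weight check from the quoted snippet is left out, learningState() is reduced to FOLLOWING, and it assumes C still proposes itself when the stale votes arrive.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the quoted termination check.
// Not ZooKeeper code; every name here is hypothetical.
class StaleNotificationDemo {
    enum State { LOOKING, LEADING, FOLLOWING }

    // Mirrors the quoted condition: once a vote is recorded from every member
    // of the voting view, stop looking and adopt the proposed leader.
    static State decide(int votingViewSize, Map<Long, Long> recvset,
                        long proposedLeader, long selfId) {
        if (recvset.size() == votingViewSize) {
            return proposedLeader == selfId ? State.LEADING : State.FOLLOWING;
        }
        return State.LOOKING;
    }

    // Replays the scenario: C (id 3) records its own vote plus stale votes
    // that A (id 1) and B (id 2) cast while they were still electing.
    static State replayScenario() {
        long selfId = 3L;
        long proposedLeader = selfId; // assumption: C still proposes itself
        Map<Long, Long> recvset = new HashMap<>();
        recvset.put(3L, 3L); // C's own vote
        recvset.put(1L, 1L); // stale: A voted for itself
        recvset.put(2L, 2L); // stale: B voted for itself
        return decide(3, recvset, proposedLeader, selfId);
    }

    public static void main(String[] args) {
        System.out.println(replayScenario());
    }
}
```

Under these assumptions the replay ends in LEADING even though A and B already form a quorum without C. The stale votes alone fill recvset up to the size of the voting view.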

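One possible shape for the "send only the most recent notification" proposal is a per-peer slot that newer notifications overwrite until the peer connects. The Java sketch below is illustrative only: LatestOnlyOutbox, offer, and poll are hypothetical names, not part of QuorumCnxManager, and notifications are modelled as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-peer outbox that keeps only the newest pending notification.
// Not the QuorumCnxManager API; a sketch of the "latest only" idea.
class LatestOnlyOutbox {
    private final Map<Long, String> pending = new HashMap<>();

    // Queue a notification for a peer, replacing any older one not yet sent.
    synchronized void offer(long peerId, String notification) {
        pending.put(peerId, notification);
    }

    // Take the notification to send once the peer connects; null if none.
    synchronized String poll(long peerId) {
        return pending.remove(peerId);
    }

    public static void main(String[] args) {
        LatestOnlyOutbox outbox = new LatestOnlyOutbox();
        outbox.offer(3L, "LOOKING vote=1");
        outbox.offer(3L, "LOOKING vote=2");
        outbox.offer(3L, "FOLLOWING leader=2");
        System.out.println(outbox.poll(3L)); // prints "FOLLOWING leader=2"
    }
}
```

With a slot like this, a late joiner such as C would see only the final state of each peer rather than the votes cast during the earlier election. Whether dropping older notifications is safe in every case is exactly the question raised above, so treat this as a starting point for discussion.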