[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-30 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027316#comment-13027316
 ] 

Hudson commented on ZOOKEEPER-975:
--

Integrated in ZooKeeper-trunk #1168 (See 
[https://builds.apache.org/hudson/job/ZooKeeper-trunk/1168/])
ZOOKEEPER-975. new peer goes in LEADING state even if ensemble is online. 
(vishal via fpj)


> new peer goes in LEADING state even if ensemble is online
> -
>
> Key: ZOOKEEPER-975
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-975
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.3.2
>Reporter: Vishal K
>Assignee: Vishal K
> Fix For: 3.4.0
>
> Attachments: ZOOKEEPER-975.patch, ZOOKEEPER-975.patch, 
> ZOOKEEPER-975.patch2, ZOOKEEPER-975.patch3, ZOOKEEPER-975.patch4, 
> ZOOKEEPER-975.patch5, ZOOKEEPER-975.patch6
>
>
> Scenario:
> 1. 2 of the 3 ZK nodes are online.
> 2. The third node is attempting to join.
> 3. The third node unnecessarily goes into the "LEADING" state.
> 4. The third node then goes back to LOOKING (no majority of followers) and
> finally goes to the FOLLOWING state.
> While going through the logs I noticed that a peer C that is trying to
> join an already formed cluster goes into the LEADING state. This happens
> because the QuorumCnxManager of A and B sends the entire history of
> notification messages to C, so C receives the notification messages that
> were exchanged between A and B when they were forming the cluster.
> In FastLeaderElection.lookForLeader(), due to the following piece of
> code, C quits lookForLeader assuming that it is supposed to lead.
> //If have received from all nodes, then terminate
> if ((self.getVotingView().size() == recvset.size()) &&
>         (self.getQuorumVerifier().getWeight(proposedLeader) != 0)){
>     self.setPeerState((proposedLeader == self.getId()) ?
>             ServerState.LEADING : learningState());
>     leaveInstance();
>     return new Vote(proposedLeader, proposedZxid);
> } else if (termPredicate(recvset,
> This can cause:
> 1. C to unnecessarily go into the LEADING state, wait for tickTime * initLimit,
> and then restart FLE.
> 2. C to wait for 200 ms (finalizeWait) and then consider whatever
> notifications it has received to make a decision. C could potentially
> decide to follow an old leader, fail to connect to the leader, and
> then restart FLE. See the code below.
> if (termPredicate(recvset,
>         new Vote(proposedLeader, proposedZxid,
>                 logicalclock))) {
>     // Verify if there is any change in the proposed leader
>     while ((n = recvqueue.poll(finalizeWait,
>             TimeUnit.MILLISECONDS)) != null) {
>         if (totalOrderPredicate(n.leader, n.zxid,
>                 proposedLeader, proposedZxid)) {
>             recvqueue.put(n);
>             break;
>         }
>     }
> In general, this does not affect the correctness of FLE, since C will
> eventually go back to the FOLLOWING state (A and B won't vote for C).
> However, it delays C from joining the cluster, which can in turn
> affect the recovery time of an application.
> Proposal: A and B should send only the latest (most recent) notification
> instead of the entire history. Does this sound reasonable?
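
For context, a minimal sketch of what the proposal could look like in
QuorumCnxManager: remember only the most recent notification sent to each peer,
and replay just that one when a connection to the peer is (re)established. The
field and method names below are illustrative assumptions, not the actual patch.

{noformat}
// Sketch only: track the latest message per destination server id.
// Assumes java.nio.ByteBuffer and java.util.concurrent.ConcurrentHashMap.
final ConcurrentHashMap<Long, ByteBuffer> lastMessageSent =
        new ConcurrentHashMap<Long, ByteBuffer>();

void toSend(Long sid, ByteBuffer b) {
    // Remember the most recent notification for this peer...
    lastMessageSent.put(sid, b);
    // ...and enqueue it for delivery as before (omitted).
}

void connectionEstablished(Long sid) {
    // On (re)connect, resend only the latest notification instead of
    // replaying the entire queued history.
    ByteBuffer b = lastMessageSent.get(sid);
    if (b != null) {
        send(sid, b); // hypothetical helper
    }
}
{noformat}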

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-27 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025877#comment-13025877
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


According to the console output, the problem is not with the patch:

{noformat}
[javadoc] javadoc: warning - Error fetching URL: 
http://java.sun.com/javase/6/docs/api/package-list
{noformat}

so it is +1 for me. If no one has anything against it, I'll commit it later.

Thanks, Vishal.



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-25 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024920#comment-13024920
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Does anyone know why we are getting this javadoc warning? ZOOKEEPER-1052 also 
got -1 on javadoc, and I'm not sure what the problem is. Any hint would be 
welcome.



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-25 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024915#comment-13024915
 ] 

Vishal K commented on ZOOKEEPER-975:


The Findbugs failure here is due to ZOOKEEPER-1052, as pointed out by Flavio.



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024913#comment-13024913
 ] 

Hadoop QA commented on ZOOKEEPER-975:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12477322/ZOOKEEPER-975.patch6
  against trunk revision 1091841.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning message.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warning.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/232//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/232//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/232//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019429#comment-13019429
 ] 

Hadoop QA commented on ZOOKEEPER-975:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12476255/ZOOKEEPER-975.patch5
  against trunk revision 1091841.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/226//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/226//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/226//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019118#comment-13019118
 ] 

Hadoop QA commented on ZOOKEEPER-975:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12476190/ZOOKEEPER-975.patch4
  against trunk revision 1091314.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause tar ant target to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/225//testReport/
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/225//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-07 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017172#comment-13017172
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, what if we have a test with 3 servers, with ids 1, 2, and 3? We start 
1 and 2, and let 2 be elected. Once 2 is elected, we start 3. With the current 
trunk code, 3 should declare itself LEADING, since it will receive the initial 
notifications of the other two processes. With your patch, this situation 
shouldn't happen.

What do you think?
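
For illustration, a rough sketch of such a test; the startPeers, waitForLeader,
and startPeer helpers are hypothetical, not an existing test utility:

{noformat}
// Sketch: start servers 1 and 2, wait until a leader is elected, then
// start server 3 and assert that 3 never reports LEADING while joining.
public void testJoinerDoesNotLead() throws Exception {
    QuorumPeer[] peers = startPeers(1, 2);   // hypothetical helper
    waitForLeader(peers);                    // hypothetical helper
    QuorumPeer p3 = startPeer(3);            // hypothetical helper

    long deadline = System.currentTimeMillis() + 30000;
    while (p3.getPeerState() != ServerState.FOLLOWING
            && System.currentTimeMillis() < deadline) {
        assertFalse("joining peer went LEADING",
                p3.getPeerState() == ServerState.LEADING);
        Thread.sleep(100);
    }
    assertEquals(ServerState.FOLLOWING, p3.getPeerState());
}
{noformat}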



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-05 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13015975#comment-13015975
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

Do you think we need a test for this? I was looking through the code to see how 
we could write one. What we can do is insert notifications into recvqueue for a 
peer, then call lookForLeader(), and monitor the 
state/proposedZxid/proposedLeader/etc. This would let us feed whatever 
notifications we want to FLE. The other peers should just ignore the 
notifications (or send the notifications that we want them to send). 

However, for this we will have to change FastLeaderElection so that one can 
overload its Messenger, modify recvqueue, and set proposedLeader, 
proposedZxid, etc. from the test. I think this would be a good change in general, 
since it would let us feed notifications to a peer and test corner cases, but it 
is a bit time consuming. I am not sure how much it will help for this particular 
bug, though. What do you think?

-Vishal
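
A sketch of the injection idea, assuming FastLeaderElection is opened up for
tests as described above; the direct access to recvqueue and the Notification
fields shown here is the hypothetical part:

{noformat}
// Sketch: feed a canned notification to FLE and check the outcome.
FastLeaderElection fle = new FastLeaderElection(peer, cnxManager);

Notification n = new Notification();
n.leader = 2L;                        // vote for server 2
n.zxid = 0x100000005L;
n.epoch = 1L;
n.state = ServerState.FOLLOWING;      // sender already follows 2
n.sid = 1L;                           // pretend it came from server 1
fle.recvqueue.put(n);                 // test-only access, assumed

Vote v = fle.lookForLeader();
assertEquals(2L, v.id);               // peer should decide to follow 2
{noformat}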



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-04-01 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014532#comment-13014532
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, Apart from not having a test, it is +1 for me. Looks good.



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-29 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012666#comment-13012666
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, In the run() method of WorkerReceiver we already perform a check:

{noformat}
/*
 * If it is from an observer, respond right away.
 * Note that the following predicate assumes that
 * if a server is not a follower, then it must be
 * an observer. If we ever have any other type of
 * learner in the future, we'll have to change the
 * way we check for observers.
 */
if(!self.getVotingView().containsKey(response.sid)){
    Vote current = self.getCurrentVote();
    ToSend notmsg = new ToSend(ToSend.mType.notification,
            current.id,
            current.zxid,
            logicalclock,
            self.getPeerState(),
            response.sid);

    sendqueue.offer(notmsg);
}
{noformat}

I'll be reviewing your patch shortly.



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012598#comment-13012598
 ] 

Hadoop QA commented on ZOOKEEPER-975:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474906/ZOOKEEPER-975.patch3
  against trunk revision 1082362.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warning.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/209//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/209//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/209//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-28 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012196#comment-13012196
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

I think a peer should check whether a received notification originated from a 
cluster member. This check is done for LOOKING notifications, but only after 
proposedLeader, proposedZxid, and the epoch have already been modified. The 
verification is not performed for LEADING/FOLLOWING notifications.

I am planning to add this check at the very beginning:

{noformat}
else if (self.getVotingView().containsKey(n.sid)) {
    switch (n.state) {
    case LOOKING:
        [...]
    }
}
{noformat}

Do you see any issues with this (especially while using OBSERVER_ID)?







[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-28 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012161#comment-13012161
 ] 

Vishal K commented on ZOOKEEPER-975:


Thanks for the clarification. I will reintroduce the separate data structure for 
LEADING/FOLLOWING notifications. 



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-26 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011639#comment-13011639
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


The problem I see with mixing notifications from LOOKING peers and 
FOLLOWING/LEADING peers is the following. Peers that are either LEADING or 
FOLLOWING won't change their leader state based on new notifications. If a peer 
receives notifications from a quorum and one of the notifications says 
FOLLOWING/LEADING, then the peer cannot count on the sender of the notification 
to move to a new leader. The election will fail and the peer (perhaps along 
with others) will have to start over. 

Also, in the case where a peer gets disconnected and comes back while a quorum 
is following a leader, it needs to be able to determine who the leader is, even 
if its leader election round is higher than the others'. In this case, it has to 
follow the leader based on FOLLOWING/LEADING notifications, without comparing 
the votes in the notifications.  

For these reasons, we have kept FOLLOWING/LEADING notifications in a separate 
data structure. It is true, though, that we could keep all of them in the same 
structure and simply filter based on the state field when processing.
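
For reference, a condensed paraphrase (not a verbatim excerpt) of how the two
structures are used in lookForLeader():

{noformat}
// LOOKING votes accumulate in recvset; FOLLOWING/LEADING votes go into
// outofelection. A peer only adopts an established leader once
// outofelection holds a quorum for it and the claimed leader checks out.
HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();
HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();

switch (n.state) {
case LOOKING:
    recvset.put(n.sid, new Vote(n.leader, n.zxid, n.epoch));
    // ... normal vote comparison and quorum check ...
    break;
case FOLLOWING:
case LEADING:
    outofelection.put(n.sid, new Vote(n.leader, n.zxid, n.epoch));
    if (termPredicate(outofelection,
            new Vote(n.leader, n.zxid, n.epoch, n.state))
            && checkLeader(outofelection, n.leader, n.epoch)) {
        // Follow the established leader without comparing votes.
    }
    break;
}
{noformat}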


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-25 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011331#comment-13011331
 ] 

Vishal K commented on ZOOKEEPER-975:


correction: "I wasn't sure about that change either"



[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-25 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011330#comment-13011330
 ] 

Vishal K commented on ZOOKEEPER-975:


{quote}   1. It does not have a test, but I'm not entirely convinced we should 
try to implement one, since it might be complex. We should think about it, 
though;
{quote}

Yes, I didn't include a test. We should have a test, but I don't think I can 
get to it in the next couple of days or so. I was thinking of feeding the 
recvQueue with dummy Notifications for the scenarios that we want to test, 
then calling FastLeaderElection.lookForLeader() and verifying the outcome. The 
tricky part is to come up with the right sequence of Notifications to test all 
corner cases.
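
For concreteness, here is a minimal, self-contained sketch of that test idea. 
It does not use the real FastLeaderElection API: FakeNotification, 
runElectionRound(), and the simplified vote comparison are hypothetical 
stand-ins that only model how a scripted notification sequence can drive the 
decision loop so a test can assert the outcome.

import java.util.*;
import java.util.concurrent.*;

class FakeNotification {
    final long leader;   // proposed leader in this notification
    final long zxid;     // zxid backing the proposal
    final long sid;      // server id of the sender
    FakeNotification(long leader, long zxid, long sid) {
        this.leader = leader;
        this.zxid = zxid;
        this.sid = sid;
    }
}

public class FleDecisionSketch {

    // Simplified stand-in for the early exit in lookForLeader(): once a
    // vote has been recorded from every member of the voting view, commit
    // to the currently proposed leader.
    static long runElectionRound(BlockingQueue<FakeNotification> recvqueue,
                                 Set<Long> votingView,
                                 long myId) throws InterruptedException {
        Map<Long, FakeNotification> recvset = new HashMap<>();
        long proposedLeader = myId;
        long proposedZxid = 0;
        FakeNotification n;
        while ((n = recvqueue.poll(200, TimeUnit.MILLISECONDS)) != null) {
            // crude totalOrderPredicate: higher zxid wins, ties go to higher id
            if (n.zxid > proposedZxid
                    || (n.zxid == proposedZxid && n.leader > proposedLeader)) {
                proposedLeader = n.leader;
                proposedZxid = n.zxid;
            }
            recvset.put(n.sid, n);
            if (recvset.keySet().containsAll(votingView)) {
                return proposedLeader; // the "heard from everyone" exit
            }
        }
        return proposedLeader;
    }

    public static void main(String[] args) throws InterruptedException {
        // Joining peer C (sid 3) is fed the stale LOOKING exchange of A and B.
        BlockingQueue<FakeNotification> recvqueue = new LinkedBlockingQueue<>();
        recvqueue.put(new FakeNotification(1, 1, 1)); // stale: A voted for itself
        recvqueue.put(new FakeNotification(2, 1, 2)); // stale: B voted for itself
        recvqueue.put(new FakeNotification(3, 5, 3)); // C's own vote, newer zxid
        long decided = runElectionRound(recvqueue, Set.of(1L, 2L, 3L), 3);
        System.out.println("decided on: " + decided); // prints 3 -> C would LEAD
    }
}

Running it reproduces the corner case from this jira: fed only the stale 
LOOKING history of A and B plus its own vote, the joining peer decides to lead.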

{quote}
   2. I don't understand why you're removing the outofelection data structure. 
I believe the notifications from peers that are FOLLOWING/LEADING should be 
treated separately from the ones of peers that are LOOKING. If a peer obtains 
notifications from a quorum saying that the peers are LEADING/FOLLOWING, then 
it should follow the leader they point to. If a peer obtains notifications 
from a quorum of LOOKING peers, then it should follow the protocol to select a 
leader. Consequently, the notifications should be treated separately.
{quote}

I wasn't sure about that change either. I don't mind reintroducing 
outofelection. But I couldn't think of a scenario where inserting a 
Notification into recvset instead of outofelection would be a problem. 
Considering that termPredicate verifies that a majority voted for the same 
(leader, zxid) pair, can you describe a scenario where this change could cause 
a problem?
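
For reference, a stand-alone model of the check termPredicate performs. The 
real method operates on Vote objects and a QuorumVerifier; the long[] vote 
encoding and the simple-majority test below are simplifications for 
illustration only.

import java.util.HashMap;
import java.util.Map;

public class TermPredicateSketch {

    // Does the given (leader, zxid) vote have a strict majority in recvset?
    // vote[0] = leader id, vote[1] = zxid.
    static boolean termPredicate(Map<Long, long[]> recvset,
                                 long leader, long zxid, int ensembleSize) {
        int count = 0;
        for (long[] vote : recvset.values()) {
            if (vote[0] == leader && vote[1] == zxid) {
                count++;
            }
        }
        return count > ensembleSize / 2;
    }

    public static void main(String[] args) {
        Map<Long, long[]> recvset = new HashMap<>();
        recvset.put(1L, new long[]{2, 10}); // sid 1 votes (leader=2, zxid=10)
        recvset.put(2L, new long[]{2, 10}); // sid 2 votes (leader=2, zxid=10)
        recvset.put(3L, new long[]{3, 10}); // sid 3 still votes for itself
        System.out.println(termPredicate(recvset, 2, 10, 3)); // true: 2 of 3
    }
}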


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-25 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011244#comment-13011244
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, Here are two high-level comments:

# It does not have a test, but I'm not entirely convinced we should try to 
implement one, since it might be complex. We should think about it, though;
# I don't understand why you're removing the outofelection data structure. I 
believe the notifications from peers that are FOLLOWING/LEADING should be 
treated separately from the ones of peers that are LOOKING. If a peer obtains 
notifications from a quorum saying that the peers are LEADING/FOLLOWING, then 
it should follow the leader they point to. If a peer obtains notifications 
from a quorum of LOOKING peers, then it should follow the protocol to select a 
leader. Consequently, the notifications should be treated separately.


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010989#comment-13010989
 ] 

Hadoop QA commented on ZOOKEEPER-975:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474564/ZOOKEEPER-975.patch2
  against trunk revision 1082362.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/199//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/199//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/199//console

This message is automatically generated.


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-24 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010689#comment-13010689
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, I'll have a look at the patch once you generate a new one that 
applies. 

On the last comment you posted, I wonder why you think it would be better to 
update the last zxid according to the state of the server. One problem I see is 
that only the leader can really maintain its own last zxid up to date. The 
other servers don't really know how far the leader has gone. 

Also, the idea of maintaining the last vote intact is to keep the (leader, 
zxid) pair used to decide upon the current leader, and currently we don't use 
the zxid field to determine leadership if the notification says LEADING or 
FOLLOWING. Is there anything I'm missing?


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-23 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010522#comment-13010522
 ] 

Vishal K commented on ZOOKEEPER-975:


Note there may be another problem in FLE unrelated to this bug. Originally, in 
QCM, instead of sending lastMessage I wanted to send a notification with the 
current state. But I noticed that a peer does not send its real current state 
in FastLeaderElection.java even when it is not LOOKING.
Here's the relevant code:
/*
 * If this server is not looking, but the one that sent the ack
 * is looking, then send back what it believes to be the leader.
 */
Vote current = self.getCurrentVote();
if(ackstate == QuorumPeer.ServerState.LOOKING){
    if(LOG.isDebugEnabled()){
        LOG.debug("Sending new notification. My id =  " +
                self.getId() + ", Recipient = " +
                response.sid + " zxid =" +
                current.zxid + " leader=" +
                current.id);
    }
    ToSend notmsg = new ToSend(
            ToSend.mType.notification,
            current.id,
            current.zxid,        // <--- zxid of current vote
            logicalclock,
            self.getPeerState(),
            response.sid);
    sendqueue.offer(notmsg);
}

It sends the zxid of the current vote and not the current zxid seen by the 
server. The zxid of the vote can be stale. So I dropped that change from QCM 
and changed the size of sendQueue to 1. I think fixing the above code is a 
separate issue and should be done later, if needed.
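
A small stand-alone sketch of the distinction being drawn here, i.e. the zxid 
frozen in the current vote versus the latest zxid the server has actually 
seen. The accessor names are loosely modeled on QuorumPeer, but this is an 
illustration, not a patch.

public class StaleVoteZxidSketch {

    static class Vote {
        final long id;
        final long zxid;
        Vote(long id, long zxid) { this.id = id; this.zxid = zxid; }
    }

    long lastLoggedZxid; // newest zxid this server has actually logged
    Vote currentVote;    // frozen when the previous election finished

    // The snippet above puts current.zxid on the wire; once the ensemble has
    // processed new requests, that value lags behind lastLoggedZxid.
    long zxidForNotification(boolean useCurrentState) {
        return useCurrentState ? lastLoggedZxid : currentVote.zxid;
    }

    public static void main(String[] args) {
        StaleVoteZxidSketch s = new StaleVoteZxidSketch();
        s.currentVote = new Vote(2, 100); // vote decided a while ago
        s.lastLoggedZxid = 500;           // state has moved on since then
        System.out.println(s.zxidForNotification(false)); // 100 (stale)
        System.out.println(s.zxidForNotification(true));  // 500
    }
}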


[jira] [Commented] (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010520#comment-13010520
 ] 

Hadoop QA commented on ZOOKEEPER-975:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12474451/ZOOKEEPER-975.patch
  against trunk revision 1082362.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/195//console

This message is automatically generated.



[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-03-08 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004273#comment-13004273
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

I have a patch for this, but I have it on top of the fix for ZOOKEEPER-932. We 
have 932 applied to our ZK code since we need it. Until ZOOKEEPER-932 is 
reviewed and committed, I will have to keep backporting patches (and doing 
double testing). I will port my changes to trunk if someone requires a fix for 
the bug. Since this is not a blocker, I am going to hold off on the patch 
until 932 is reviewed. That will reduce my testing and porting overhead. Does 
that sound ok?

The patch I have is good only for FLE.

{quote}
About maintenance, we talked some time back about maintaining only the TCP 
version of FLE (FLE+QCM). There was never any real pressure to eliminate the 
others, and in fact some users were previously still using LE. I'm all for 
maintaining only FLE, but we need to hear the opinion of others. More thoughts?
{quote}

The documentation says: "The implementations of leader election 1 and 2 are 
currently not supported, and we have the intention of deprecating them in the 
near future. Implementations 0 and 3 are currently supported, and we plan to 
keep supporting them in the near future. To avoid having to support multiple 
versions of leader election unnecessarily, we may eventually consider 
deprecating algorithm 0 as well, but we will plan according to the needs of 
the community."

Is there a significant advantage of using LE 0 vs LE 3?


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-11 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993551#comment-12993551
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, I'm under the impression that getting the current state only works 
for LEADING or FOLLOWING. While in the LOOKING state, we would need to return 
the latest vote, yes? Overall, it is sound to return the current state instead 
of keeping the latest message.

About maintenance, we talked some time back about maintaining only the TCP 
version of FLE (FLE+QCM). There was never any real pressure to eliminate the 
others, and in fact some users were previously still using LE. I'm all for 
maintaining only FLE, but we need to hear the opinion of others.

More thoughts?


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-11 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993428#comment-12993428
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

Do you agree that sending the current state is a better approach?

I was looking at the code to add a method createNotificationMessage() that 
would return a notification message based on the current state of the server. 
QCM can call this method and send the notification message at the place where 
we currently send lastMessage. However, adding it to FastLeaderElection alone 
won't be enough, since there are other leader election algorithms as well. The 
Election interface is very minimal. Should we add this method to that 
interface? Which leader election algorithms do we support, and is it ok to 
make this change only for the supported algorithms?

Thanks
-Vishal


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-08 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991939#comment-12991939
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

Re-adding lastMessage results in the delay reported in this bug. When node 2 
attempts to join a cluster of nodes 1 and 0, nodes 1 and 0 send a "LOOKING" 
notification followed by a "FOLLOWING"/"LEADING" notification. After receiving 
the first pair of LOOKING notifications, 2 goes into the LEADING state.

I think I have a better idea. Instead of sending lastMessage, how about we 
send the notification for the current state of the peer (just like we do in 
FastLeaderElection.java)? I think this will resolve both problems.


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-07 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991486#comment-12991486
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, Sounds like a good way to patch it.


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-06 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991288#comment-12991288
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

ok, I will re-add lastMessage. How about before sending lastMessage we check 
if the send queue is empty? If it is empty, then we send lastMessage; 
otherwise, we send the message from the queue. This will avoid sending stale 
messages to the peer.

So the final fix will:
1. Change CAPACITY to 1.
2. Send lastMessage to A only if the send queue for A is empty.

This will ensure that we send the most recent notification and we also handle 
lost messages as before.

Does this sound ok?
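
A rough sketch of what steps 1 and 2 could look like, assuming one send queue 
per destination sid. The field names (queueSendMap, lastMessageSent) are 
illustrative and the logic is simplified; this is not the actual 
QuorumCnxManager change.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ConcurrentHashMap;

public class SendQueueSketch {

    static final int CAPACITY = 1; // step 1: queue holds only the newest message

    final ConcurrentHashMap<Long, ArrayBlockingQueue<byte[]>> queueSendMap =
            new ConcurrentHashMap<>();
    final ConcurrentHashMap<Long, byte[]> lastMessageSent =
            new ConcurrentHashMap<>();

    // Queue a notification for peer sid; a newer message displaces an
    // undelivered older one instead of piling up behind it.
    void queue(long sid, byte[] msg) {
        ArrayBlockingQueue<byte[]> q =
                queueSendMap.computeIfAbsent(sid, k -> new ArrayBlockingQueue<>(CAPACITY));
        if (!q.offer(msg)) {
            q.poll();      // drop the stale message
            q.offer(msg);  // keep the fresh one
        }
    }

    // Step 2: replay lastMessage only when nothing newer is waiting.
    byte[] nextToSend(long sid) {
        ArrayBlockingQueue<byte[]> q = queueSendMap.get(sid);
        byte[] msg = (q == null) ? null : q.poll();
        if (msg == null) {
            return lastMessageSent.get(sid); // resend, to cover a lost message
        }
        lastMessageSent.put(sid, msg);
        return msg;
    }
}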


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-05 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990955#comment-12990955
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, This is a general concurrency issue that arises when using TCP 
connections, and I'm not sure we can avoid it even though our protocol is in 
principle resilient to such message losses. Back to the TCP issue I was 
mentioning, we remove the message from the queue of messages to send, and send 
it with a write call to the socket. However, it is not guaranteed that the 
message will get through, since the destination might drop the connection while 
the message is in transit. The sender has no way to know if it really got 
through, unless we have an acknowledgment scheme on top of TCP, which sounds 
overkill. The way we found to avoid the case in one of the jiras I mentioned 
above was to resend the last message the peer dequeued for a given destination. 
This is exactly the mechanism you're proposing to remove.
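
To make that loss window concrete, here is a sketch of a sender loop of the 
kind being discussed. It is not the real QuorumCnxManager.SendWorker; it only 
illustrates why a write that returns successfully does not imply delivery, and 
why keeping the last dequeued message makes a replay after reconnect possible.

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;

class SendWorkerSketch implements Runnable {

    private final BlockingQueue<byte[]> queue;
    private final OutputStream out; // socket stream to the destination peer
    private volatile byte[] lastMessageSent; // the copy that makes replay possible

    SendWorkerSketch(BlockingQueue<byte[]> queue, OutputStream out) {
        this.queue = queue;
        this.out = out;
    }

    @Override
    public void run() {
        try {
            while (true) {
                byte[] msg = queue.take(); // the message now lives only here
                lastMessageSent = msg;     // remember it before writing
                out.write(msg);            // write() can "succeed" even if the
                out.flush();               // peer drops the connection while the
            }                              // bytes are still in transit
        } catch (IOException | InterruptedException e) {
            // A replacement worker would send lastMessageSent first on reconnect.
        }
    }
}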


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-02-03 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990064#comment-12990064
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

Can you describe this problem to me? I will see if the problem still exists if 
this patch is applied after applying the patch for ZOOKEEPER-932. Thanks.


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-01-27 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987517#action_12987517
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio, 

Do you think that this will still be a problem even after we have the patch 
for ZOOKEEPER-932?

This is what ZOOKEEPER-475 describes:
--
* Replica 1 sends a message to itself and to Replica 2 stating that its current 
vote is for replica 1;
* Replica 2 sends a message to itself and to Replica 1 stating that its current 
vote is for replica 2;
* Replica 1 updates its vote, and sends a message to itself stating that its 
current vote is for replica 2;
* Since replica 1 has two votes for 2 in an ensemble of 3 replicas, replica 1 
decides to follow 2.

The problem is that replica 2 does not receive a message from 1 stating that 
it changed its vote to 2, which prevents 2 from becoming a leader. Looking 
more carefully at why that happened, you can see that when 1 tries to send a 
message to 2, QuorumCnxManager in 1 is shutting down a connection to 2 at the 
same time that it is trying to open a new one. The incorrect synchronization 
prevents the creation of the new connection, and 1 and 2 end up not 
connected.
--

We no longer have that incorrect synchronization. We can still have the QCM 
in 1 shutting down the connection to 2 while it is trying to send a 
notification to 2. However, the only time 1 will shut down a connection to 2 
is when it receives a new connection request from 2 (or when something is 
wrong with the connection), and a new connection request arrives precisely 
when 2 is trying to send a notification to 1. As a result, 1 will still end 
up sending 2 a notification saying that it is following 2; the sketch below 
illustrates this pairing of teardown and reconnect. Do you agree?
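
To make the reasoning concrete, here is a minimal sketch of the pairwise 
connection rule this argument relies on, assuming the usual server-id 
tie-break; the class and method names are illustrative, not the actual 
QuorumCnxManager API:

import java.io.IOException;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

abstract class OneConnectionPerPair {
    private final long myId;
    private final Map<Long, Socket> peers = new ConcurrentHashMap<>();

    OneConnectionPerPair(long myId) { this.myId = myId; }

    // A remote peer opened a connection to us, announcing its server id.
    void receiveConnection(long remoteId, Socket incoming) throws IOException {
        if (remoteId < myId) {
            // Lower-id peers may not initiate: refuse the attempt and dial
            // back, so each pair keeps exactly one live connection.
            incoming.close();
            replace(remoteId, connectTo(remoteId));
        } else {
            // The peer wins the tie-break; its fresh connection replaces ours.
            replace(remoteId, incoming);
        }
    }

    // The only teardown path: an old connection dies exactly when a new one
    // to the same peer comes up.
    private void replace(long remoteId, Socket fresh) throws IOException {
        Socket old = peers.put(remoteId, fresh);
        if (old != null) {
            old.close();
        }
    }

    // Dial the peer's election port (elided here).
    protected abstract Socket connectTo(long remoteId) throws IOException;
}

Under this rule, 1's connection to 2 is closed only inside replace(), which 
fires exactly when a fresh connection involving 2 comes up, so a path for 1's 
notification to 2 exists immediately afterwards.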


[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-01-27 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987494#action_12987494
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Hi Vishal, the main problem I see is that the patch you propose removes 
lastMessageSent (ZOOKEEPER-481), which was inserted to deal with a problem 
observed in ZOOKEEPER-475 and also discussed in ZOOKEEPER-480. A sketch of 
that safeguard is below.
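
For context, this is roughly the safeguard in question as described in those 
issues: remember the last notification handed to each peer's sender thread 
and replay it when a fresh connection to that peer comes up, so a vote lost 
to a torn-down connection still arrives. A hedged sketch, with illustrative 
names rather than the real fields:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LastMessageResend {
    // sid -> last notification handed to that peer's sender thread
    private final Map<Long, byte[]> lastMessageSent = new ConcurrentHashMap<>();

    void recordSend(long sid, byte[] msg) {
        lastMessageSent.put(sid, msg);
    }

    // On (re)connect to sid, replay the last message so a vote lost to the
    // ZOOKEEPER-475 race still reaches the peer.
    void onConnectionEstablished(long sid, Sender sender) {
        byte[] last = lastMessageSent.get(sid);
        if (last != null) {
            sender.send(sid, last);
        }
    }

    interface Sender {
        void send(long sid, byte[] msg);
    }
}

Any patch that drops this replay has to argue that the teardown/reconnect 
pairing makes it redundant, which is exactly the point under discussion.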





[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-01-21 Thread Vishal K (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12984722#action_12984722
 ] 

Vishal K commented on ZOOKEEPER-975:


Hi Flavio,

What is the motivation for sending the history of notifications to the
joining peer? Shouldn't the most recent notification (or just the
current state) be enough? I understand this is a performance issue
rather than a correctness one, but I think it is a sizeable hole.

- There could have been multiple leader elections while the node was
down, and the node could end up hopping across leaders until it gets to
the correct one.

- Suppose we have a 3-node cluster. I have a simple client that
connects to A and creates znode_A to indicate that A (and the client)
is online. Leader A then disconnects from B and C, causing C to take
over leadership. Now, when A tries to rejoin the cluster, it can be
unnecessarily delayed by this bug. If an application takes some action
when znode_A is unavailable, this bug can unnecessarily trigger that
action. We are facing exactly this problem in our application.

I think it would be a small change to QCM; a sketch of the idea
follows. What do you think?
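
As a rough illustration only, and assuming one send queue per peer (the names 
are made up for the example, not the actual QCM fields): a capacity-one queue 
whose insert evicts any stale entry would mean a joining peer only ever sees 
each neighbor's latest vote.

import java.util.concurrent.ArrayBlockingQueue;

final class LatestOnlySendQueue {
    // Capacity 1: the queue never accumulates history, only the newest vote.
    private final ArrayBlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1);

    // Insert a notification, evicting any older one still waiting to be sent.
    void offerLatest(byte[] notification) {
        while (!queue.offer(notification)) {
            queue.poll(); // drop the stale notification and retry
        }
    }

    // Blocking take used by the per-peer sender thread.
    byte[] take() throws InterruptedException {
        return queue.take();
    }
}

With something like this in place, A and B would hand C only their current 
votes instead of replaying the notifications they exchanged while forming the 
cluster.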




[jira] Commented: (ZOOKEEPER-975) new peer goes in LEADING state even if ensemble is online

2011-01-18 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983331#action_12983331
 ] 

Flavio Junqueira commented on ZOOKEEPER-975:


Thanks for bringing this up, Vishal. This is not a new observation, although 
I can't remember whether we discussed it in a jira or not. In general, I'm 
lukewarm about this change. Avoiding the server going into LEADING when it 
should correctly remain in LOOKING is certainly not a problem in itself, but 
I'm not entirely comfortable with manipulating the queues of notifications. 
Having two servers concurrently think they are leading is a situation our 
protocols tolerate, so such a modification would be an optimization that 
avoids the unnecessary LEADING step.

Regarding application recovery time, we don't have a load-balancing scheme at 
this point (which could be quite useful), so bringing a new follower up does 
not guarantee that clients will move their sessions to it. Note also that 
this situation occurs only when there is an ensemble already running and a 
server joins or recovers.
