> On 2011-01-09 06:48:15, fpj wrote:
> > trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, line 336
> > <https://reviews.apache.org/r/240/diff/1/?file=9407#file9407line336>
> >
> > If this error is fatal, then I was wondering if we shouldn't abort
> > lookForLeader() and perhaps even propagate it up so that we kill the peer.
> > What do you think?
>
> Vishal Kher wrote:
>     Looking at the code, this error case should never happen. But it would be
>     good to propagate fatal errors back. How do you propose to propagate the
>     error and kill the peer?
>
> fpj wrote:
>     One option is to throw an exception from lookForLeader, catch it in the
>     main loop in QuorumPeer, shut down the peer in the catch block, and exit
>     the main thread. The main difficulty is propagating the error to
>     lookForLeader. The only option I see is propagating it through a special
>     message that FLE receives through the recvQueue.
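
For reference, the option described in the quoted exchange could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual ZooKeeper code: the `Message` type, `FatalElectionException`, `reportFatal`, and `runPeer` are hypothetical names, and the election logic is reduced to reading one message from a queue that stands in for recvQueue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class FatalErrorPropagationSketch {

    /** Hypothetical stand-in for the messages FLE takes off recvQueue. */
    static final class Message {
        final long sid;             // sender id for ordinary messages
        final Throwable fatalCause; // non-null marks the special "fatal error" message

        private Message(long sid, Throwable fatalCause) {
            this.sid = sid;
            this.fatalCause = fatalCause;
        }

        static Message vote(long sid) { return new Message(sid, null); }
        static Message fatal(Throwable cause) { return new Message(-1L, cause); }
    }

    /** Hypothetical exception thrown out of lookForLeader on a fatal error. */
    static final class FatalElectionException extends Exception {
        FatalElectionException(Throwable cause) {
            super("fatal error reported by the connection manager", cause);
        }
    }

    /** What the connection manager could call instead of only logging the error. */
    static void reportFatal(BlockingQueue<Message> recvQueue, Throwable cause) {
        recvQueue.offer(Message.fatal(cause));
    }

    /** Election loop reduced to queue handling; real vote counting is elided. */
    static long lookForLeader(BlockingQueue<Message> recvQueue)
            throws FatalElectionException, InterruptedException {
        while (true) {
            Message m = recvQueue.poll(200, TimeUnit.MILLISECONDS);
            if (m == null) {
                continue; // nothing received yet, keep waiting
            }
            if (m.fatalCause != null) {
                throw new FatalElectionException(m.fatalCause);
            }
            return m.sid; // placeholder: the sketch accepts the first vote it sees
        }
    }

    /** Stand-in for the peer's main loop; returns false when the peer must stop. */
    static boolean runPeer(BlockingQueue<Message> recvQueue) {
        try {
            lookForLeader(recvQueue);
            return true;
        } catch (FatalElectionException e) {
            // A real peer would run its shutdown logic here and let the main thread exit.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The point of the sketch is that the fatal error travels over the same queue as ordinary messages, so the election loop needs no extra synchronization to notice it.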
I don't know if you have made any progress here, Vishal, but if you haven't, perhaps we should consider having a separate JIRA for it. I've been thinking that this is a more general problem with error handling between QCM and FLE. How does that sound?

- fpj


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/240/#review95
-----------------------------------------------------------


On 2011-01-17 03:55:41, Vishal Kher wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/240/
> -----------------------------------------------------------
>
> (Updated 2011-01-17 03:55:41)
>
>
> Review request for zookeeper and fpj.
>
>
> Summary
> -------
>
> QuorumCnxManager performed blocking socket IO in a few places. As a result,
> QCM on a peer could block forever, which would prevent other peers from
> connecting to the blocked peer.
> If the peer happens to be the leader, then it will block new peers from
> becoming followers.
>
> I have made changes as per ZOOKEEPER-932.
>
>
> This addresses bug ZOOKEEPER-932.
>     https://issues.apache.org/jira/browse/ZOOKEEPER-932
>
>
> Diffs
> -----
>
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java 1040328
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java 1040328
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java 1040328
>   trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java 1040328
>   trunk/src/java/test/org/apache/zookeeper/test/CnxManagerTest.java 1040328
>
> Diff: https://reviews.apache.org/r/240/diff
>
>
> Testing
> -------
>
> - ant test-core-java
> - systest
> - basic hand testing
> - rebooted follower/leader several times
>
>
> Thanks,
>
> Vishal
>
>
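
As an illustration of the blocking-IO problem described in the quoted summary, here is a minimal sketch of bounding both the connect and the read on a plain `java.net.Socket`. It is not taken from the patch under review and may differ from what the patch does. The class and method names and the timeout values are assumptions made for the example, and the silent local `ServerSocket` merely plays the role of a peer that never answers.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BoundedSocketIoSketch {

    /** Opens a socket whose connect and subsequent reads both give up after a timeout. */
    static Socket connectWithTimeouts(InetSocketAddress addr,
                                      int connectTimeoutMs,
                                      int readTimeoutMs) throws IOException {
        Socket sock = new Socket();
        try {
            sock.setSoTimeout(readTimeoutMs);     // bounds every blocking read()
            sock.connect(addr, connectTimeoutMs); // bounds the connect itself
            return sock;
        } catch (IOException e) {
            sock.close();
            throw e;
        }
    }

    /** Returns true if a read from a peer that never writes times out instead of hanging. */
    static boolean readFromSilentPeerTimesOut(int readTimeoutMs) {
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket sock = connectWithTimeouts(
                     new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort()),
                     1000, readTimeoutMs)) {
            sock.getInputStream().read(); // the peer never writes, so this must time out
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Without the two timeouts, the `read()` call above would wait indefinitely on a silent peer, which is the kind of hang the summary describes.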
