[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048949#comment-13048949 ]
Henry Robinson commented on ZOOKEEPER-1080:
-------------------------------------------

Hey Eric - this looks good. Protocol looks solid at the first pass. Some comments, based on a quick look:

* I wouldn't try to delete the root node at STOP time. It seems prone to problems if you stop one node while others are starting / in a failed state and don't have ephemerals yet registered. Sequence numbers are a fairly abundant resource, and if it's possible to run out of them across several runs, it's definitely possible to run out of them in a single run.
* That tuple support class is, imho, kinda gross. It would be clearer to use specific struct-type classes whose names correspond to the fields they're intended to hold.
* 'Observers' is already a meaningful noun in ZK land, so it might be clearer to call them something else. Paxos uses Learners, but that's also taken inside ZK. Listeners?
* Not a big deal, but I think you can break out of the for loop at the end of determineElectionStatus once the offer corresponding to the local node has been found.
* I think addObserver / removeObserver probably need to synchronize on observers if you think you need to sync in dispatchEvent as well.
* Is there any way to actually determine who the leader is (if not the local process)? Seems like this would be useful.

> Provide a Leader Election framework based on Zookeeper receipe
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1080
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: contrib
>    Affects Versions: 3.3.2
>            Reporter: Hari A V
>         Attachments: LeaderElectionService.pdf, zookeeper-leader-0.0.1.tar.gz
>
>
> Currently Hadoop components such as NameNode and JobTracker are single points
> of failure.
> If the NameNode or JobTracker goes down, their service will not be available
> until they are up and running again. If there were a Standby NameNode or
> JobTracker available and ready to serve when Active nodes go down, we could
> reduce the service downtime. Hadoop already provides a Standby
> NameNode implementation, which is not fully a "hot" Standby.
> The common problem to be addressed in any such Active-Standby cluster is
> Leader Election and Failure detection. This can be done using Zookeeper as
> mentioned in the Zookeeper recipes.
> http://zookeeper.apache.org/doc/r3.3.3/recipes.html
> +Leader Election Service (LES)+
> Any node that wants to participate in Leader Election can use this service.
> Nodes should start the service with the required configuration. The service will
> notify the nodes whether they should be started in Active or Standby mode.
> It also notifies them of any changes in mode at runtime. All other complexities
> can be handled internally by the LES.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
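
For reference, several of the review points above (a named result type instead of a tuple, leaving the loop once the local offer is found, exposing who the leader is, and a thread-safe listener list) can be sketched as below. This is only an illustration, not the code from the attached patch: ElectionSketch, ElectionStatus, ElectionListeners, sequenceOf and the "n_" offer names are all hypothetical. It assumes the offers are sequential znodes, whose names end in the 10-digit counter that ZooKeeper appends, and it uses CopyOnWriteArrayList in place of explicit synchronization, which is a swapped-in alternative rather than what the comment literally suggests.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical names throughout; this is not the API of the attached patch.
final class ElectionSketch {

    /** Named result type with one field per value, instead of a generic tuple. */
    static final class ElectionStatus {
        final String leaderOffer;    // offer with the lowest sequence number
        final boolean localIsLeader; // true if the local offer is the leader's
        final String watchedOffer;   // offer just ahead of ours, null if we lead

        ElectionStatus(String leaderOffer, boolean localIsLeader, String watchedOffer) {
            this.leaderOffer = leaderOffer;
            this.localIsLeader = localIsLeader;
            this.watchedOffer = watchedOffer;
        }
    }

    /** Parses the 10-digit counter that ends a sequential znode name. */
    static int sequenceOf(String offerName) {
        return Integer.parseInt(offerName.substring(offerName.length() - 10));
    }

    /** Sorts offers by sequence number and returns as soon as the local offer is found. */
    static ElectionStatus determineElectionStatus(List<String> offers, String localOffer) {
        List<String> sorted = new ArrayList<>(offers);
        sorted.sort(Comparator.comparingInt(ElectionSketch::sequenceOf));
        String leader = sorted.get(0);
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).equals(localOffer)) {
                String watched = (i == 0) ? null : sorted.get(i - 1);
                return new ElectionStatus(leader, i == 0, watched); // no need to scan further
            }
        }
        throw new IllegalStateException("local offer not found: " + localOffer);
    }

    /** Listener registry; add, remove and dispatch are safe without explicit locks. */
    static final class ElectionListeners {
        private final List<Consumer<ElectionStatus>> listeners = new CopyOnWriteArrayList<>();

        void addListener(Consumer<ElectionStatus> listener) {
            listeners.add(listener);
        }

        void removeListener(Consumer<ElectionStatus> listener) {
            listeners.remove(listener);
        }

        void dispatchEvent(ElectionStatus status) {
            // Iterates over a snapshot, so concurrent add/remove cannot break the loop.
            for (Consumer<ElectionStatus> listener : listeners) {
                listener.accept(status);
            }
        }
    }
}
```

Because the status carries leaderOffer for every participant, a non-leader can look up who currently leads. In a real implementation the leader's identity would presumably be read from the data stored in that znode, which this sketch does not model.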