[ https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048949#comment-13048949 ]
Henry Robinson commented on ZOOKEEPER-1080:
-------------------------------------------
Hey Eric - this looks good. Protocol looks solid at the first pass. Some
comments, based on a quick look:
* I wouldn't try and delete the root node at STOP time. It seems prone to
problems if you stop one node while others are starting / in a failed state and
don't have ephemerals yet registered. Sequence numbers are a fairly abundant
resource, and if it's possible to run out of them across several runs, it's
definitely possible to run out of them in a single run.
* That tuple support class is, imho, kinda gross. It would be clearer to use
specific struct-type classes whose names correspond to the fields they're
intended to hold.
* 'Observers' is already a meaningful noun in ZK land, so it might be clearer
to call them something else. Paxos uses Learners, but that's also taken inside
ZK. Listeners?
* Not a big deal, but I think you can break out of the for loop at the end of
determineElectionStatus once the offer corresponding to the local node has been
found.
* I think addObserver / removeObserver probably need to synchronize on
observers if you think you need to sync in dispatchEvent as well.
* Is there any way to actually determine who the leader is (if not the local
process)? Seems like this would be useful.
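
To make a few of the points above concrete, here is a minimal sketch. It is
not taken from the attached patch: LeaderOffer, ElectionListener,
ElectionSketch, and every method and field name are hypothetical, and the
assumption that the leader is the offer with the lowest sequence number comes
from the ZooKeeper recipe rather than from the patch itself. It shows a named
struct-type class in place of a tuple, a loop that breaks once the local offer
is found, listener registration that takes the same lock as dispatch, and a
way to ask who the current leader is.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; none are taken from the attached patch.

/** A named struct-type class in place of a generic tuple. */
final class LeaderOffer {
    final String nodePath; // znode path of the offer
    final long sequence;   // sequence number assigned to the offer
    final String hostName; // host that made the offer

    LeaderOffer(String nodePath, long sequence, String hostName) {
        this.nodePath = nodePath;
        this.sequence = sequence;
        this.hostName = hostName;
    }
}

/** "Listener" rather than "Observer", which already has a meaning in ZK. */
interface ElectionListener {
    void onStateChange(String newState);
}

final class ElectionSketch {
    private final List<ElectionListener> listeners = new ArrayList<>();

    // add/remove take the same lock that dispatchEvent uses.
    void addListener(ElectionListener listener) {
        synchronized (listeners) { listeners.add(listener); }
    }

    void removeListener(ElectionListener listener) {
        synchronized (listeners) { listeners.remove(listener); }
    }

    void dispatchEvent(String newState) {
        List<ElectionListener> snapshot;
        // Copy under the lock, then call listeners outside it.
        synchronized (listeners) { snapshot = new ArrayList<>(listeners); }
        for (ElectionListener listener : snapshot) {
            listener.onStateChange(newState);
        }
    }

    /** Returns the index of the local offer, or -1 if it is absent. */
    static int indexOfLocalOffer(List<LeaderOffer> offers, String localPath) {
        int index = -1;
        for (int i = 0; i < offers.size(); i++) {
            if (offers.get(i).nodePath.equals(localPath)) {
                index = i;
                break; // nothing after the local offer needs inspecting
            }
        }
        return index;
    }

    /** Assumes the leader is the offer with the lowest sequence number. */
    static LeaderOffer currentLeader(List<LeaderOffer> offers) {
        LeaderOffer leader = null;
        for (LeaderOffer offer : offers) {
            if (leader == null || offer.sequence < leader.sequence) {
                leader = offer;
            }
        }
        return leader; // null when there are no offers
    }
}
```

Copying the listener list before dispatching is one design choice among
several: it keeps callbacks from running while the lock is held, at the cost
of a listener possibly receiving one more event after it has been removed.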
> Provide a Leader Election framework based on Zookeeper recipe
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
> Project: ZooKeeper
> Issue Type: New Feature
> Components: contrib
> Affects Versions: 3.3.2
> Reporter: Hari A V
> Attachments: LeaderElectionService.pdf, zookeeper-leader-0.0.1.tar.gz
>
>
> Currently, Hadoop components such as the NameNode and JobTracker are single
> points of failure.
> If the NameNode or JobTracker goes down, its service will not be available
> until it is up and running again. If a Standby NameNode or JobTracker were
> available and ready to serve when the Active node goes down, we could
> reduce the service downtime. Hadoop already provides a Standby
> NameNode implementation, but it is not a fully "hot" Standby.
> The common problems to be addressed in any such Active-Standby cluster are
> leader election and failure detection. Both can be done using ZooKeeper, as
> described in the ZooKeeper recipes:
> http://zookeeper.apache.org/doc/r3.3.3/recipes.html
> +Leader Election Service (LES)+
> Any node that wants to participate in leader election can use this service.
> Each node starts the service with the required configuration. The service
> notifies each node whether it should start in Active or Standby mode, and
> also notifies it of any change in mode at runtime. All other complexity
> can be handled internally by the LES.
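
For readers unfamiliar with the recipe the description points to, the core
Active/Standby decision can be sketched roughly as below. This is an
illustration of the general recipe, not the LES implementation: RecipeSketch,
ElectionDecision, and the "n_" node-name prefix are hypothetical, and the
sketch works on a plain list of child names instead of calling ZooKeeper. It
assumes every participant has created an ephemeral sequential znode under a
common parent with the same name prefix, so that the zero-padded sequence
suffixes sort correctly as strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names; this models the recipe's decision step only and
// does not talk to a ZooKeeper server.

/** Outcome of one election round for a single participant. */
final class ElectionDecision {
    final boolean active;     // true if this participant should be Active
    final String watchTarget; // node to watch when Standby, null when Active

    ElectionDecision(boolean active, String watchTarget) {
        this.active = active;
        this.watchTarget = watchTarget;
    }
}

final class RecipeSketch {
    /**
     * children: names of all election znodes under the parent,
     *           e.g. "n_0000000007" (same prefix, zero-padded sequence).
     * self:     the name of the znode this participant created.
     */
    static ElectionDecision decide(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted); // lexicographic order matches sequence order here
        int index = sorted.indexOf(self);
        if (index < 0) {
            throw new IllegalArgumentException("own znode not found: " + self);
        }
        if (index == 0) {
            // Lowest sequence number: this participant is the leader.
            return new ElectionDecision(true, null);
        }
        // Otherwise stay Standby and watch only the immediate predecessor,
        // so one failure wakes one participant instead of all of them.
        return new ElectionDecision(false, sorted.get(index - 1));
    }
}
```

When the watched predecessor disappears, the participant would fetch the
children again and repeat the same decision. That is how a Standby node finds
out it has become Active, and also how it handles the case where the node that
vanished was not the leader.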