[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049136#comment-13049136
 ] 

Hari A V commented on ZOOKEEPER-1080:
-------------------------------------

Hi,

I quickly went through the patch provided.

I have the following questions:

1. How are the "Disconnected" and "Expired" events from ZooKeeper handled?

Consider the following scenario:
  The Active and Standby nodes are running.
  The Active node's network fails, so it is "Disconnected" from ZooKeeper. It keeps behaving as Active because it receives no event from the framework.
  Meanwhile the Standby receives the "Node Deleted" event and becomes Active, so two nodes now act as Active at the same time.
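One way to close this gap is for the framework to treat "Disconnected" as a loss of leadership and tell the application to stop acting as Active, instead of staying silent until "Expired". Below is a minimal, self-contained sketch of that idea. The enum SessionState only mirrors ZooKeeper's Watcher.Event.KeeperState values (SyncConnected, Disconnected, Expired); SessionAwareElector, NodeMode and the NEUTRAL mode are hypothetical names, not part of the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of ZooKeeper's Watcher.Event.KeeperState values
// (SyncConnected, Disconnected, Expired); not the real ZooKeeper type.
enum SessionState { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

// Modes the framework reports to the application (hypothetical).
enum NodeMode { ACTIVE, STANDBY, NEUTRAL }

class SessionAwareElector {
    private NodeMode mode = NodeMode.STANDBY;
    // History of mode changes, so callers can see what was signalled.
    private final List<NodeMode> transitions = new ArrayList<>();

    NodeMode mode() { return mode; }

    List<NodeMode> transitions() { return transitions; }

    // Called once this node's election znode has won.
    void becomeActive() { changeMode(NodeMode.ACTIVE); }

    // Called from the connection-state watcher for every session event.
    void onSessionEvent(SessionState state) {
        switch (state) {
            case DISCONNECTED:
                // Leadership can no longer be proven, and the session may expire on
                // the server while we are cut off, letting the Standby take over.
                // Stop acting as Active now instead of waiting for Expired.
                if (mode == NodeMode.ACTIVE) {
                    changeMode(NodeMode.NEUTRAL);
                }
                break;
            case EXPIRED:
                // The ephemeral znode is gone: this node must open a new session and
                // make a fresh election offer before it can lead again.
                changeMode(NodeMode.STANDBY);
                break;
            case SYNC_CONNECTED:
                // Reconnected within the session timeout. Resuming Active would
                // require re-checking that our znode is still the lowest; that
                // step is left out of this sketch, so the mode is unchanged.
                break;
        }
    }

    private void changeMode(NodeMode next) {
        if (next != mode) {
            mode = next;
            transitions.add(next);
        }
    }
}
```

With this behaviour, the window in which two nodes act as Active is limited to the time the old Active takes to notice the disconnection, rather than lasting until it reconnects and learns that its session expired.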

2. When an event such as "elected as leader" is dispatched to the Observer, the
observer may want to start its process as leader. If it cannot start for some
reason, it throws an exception. What happens next? Is this error propagated,
giving another node the chance to become leader?
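A possible answer is to have the framework catch the observer's failure and give up leadership, so that the next candidate is elected. The sketch below assumes this policy. ElectionQueue is a hypothetical in-memory stand-in for the ordered election znodes (withdrawing corresponds to deleting the node's own EPHEMERAL_SEQUENTIAL znode, re-joining to creating a new one), and LeadershipObserver and FailSafeDispatcher are hypothetical names, not the patch's API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Callback the application implements; it may throw if it cannot start (hypothetical).
interface LeadershipObserver {
    void onElectedLeader() throws Exception;
}

// In-memory stand-in for the ordered election znodes; the head is the leader.
class ElectionQueue {
    private final Deque<String> offers = new ArrayDeque<>();

    // In ZooKeeper: create an EPHEMERAL_SEQUENTIAL znode (it sorts last).
    void offer(String nodeId) { offers.addLast(nodeId); }

    // In ZooKeeper: delete this node's own znode, which fires the watch of the
    // next candidate.
    void withdraw(String nodeId) { offers.remove(nodeId); }

    String leader() { return offers.peekFirst(); }
}

class FailSafeDispatcher {
    // Returns true if the observer started as leader. On failure the node gives
    // up its place, so the next candidate is elected, and re-joins at the back.
    static boolean promote(String nodeId, ElectionQueue queue, LeadershipObserver observer) {
        try {
            observer.onElectedLeader();
            return true;
        } catch (Exception e) {
            // A real implementation would also log e and report it upward.
            queue.withdraw(nodeId);
            queue.offer(nodeId);
            return false;
        }
    }
}
```

Re-joining at the back means a node that repeatedly fails to start keeps rotating behind healthy candidates instead of blocking the election.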

3. Consider this code from the patch:

    dispatchEvent(EventType.READY_START);
    logger.info("{} not elected leader. Watching node:{}", leaderOffer._2,
        neighborLeaderOffer._2);
    /*
     * Make sure to pass an explicit Watcher because we could be sharing this
     * zooKeeper instance with someone else.
     */
    Stat stat = zooKeeper.exists(neighborLeaderOffer._2, this);

Here we first dispatch the event and only then add the watch on the neighbor's
leader offer. If dispatchEvent fails (for example, because starting as Standby
failed), what state is the node left in, and how does it continue?
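One ordering that avoids this ambiguity is to set the watch on the neighbor first, dispatch READY_START only after the watch is in place, and withdraw this node's own offer if the Standby start fails, so it is never promoted while in an unknown state. This is a sketch under stated assumptions: ElectionStore is a hypothetical two-method abstraction over the ZooKeeper calls involved (existsAndWatch stands for zooKeeper.exists(path, watcher) != null), the Runnable stands in for dispatchEvent, and InMemoryStore is a fake used only to make the example runnable.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal abstraction over the ZooKeeper calls used here (hypothetical).
interface ElectionStore {
    // Like zooKeeper.exists(path, watcher) != null: sets a watch, reports presence.
    boolean existsAndWatch(String path);
    // Deletes this node's own election znode (withdraws the offer).
    void delete(String path);
}

enum StandbyOutcome { STANDBY, RECHECK, WITHDRAWN }

class StandbySetup {
    static StandbyOutcome enterStandby(String ownOffer, String neighborOffer,
                                       ElectionStore store, Runnable dispatchReadyStart) {
        // 1. Watch the neighbor first so its deletion cannot be missed.
        if (!store.existsAndWatch(neighborOffer)) {
            // The neighbor vanished meanwhile: re-run the election check instead.
            return StandbyOutcome.RECHECK;
        }
        // 2. Only now tell the application to start in Standby mode.
        try {
            dispatchReadyStart.run();
            return StandbyOutcome.STANDBY;
        } catch (RuntimeException e) {
            // 3. Standby start failed: withdraw so this node is never promoted.
            //    The pending watch should simply be ignored when it fires later.
            store.delete(ownOffer);
            return StandbyOutcome.WITHDRAWN;
        }
    }
}

// In-memory fake of ElectionStore, for illustration only.
class InMemoryStore implements ElectionStore {
    final Set<String> nodes = new HashSet<>();
    final List<String> watched = new ArrayList<>();

    public boolean existsAndWatch(String path) {
        watched.add(path);
        return nodes.contains(path);
    }

    public void delete(String path) { nodes.remove(path); }
}
```

The explicit outcome value lets the caller decide what to do next (stay Standby, re-check the election, or report the failure) instead of leaving the node in an undefined state.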
  
Points 2 and 3 could be handled by writing wrappers around the framework, but
that pushes these complexities up to the upper layer. I ran into all of the
above scenarios when I integrated a similar framework into an Active-Standby
NameNode cluster (the complete solution has been tested internally and is
currently in beta testing).

Please have a look at an alternative approach, which I have integrated with the
NameNode and JobTracker. For now I have added it as a patch attachment; I can
submit it formally once you have given your opinion.

Thanks,
Hari



> Provide a Leader Election framework based on Zookeeper receipe
> --------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1080
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
>             Project: ZooKeeper
>          Issue Type: New Feature
>          Components: contrib
>    Affects Versions: 3.3.2
>            Reporter: Hari A V
>         Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch, 
> zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz
>
>
> Currently Hadoop components such as the NameNode and JobTracker are single
> points of failure.
> If the NameNode or JobTracker goes down, its service is unavailable until it
> is up and running again. If a Standby NameNode or JobTracker were available
> and ready to serve when the Active node goes down, we could reduce the service
> downtime. Hadoop already provides a Standby NameNode implementation, but it is
> not a fully "hot" Standby.
> The common problem to be addressed in any such Active-Standby cluster is
> leader election and failure detection. This can be done using ZooKeeper, as
> described in the ZooKeeper recipes:
> http://zookeeper.apache.org/doc/r3.3.3/recipes.html
> +Leader Election Service (LES)+
> Any node that wants to participate in leader election can use this service.
> Nodes start the service with the required configuration, and the service
> notifies each node whether it should start in Active or Standby mode. It also
> notifies nodes of any mode changes at runtime. All other complexities can be
> handled internally by the LES.
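The ZooKeeper recipe referenced above has each participant create an EPHEMERAL_SEQUENTIAL znode under a common election path; the participant with the lowest sequence number leads, and every other participant watches only its immediate predecessor, which avoids a herd effect when the leader fails. The sketch below shows that role decision together with the kind of notification callback the LES description implies. ModeListener, ElectionRoles and the "n_" name prefix are hypothetical; fetching the child names (for example with getChildren) is omitted so the example runs without a ZooKeeper server.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Callback a participating node (e.g. NameNode, JobTracker) would register (hypothetical).
interface ModeListener {
    void onActive();
    void onStandby(String watchedPredecessor);
}

class ElectionRoles {
    // ZooKeeper appends a 10-digit, zero-padded counter to sequential znode names.
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.length() - 10));
    }

    // Applies the recipe: the lowest sequence number leads; every other node
    // watches the node immediately before it. Returns that predecessor, or
    // null if this node is the leader.
    static String decide(String ownName, List<String> children, ModeListener listener) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(ElectionRoles::sequenceOf));
        int idx = sorted.indexOf(ownName);
        if (idx < 0) {
            throw new IllegalArgumentException(ownName + " is not a candidate");
        }
        if (idx == 0) {
            listener.onActive();
            return null;
        }
        String predecessor = sorted.get(idx - 1);
        listener.onStandby(predecessor);
        return predecessor;
    }
}
```

Sorting by the numeric suffix rather than by the full name keeps the order correct even if participants use different name prefixes.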

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
