[ 
https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236934#comment-13236934
 ] 

Todd Lipcon commented on HADOOP-8163:
-------------------------------------

Hi Bikas. I think your ideas have some merit, especially with regard to a fully 
general election framework. But since we only have one user of this framework 
at this point (HDFS) and we currently only support a single standby node, I 
would prefer to punt these changes to another JIRA as additional improvements. 
This will let us move forward with the high-priority task of automatic failover 
for HA NNs, rather than getting distracted by making the framework fully general.

bq. Secondly, we are performing blocking calls on the ZKClient callback that 
happens on the ZK threads. It is advisable to not block ZK client threads for 
long

Blocking is only a concern if you have other operations that are waiting on timely 
delivery of callbacks. In the case of the election framework, all of our 
notifications from ZK have to be received in-order and processed sequentially, 
or else we have a huge explosion of possible interactions to worry about. Doing 
blocking calls in the callbacks will _not_ result in lost ZK leases, etc. To 
quote from the ZK programmer's guide:

"All IO happens on the IO thread (using Java NIO). All event callbacks happen 
on the event thread. Session maintenance such as reconnecting to ZooKeeper 
servers and maintaining heartbeat is done on the IO thread. Responses for 
synchronous methods are also processed in the IO thread. All responses to 
asynchronous methods and watch events are processed on the event thread... 
Callbacks do not block the processing of the IO thread or the processing of the 
synchronous calls"
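
The threading model in that quote can be mimicked with plain JDK executors. The 
sketch below is a toy simulation and does not use the ZooKeeper library: the class 
name EventThreadModel, the sleep durations, and the heartbeat counter are all 
hypothetical. A single-threaded executor stands in for the event thread and a 
separate thread stands in for the IO thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the two client threads; hypothetical, not ZooKeeper code. */
class EventThreadModel {

    /** Runs the model and returns the order in which callbacks completed. */
    static List<Integer> run(int callbacks, AtomicInteger heartbeats) {
        // Stand-in for the IO thread: keeps "heartbeating" regardless of callbacks.
        Thread io = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                heartbeats.incrementAndGet();
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        io.setDaemon(true);
        io.start();

        // Stand-in for the event thread: one thread, so callbacks run one at a
        // time in submission order.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < callbacks; i++) {
            final int id = i;
            eventThread.submit(() -> {
                try {
                    Thread.sleep(30); // a "blocking call" inside the callback
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                processed.add(id);
            });
        }
        eventThread.shutdown();
        try {
            eventThread.awaitTermination(10, TimeUnit.SECONDS);
            io.interrupt();
            io.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the model", e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        AtomicInteger heartbeats = new AtomicInteger();
        System.out.println("callback order: " + run(3, heartbeats));
        System.out.println("heartbeats while callbacks blocked: " + heartbeats.get());
    }
}
```

In this model the callbacks always complete in submission order, and the 
heartbeat counter keeps advancing while each callback is blocked, which is the 
property the election framework relies on.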

bq. Thirdly, how about using the setData(breadcrumb, appData, version)?

Let me see about making this change. Like you said, it's a good safety check.
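
For reference, the safety check amounts to a compare-and-set on the znode 
version. The sketch below is an in-memory stand-in, not the ZooKeeper API: 
VersionedNode and its exception type are hypothetical. It mirrors the contract 
of ZooKeeper's setData(path, data, version), where a version of -1 matches any 
version and a mismatch is rejected (the real client throws 
KeeperException.BadVersionException).

```java
import java.nio.charset.StandardCharsets;

/** In-memory stand-in for one znode with versioned writes; hypothetical. */
class VersionedNode {
    private byte[] data;
    private int version = 0;

    VersionedNode(byte[] initial) {
        this.data = initial.clone();
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }

    /**
     * Writes only if expectedVersion equals the current version, or is -1
     * ("any version"). Returns the new version on success.
     */
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            // The real client would throw KeeperException.BadVersionException here.
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion + ", actual " + version);
        }
        data = newData.clone();
        return ++version;
    }

    public static void main(String[] args) {
        VersionedNode breadcrumb = new VersionedNode("nn1".getBytes(StandardCharsets.UTF_8));
        int seen = breadcrumb.getVersion(); // version observed when reading the old info
        breadcrumb.setData("nn2".getBytes(StandardCharsets.UTF_8), seen); // succeeds
        try {
            // Reusing the stale version is rejected instead of silently overwriting.
            breadcrumb.setData("nn3".getBytes(StandardCharsets.UTF_8), seen);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Passing the version that was read along with the old data means a concurrent 
update to the breadcrumb is detected rather than overwritten.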
                
> Improve ActiveStandbyElector to provide hooks for fencing old active
> --------------------------------------------------------------------
>
>                 Key: HADOOP-8163
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8163
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: ha
>    Affects Versions: 0.23.3, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-8163.txt, hadoop-8163.txt, hadoop-8163.txt, 
> hadoop-8163.txt
>
>
> When a new node becomes active in an HA setup, it may sometimes have to take 
> fencing actions against the node that was formerly active. This JIRA extends 
> the ActiveStandbyElector to add an extra non-ephemeral node to the ZK 
> directory, which acts as a second copy of the active node's information. 
> Then, if the active loses its ZK session, the next active to be elected may 
> easily locate the unfenced node to take the appropriate actions.
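
The idea in the description above can be sketched as a small state model. This 
is not the ActiveStandbyElector code: BreadcrumbModel, its method names, and 
the node names are hypothetical, and the sketch assumes fencing succeeds before 
the breadcrumb is overwritten.

```java
import java.util.Optional;

/** Toy model of an ephemeral lock node plus a persistent breadcrumb; hypothetical. */
class BreadcrumbModel {
    private String lockHolder; // ephemeral: disappears when the holder's session is lost
    private String breadcrumb; // non-ephemeral: survives session loss

    /** Takes the lock and returns the previous active that must be fenced, if any. */
    synchronized Optional<String> becomeActive(String candidate) {
        if (lockHolder != null) {
            throw new IllegalStateException("lock already held by " + lockHolder);
        }
        Optional<String> toFence = (breadcrumb != null && !breadcrumb.equals(candidate))
            ? Optional.of(breadcrumb)
            : Optional.empty();
        lockHolder = candidate;
        breadcrumb = candidate; // assumed to happen only after fencing has succeeded
        return toFence;
    }

    /** Session expiry removes only the ephemeral node; the breadcrumb stays. */
    synchronized void sessionLost() {
        lockHolder = null;
    }

    public static void main(String[] args) {
        BreadcrumbModel zk = new BreadcrumbModel();
        System.out.println("nn1 must fence: " + zk.becomeActive("nn1"));
        zk.sessionLost(); // nn1 loses its session without cleaning up
        System.out.println("nn2 must fence: " + zk.becomeActive("nn2"));
    }
}
```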
