[
https://issues.apache.org/jira/browse/HADOOP-8163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13236934#comment-13236934
]
Todd Lipcon commented on HADOOP-8163:
-------------------------------------
Hi Bikas. I think your ideas have some merit, especially with regard to a fully
general election framework. But since we only have one user of this framework
at this point (HDFS) and we currently only support a single standby node, I
would prefer to punt these changes to another JIRA as additional improvements.
This will let us move forward with the high-priority task of automatic failover
for HA NNs, rather than getting distracted by making the framework fully general.
bq. Secondly, we are performing blocking calls on the ZKClient callback that
happens on the ZK threads. It is advisable to not block ZK client threads for
long
That advice only applies if you have other operations that are waiting on timely
delivery of callbacks. In the case of the election framework, all of our
notifications from ZK have to be received in-order and processed sequentially,
or else we have a huge explosion of possible interactions to worry about. Doing
blocking calls in the callbacks will _not_ result in lost ZK leases, etc. To
quote from the ZK programmer's guide:
"All IO happens on the IO thread (using Java NIO). All event callbacks happen
on the event thread. Session maintenance such as reconnecting to ZooKeeper
servers and maintaining heartbeat is done on the IO thread. Responses for
synchronous methods are also processed in the IO thread. All responses to
asynchronous methods and watch events are processed on the event thread...
Callbacks do not block the processing of the IO thread or the processing of the
synchronous calls."
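As a rough illustration of the threading model quoted above, here is a toy
sketch in plain Java. It does not use the ZooKeeper client at all: the class
EventThreadModel, its fields, and the callback names are made up for this
example, and two java.util.concurrent executors merely stand in for the event
thread and the IO thread. It shows a blocking callback delaying the callbacks
queued behind it, while delivery order is preserved and the stand-in
heartbeat keeps ticking.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of two client threads; none of these names are ZooKeeper APIs. */
class EventThreadModel {
    /** Callbacks in the order they finished running. */
    final List<String> delivered = new CopyOnWriteArrayList<>();
    /** Number of heartbeats sent by the stand-in IO thread. */
    final AtomicInteger heartbeats = new AtomicInteger();

    void run(long blockMillis) {
        // One thread for callbacks, one separate thread for heartbeats.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ScheduledExecutorService ioThread = Executors.newSingleThreadScheduledExecutor();
        ioThread.scheduleAtFixedRate(heartbeats::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);

        // The first callback blocks (think: a fencing call); the rest queue behind it.
        eventThread.execute(() -> {
            pause(blockMillis);
            delivered.add("becomeActive");
        });
        eventThread.execute(() -> delivered.add("nodeDeleted"));
        eventThread.execute(() -> delivered.add("sessionExpired"));

        eventThread.shutdown();
        try {
            eventThread.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ioThread.shutdownNow();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        EventThreadModel model = new EventThreadModel();
        model.run(200);
        System.out.println("order=" + model.delivered);
        System.out.println("heartbeats while callbacks ran=" + model.heartbeats.get());
    }
}
```

The single-threaded executor is what gives the in-order, one-at-a-time
processing that the elector relies on; the cost is only that later callbacks
wait for the blocking one to finish.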
bq. Thirdly, how about using the setData(breadcrumb, appData, version)?
Let me see about making this change. Like you said, it's a good safety check.
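For reference, the safety check under discussion is a conditional write. In
the real client, ZooKeeper.setData(String path, byte[] data, int version)
fails with KeeperException.BadVersionException when the znode's version no
longer matches, and a version of -1 matches any version. The sketch below is
only an in-memory model of that semantic, assuming a hypothetical
VersionedNode class and made-up payloads "nn1" and "nn2"; it is not the patch
and does not talk to ZooKeeper.

```java
import java.nio.charset.StandardCharsets;

/** In-memory stand-in for one znode; not the ZooKeeper client API. */
class VersionedNode {
    private byte[] data;
    private int version = 0;

    VersionedNode(byte[] initial) {
        this.data = initial.clone();
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }

    /**
     * Conditional write: succeeds only if expectedVersion equals the current
     * version, or is -1 (wildcard). Returns the new version on success.
     */
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion + ", was " + version);
        }
        data = newData.clone();
        version++;
        return version;
    }

    public static void main(String[] args) {
        VersionedNode breadcrumb = new VersionedNode("nn1".getBytes(StandardCharsets.UTF_8));

        // Two writers read the same version before either one writes.
        int seenByA = breadcrumb.getVersion();
        int seenByB = breadcrumb.getVersion();

        // Writer B updates first, which bumps the version.
        breadcrumb.setData("nn2".getBytes(StandardCharsets.UTF_8), seenByB);

        // Writer A's stale write is rejected instead of silently clobbering B's.
        try {
            breadcrumb.setData("nn1".getBytes(StandardCharsets.UTF_8), seenByA);
        } catch (IllegalStateException e) {
            System.out.println("stale write rejected: " + e.getMessage());
        }
        System.out.println("breadcrumb=" + new String(breadcrumb.getData(), StandardCharsets.UTF_8));
    }
}
```

Passing the version seen at read time means a writer that raced with another
update finds out immediately, rather than overwriting the other node's
breadcrumb.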
> Improve ActiveStandbyElector to provide hooks for fencing old active
> --------------------------------------------------------------------
>
> Key: HADOOP-8163
> URL: https://issues.apache.org/jira/browse/HADOOP-8163
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ha
> Affects Versions: 0.23.3, 0.24.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hadoop-8163.txt, hadoop-8163.txt, hadoop-8163.txt,
> hadoop-8163.txt
>
>
> When a new node becomes active in an HA setup, it may sometimes have to take
> fencing actions against the node that was formerly active. This JIRA extends
> the ActiveStandbyElector to add an extra non-ephemeral node to the ZK
> directory, which acts as a second copy of the active node's information.
> Then, if the active loses its ZK session, the next active to be elected may
> easily locate the unfenced node to take the appropriate actions.