[
https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049501#comment-13049501
]
E. Sammer commented on ZOOKEEPER-1080:
--------------------------------------
Henry:
{quote}
I wouldn't try and delete the root node at STOP time. It seems prone to
problems if you stop one node while others are starting / in a failed state and
don't have ephemerals yet registered. Sequence numbers are a fairly abundant
resource, and if it's possible to run out of them across several runs, it's
definitely possible to run out of them in a single run.
{quote}
While I think it's cleaner and I'd rather just handle the cases properly, I
don't feel strongly about it. I'll just pull it out. Any thoughts on ACLs? I
just punted on the subject of security entirely (which is probably not
sufficient long term).
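To make the ACL point concrete, this is roughly what I'd mean by not punting (a sketch only; the offer-znode prefix and the use of CREATOR_ALL_ACL are assumptions, not what the patch currently does):
{code:java}
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

public class AclSketch {
  // Create the offer znode with an ACL that only grants access to the
  // authenticated identity that created it, instead of OPEN_ACL_UNSAFE.
  // Requires the client to have called addAuthInfo() first.
  public static String createOffer(ZooKeeper zk, String rootNode)
      throws KeeperException, InterruptedException {
    List<ACL> acl = Ids.CREATOR_ALL_ACL;
    return zk.create(rootNode + "/n_", new byte[0], acl,
        CreateMode.EPHEMERAL_SEQUENTIAL);
  }
}
{code}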
{quote}
That tuple support class is, imho, kinda gross. It would be clearer to use
specific struct-type classes whose names correspond to the fields they're
intended to hold.
{quote}
As I mentioned to you (in person) I really wanted something like Map.Entry<K,
V> but public. I tried to make it look like Scala's tuples so there was at
least a hint of prior art. I can replace it with a LeaderOffer class with two
elements if it makes it seem nicer, though.
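Roughly what the replacement would look like (a sketch; the field names are placeholders, not taken from the patch):
{code:java}
// A plain struct-type class instead of the generic tuple: the znode created
// for this candidate plus the host that placed the offer.
public class LeaderOffer {
  private final String nodeName; // path of the ephemeral sequential znode
  private final String hostName; // host that placed the offer

  public LeaderOffer(String nodeName, String hostName) {
    this.nodeName = nodeName;
    this.hostName = hostName;
  }

  public String getNodeName() { return nodeName; }
  public String getHostName() { return hostName; }
}
{code}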
{quote}
'Observers' is already a meaningful noun in ZK land, so it might be clearer to
call them something else. Paxos uses Learners, but that's also taken inside ZK.
Listeners?
{quote}
No problem.
{quote}
Not a big deal, but I think you can break out of the for loop at the end of
determineElectionStatus once the offer corresponding to the local node has been
found.
{quote}
I'll take a look. Sounds good.
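For reference, a minimal sketch of what that would look like, assuming determineElectionStatus() just walks the sorted offers looking for the local host's entry (becomeLeader() / becomeReady() stand in for whatever the patch actually does in each case):
{code:java}
import java.util.List;

public class ElectionStatusSketch {
  private List<LeaderOffer> leaderOffers; // sorted by sequence number
  private String localHostName;

  void determineElectionStatus() {
    for (int i = 0; i < leaderOffers.size(); i++) {
      LeaderOffer offer = leaderOffers.get(i);
      if (offer.getHostName().equals(localHostName)) {
        if (i == 0) {
          becomeLeader();
        } else {
          becomeReady(leaderOffers.get(i - 1)); // watch the offer ahead of us
        }
        break; // nothing after the local node's offer matters to this node
      }
    }
  }

  private void becomeLeader() { /* ... */ }
  private void becomeReady(LeaderOffer offerToWatch) { /* ... */ }
}
{code}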
{quote}
I think addObserver / removeObserver probably need to synchronize on observers
if you think you need to sync in dispatchEvent as well.
{quote}
You're right; missed the sync on those.
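The fix would look roughly like this (a sketch; the listener interface and event names are placeholders):
{code:java}
import java.util.HashSet;
import java.util.Set;

public class ObserverRegistry {
  public interface ElectionListener {
    void onElectionEvent(String event);
  }

  private final Set<ElectionListener> observers = new HashSet<ElectionListener>();

  public void addObserver(ElectionListener observer) {
    synchronized (observers) { // same monitor dispatchEvent() uses
      observers.add(observer);
    }
  }

  public void removeObserver(ElectionListener observer) {
    synchronized (observers) {
      observers.remove(observer);
    }
  }

  void dispatchEvent(String event) {
    synchronized (observers) {
      for (ElectionListener observer : observers) {
        observer.onElectionEvent(event);
      }
    }
  }
}
{code}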
{quote}
Is there any way to actually determine who the leader is (if not the local
process)? Seems like this would be useful.
{quote}
Currently no, but that's a known missing feature. What are your thoughts on the
addition of a $ROOT/current_master znode with a data payload? I talked to ATM
about it and we agreed having $ROOT/$machine_name is bad because one can't stat
for a known znode from other hosts. I have a few concerns about this:
* I don't want people doing goofy things like watching this node and
accidentally creating a stampede, ignoring the seq nodes.
* There is a potential pathological case where node A is elected master and
can't create the $ROOT/current_master node.
The latter is probably the one I'm more concerned about. The other option is to
shove the host name info into the seq znodes themselves, but then they become
mutable and we start triggering watches on updates, which is kind of crufty.
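To make the $ROOT/current_master idea concrete (a sketch only; none of this is in the patch yet):
{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class CurrentMasterSketch {
  // The elected leader publishes its host name as the data payload of a known
  // znode. Ephemeral, so it disappears with the leader's session; the create
  // can still fail, which is the pathological case mentioned above.
  public static void publishLeader(ZooKeeper zk, String root, String hostName)
      throws KeeperException, InterruptedException {
    zk.create(root + "/current_master", hostName.getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  }

  // Anyone can ask "who is the leader?" by reading the payload. The watch flag
  // is false: callers poll on demand rather than piling watches on this node
  // and creating the stampede described above.
  public static String readLeader(ZooKeeper zk, String root)
      throws KeeperException, InterruptedException {
    return new String(zk.getData(root + "/current_master", false, null));
  }
}
{code}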
Hari:
{quote}
How about handling of "Disconnected" and "Expired" events from ZooKeeper?
Consider a scenario where:
* Active and Standby nodes are running.
* The Active node's network fails. It is "Disconnected" from ZooKeeper, but it
will still behave as Active since it is not getting any events from the framework.
* Meanwhile the Standby gets the "Node Deleted" event and becomes Active.
{quote}
I purposefully didn't handle this case because I felt it was obvious. If you
get disconnected from ZK (abnormally), you'll receive an exception and can run
whatever code you need internally to relinquish leader status. It doesn't make
sense to worry about this case too much, because you'll also have a fencing
problem that you can't ever detect from the pathological node (see below).
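In other words, the application side can handle it roughly like this (a sketch; the relinquish hook is whatever the application wires up, e.g. a call to stop()):
{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;

public class SessionStateWatcher implements Watcher {
  private final Runnable relinquish; // e.g. a Runnable that calls leaderElection.stop()

  public SessionStateWatcher(Runnable relinquish) {
    this.relinquish = relinquish;
  }

  @Override
  public void process(WatchedEvent event) {
    KeeperState state = event.getState();
    if (state == KeeperState.Disconnected || state == KeeperState.Expired) {
      // We can no longer prove we hold leadership, so step down locally.
      // Fencing is still needed for the failures we can't detect from here.
      relinquish.run();
    }
  }
}
{code}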
{quote}
When an event is dispatched to the Observer, say "elected as leader", the
observer may want to start its process as leader. It may not be able to start
for some reason and throws an exception. What happens next? Is this error
propagated, giving another node a chance to become leader?
{quote}
A good question. This is something one can detect internally in an obvious way
as well. If you fail to "start" when you find out you've been elected leader,
you should relinquish leader status (i.e. stop()). This is a pretty
straightforward case.
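Roughly, on the application side (a sketch; the class and hook names are assumptions, not the patch's API):
{code:java}
// If starting as leader fails, relinquish by calling stop() so the election
// can rerun and another node can take over.
public class ServiceNode {
  interface LeaderElection {
    void stop();
  }

  private final LeaderElection election;

  public ServiceNode(LeaderElection election) {
    this.election = election;
  }

  // Invoked when the framework tells this node it has been elected leader.
  public void onElected() {
    try {
      startLeaderServices();
    } catch (Exception e) {
      // We know we're misbehaving; step down and give another node a chance.
      election.stop();
    }
  }

  private void startLeaderServices() throws Exception {
    // application-specific startup (e.g. bring up the Active services)
  }
}
{code}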
{quote}
Here, we first dispatch the event and then add a watch on the leader. If the
dispatchEvent failed (maybe starting as Standby failed), then how do we
continue the state of the node?
{quote}
This is also a straightforward case. If dispatchEvent fails, the failure
should be trapped in the observer / handler code and the leader status
relinquished. It doesn't matter if we go on to set a watch; we're going to give
up on being the leader and start the process over anyway.
I purposefully decided not to handle a lot of failure cases because they're
obvious and plentiful. In the event of a detectable failure (e.g. an exception
in dispatchEvent()) the node *knows* it's misbehaving and can handle that. The
dangerous cases are when a node is misbehaving and *doesn't* know it. There are
so many of those cases that there is no way to handle them all, which is why we
also need fencing strategies and STONITH.
A trivial evil case is that node A is elected leader and blocks indefinitely in
dispatchEvent() and thus never fully starts services. Another is that a node in
a ready state (next in line for the leader position) receives a watch event
when the leader dies and blocks indefinitely but doesn't lose its connection to
ZK, thus causing a DoS. My point is that a truly generic HA service will always
require application-specific functionality to understand the difference between
alive and dead and to know how to react in those cases. What I've tried to do here
(and believe is the correct approach) is to simply create the proper
infrastructure to allow systems developers to hook into a simple library and
build whatever checks or guarantees they wish.
> Provide a Leader Election framework based on ZooKeeper recipe
> --------------------------------------------------------------
>
> Key: ZOOKEEPER-1080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
> Project: ZooKeeper
> Issue Type: New Feature
> Components: contrib
> Affects Versions: 3.3.2
> Reporter: Hari A V
> Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch,
> zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz
>
>
> Currently Hadoop components such as NameNode and JobTracker are single points
> of failure.
> If the NameNode or JobTracker goes down, their service will not be available
> until they are up and running again. If there were a Standby NameNode or
> JobTracker available and ready to serve when the Active node goes down, we could
> have reduced the service downtime. Hadoop already provides a Standby
> NameNode implementation, which is not fully a "hot" Standby.
> The common problem to be addressed in any such Active-Standby cluster is
> Leader Election and Failure Detection. This can be done using ZooKeeper as
> mentioned in the ZooKeeper recipes:
> http://zookeeper.apache.org/doc/r3.3.3/recipes.html
> +Leader Election Service (LES)+
> Any node that wants to participate in Leader Election can use this service.
> It should start the service with the required configuration. The service will
> notify the nodes whether they should start in Active or Standby mode, and
> will also inform them of any changes in mode at runtime. All other complexities
> can be handled internally by the LES.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira