[
https://issues.apache.org/jira/browse/ZOOKEEPER-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049501#comment-13049501
]
E. Sammer commented on ZOOKEEPER-1080:
--------------------------------------
Henry:
{quote}
I wouldn't try and delete the root node at STOP time. It seems prone to
problems if you stop one node while others are starting / in a failed state and
don't have ephemerals yet registered. Sequence numbers are a fairly abundant
resource, and if it's possible to run out of them across several runs, it's
definitely possible to run out of them in a single run.
{quote}
While I think it's cleaner and I'd rather just handle the cases properly, I
don't feel strongly about it. I'll just pull it out. Any thoughts on ACLs? I
just punted on the subject of security entirely (which is probably not
sufficient long term).
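To make the ACL point concrete, this is roughly what I'd mean by not punting (a sketch only; the offer-znode prefix and the use of CREATOR_ALL_ACL are assumptions, not what the patch currently does):
{code:java}
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.ACL;

public class AclSketch {
  // Create the offer znode with an ACL that only grants access to the
  // authenticated identity that created it, instead of OPEN_ACL_UNSAFE.
  // Requires the client to have called addAuthInfo() first.
  public static String createOffer(ZooKeeper zk, String rootNode)
      throws KeeperException, InterruptedException {
    List<ACL> acl = Ids.CREATOR_ALL_ACL;
    return zk.create(rootNode + "/n_", new byte[0], acl,
        CreateMode.EPHEMERAL_SEQUENTIAL);
  }
}
{code}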
{quote}
That tuple support class is, imho, kinda gross. It would be clearer to use
specific struct-type classes whose names correspond to the fields they're
intended to hold.
{quote}
As I mentioned to you (in person) I really wanted something like Map.Entry<K,
V> but public. I tried to make it look like Scala's tuples so there was at
least a hint of prior art. I can replace it with a LeaderOffer class with two
elements if it makes it seem nicer, though.
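Roughly what the replacement would look like (a sketch; the field names are placeholders, not taken from the patch):
{code:java}
// A plain struct-type class instead of the generic tuple: the znode created
// for this candidate plus the host that placed the offer.
public class LeaderOffer {
  private final String nodeName; // path of the ephemeral sequential znode
  private final String hostName; // host that placed the offer

  public LeaderOffer(String nodeName, String hostName) {
    this.nodeName = nodeName;
    this.hostName = hostName;
  }

  public String getNodeName() { return nodeName; }
  public String getHostName() { return hostName; }
}
{code}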
{quote}
'Observers' is already a meaningful noun in ZK land, so it might be clearer to
call them something else. Paxos uses Learners, but that's also taken inside ZK.
Listeners?
{quote}
No problem.
{quote}
Not a big deal, but I think you can break out of the for loop at the end of
determineElectionStatus once the offer corresponding to the local node has been
found.
{quote}
I'll take a look. Sounds good.
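For reference, a minimal sketch of what that would look like, assuming determineElectionStatus() just walks the sorted offers looking for the local host's entry (becomeLeader() / becomeReady() stand in for whatever the patch actually does in each case):
{code:java}
import java.util.List;

public class ElectionStatusSketch {
  private List<LeaderOffer> leaderOffers; // sorted by sequence number
  private String localHostName;

  void determineElectionStatus() {
    for (int i = 0; i < leaderOffers.size(); i++) {
      LeaderOffer offer = leaderOffers.get(i);
      if (offer.getHostName().equals(localHostName)) {
        if (i == 0) {
          becomeLeader();
        } else {
          becomeReady(leaderOffers.get(i - 1)); // watch the offer ahead of us
        }
        break; // nothing after the local node's offer matters to this node
      }
    }
  }

  private void becomeLeader() { /* ... */ }
  private void becomeReady(LeaderOffer offerToWatch) { /* ... */ }
}
{code}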
{quote}
I think addObserver / removeObserver probably need to synchronize on observers
if you think you need to sync in dispatchEvent as well.
{quote}
You're right; missed the sync on those.
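The fix would look roughly like this (a sketch; the listener interface and event names are placeholders):
{code:java}
import java.util.HashSet;
import java.util.Set;

public class ObserverRegistry {
  public interface ElectionListener {
    void onElectionEvent(String event);
  }

  private final Set<ElectionListener> observers = new HashSet<ElectionListener>();

  public void addObserver(ElectionListener observer) {
    synchronized (observers) { // same monitor dispatchEvent() uses
      observers.add(observer);
    }
  }

  public void removeObserver(ElectionListener observer) {
    synchronized (observers) {
      observers.remove(observer);
    }
  }

  void dispatchEvent(String event) {
    synchronized (observers) {
      for (ElectionListener observer : observers) {
        observer.onElectionEvent(event);
      }
    }
  }
}
{code}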
{quote}
Is there any way to actually determine who the leader is (if not the local
process)? Seems like this would be useful.
{quote}
Currently no, but that's a known missing feature. What are your thoughts on the
addition of a $ROOT/current_master znode with a data payload? I talked to ATM
about it and we agreed having $ROOT/$machine_name is bad because one can't stat
for a known znode from other hosts. I have a few concerns about this:
* I don't want people doing goofy things like watching this node and
accidentally creating a stampede, ignoring the seq nodes.
* There is a potential pathological case where node A is elected master and
can't create the $ROOT/current_master node.
The latter is probably the one I'm more concerned about. The other option is to
shove the host name info into the seq znodes themselves, but then they become
mutable and we start triggering watches on updates, which is kind of crufty.
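To make the $ROOT/current_master idea concrete (a sketch only; none of this is in the patch yet):
{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class CurrentMasterSketch {
  // The elected leader publishes its host name as the data payload of a known
  // znode. Ephemeral, so it disappears with the leader's session; the create
  // can still fail, which is the pathological case mentioned above.
  public static void publishLeader(ZooKeeper zk, String root, String hostName)
      throws KeeperException, InterruptedException {
    zk.create(root + "/current_master", hostName.getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  }

  // Anyone can ask "who is the leader?" by reading the payload. The watch flag
  // is false: callers poll on demand rather than piling watches on this node
  // and creating the stampede described above.
  public static String readLeader(ZooKeeper zk, String root)
      throws KeeperException, InterruptedException {
    return new String(zk.getData(root + "/current_master", false, null));
  }
}
{code}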
Hari:
{quote}
How about handling of "Disconnected" and "Expired" events from ZooKeeper?
Consider a scenario where:
* Active and Standby nodes are running.
* The Active node's network fails. It is "Disconnected" from ZooKeeper, but it
will still behave as Active since it is not getting any events from the framework.
* Meanwhile the Standby gets the "Node Deleted" event and becomes Active.
{quote}
I purposefully didn't handle this case because I felt it was obvious. If you
get disconnected from ZK (abnormally), you'll receive an exception and can run
whatever code you need internally to relinquish leader status. It doesn't make
sense to worry about this case too much, because you'll also have a fencing
problem that you can't ever detect from the pathological node (see below).
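In other words, the application side can handle it roughly like this (a sketch; the relinquish hook is whatever the application wires up, e.g. a call to stop()):
{code:java}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.KeeperState;

public class SessionStateWatcher implements Watcher {
  private final Runnable relinquish; // e.g. a Runnable that calls leaderElection.stop()

  public SessionStateWatcher(Runnable relinquish) {
    this.relinquish = relinquish;
  }

  @Override
  public void process(WatchedEvent event) {
    KeeperState state = event.getState();
    if (state == KeeperState.Disconnected || state == KeeperState.Expired) {
      // We can no longer prove we hold leadership, so step down locally.
      // Fencing is still needed for the failures we can't detect from here.
      relinquish.run();
    }
  }
}
{code}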
{quote}
When an event is dispatched to the Observer, say "elected as leader", the
observer may want to start its process as leader. It may not be able to start
for some reason and throws an exception. What happens next? Is this error
propagated, giving another node a chance to become leader?
{quote}
A good question. This is something one can detect internally in an obvious way
as well. If you fail to "start" when you find out you've been elected leader,
you should relinquish leader status (i.e. stop()). This is a pretty
straightforward case.
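Roughly, on the application side (a sketch; the class and hook names are assumptions, not the patch's API):
{code:java}
// If starting as leader fails, relinquish by calling stop() so the election
// can rerun and another node can take over.
public class ServiceNode {
  interface LeaderElection {
    void stop();
  }

  private final LeaderElection election;

  public ServiceNode(LeaderElection election) {
    this.election = election;
  }

  // Invoked when the framework tells this node it has been elected leader.
  public void onElected() {
    try {
      startLeaderServices();
    } catch (Exception e) {
      // We know we're misbehaving; step down and give another node a chance.
      election.stop();
    }
  }

  private void startLeaderServices() throws Exception {
    // application-specific startup (e.g. bring up the Active services)
  }
}
{code}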
{quote}
Here, we first dispatch the event and then add a watch on the leader. If the
dispatchEvent failed (maybe starting as Standby failed), then how do we
continue the state of the node?
{quote}
This is also a straightforward case. If dispatchEvent fails, the failure
should be trapped in the observer / handler code and the leader status
relinquished. It doesn't matter if we go on to set a watch; we're going to give
up on being the leader and start the process over anyway.
I purposefully decided not to handle a lot of failure cases because they're
obvious and plentiful. In the event of a detectable failure (e.g. an exception
in dispatchEvent()) the node *knows* it's misbehaving and can handle that. The
dangerous cases are when a node is misbehaving and *doesn't* know it. There are
so many of those cases that there is no way to handle them all, which is why we
also need fencing strategies and STONITH.
A trivial evil case is that node A is elected leader and blocks indefinitely in
dispatchEvent() and thus never fully starts services. Another is that a node in
a ready state (next in line for the leader position) receives a watch event
when the leader dies and blocks indefinitely but doesn't lose its connection to
ZK, thus causing a DoS. My point is that a truly generic HA service will always
require application-specific functionality to understand the difference between
alive and dead and to know how to react in those cases. What I've tried to do here
(and believe is the correct approach) is to simply create the proper
infrastructure to allow systems developers to hook into a simple library and
build whatever checks or guarantees they wish.
> Provide a Leader Election framework based on ZooKeeper recipe
> --------------------------------------------------------------
>
> Key: ZOOKEEPER-1080
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1080
> Project: ZooKeeper
> Issue Type: New Feature
> Components: contrib
> Affects Versions: 3.3.2
> Reporter: Hari A V
> Attachments: LeaderElectionService.pdf, ZOOKEEPER-1080.patch,
> zkclient-0.1.0.jar, zookeeper-leader-0.0.1.tar.gz
>
>
> Currently Hadoop components such as NameNode and JobTracker are single points
> of failure.
> If the NameNode or JobTracker goes down, their service will not be available
> until they are up and running again. If there were a Standby NameNode or
> JobTracker available and ready to serve when the Active node goes down, we could
> have reduced the service downtime. Hadoop already provides a Standby
> NameNode implementation, which is not fully a "hot" Standby.
> The common problem to be addressed in any such Active-Standby cluster is
> Leader Election and Failure Detection. This can be done using ZooKeeper as
> mentioned in the ZooKeeper recipes:
> http://zookeeper.apache.org/doc/r3.3.3/recipes.html
> +Leader Election Service (LES)+
> Any node that wants to participate in Leader Election can use this service.
> It should start the service with the required configuration. The service will
> notify the nodes whether they should start in Active or Standby mode, and
> will also inform them of any changes in mode at runtime. All other complexities
> can be handled internally by the LES.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira