[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-24 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192840#comment-13192840
 ] 

Aaron T. Myers commented on HDFS-2681:
--

Why was this moved out from being a sub-task of HDFS-1623?

 Add ZK client for leader election
 -

 Key: HDFS-2681
 URL: https://issues.apache.org/jira/browse/HDFS-2681
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Suresh Srinivas
Assignee: Bikas Saha
 Fix For: HA branch (HDFS-1623)

 Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, HDFS-2681.txt, 
 HDFS-2681.txt, Zookeeper based Leader Election and Monitoring Library.pdf


 ZKClient needs to support the following capabilities:
 # Ability to create a znode for co-ordinating leader election.
 # Ability to monitor and receive call backs when active znode status changes.
 # Ability to get information about the active node.
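
The three capabilities listed above can be pictured with a small single-threaded simulation. This is only a sketch under stated assumptions: {{LockZnode}}, {{Candidate}}, {{tryCreate}} and {{expire}} are hypothetical names made up for illustration, and nothing here uses the real ZooKeeper client API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ElectionDemo {
  /** Hypothetical in-memory stand-in for the lock znode; not the ZooKeeper API. */
  static class LockZnode {
    private Candidate owner; // null while no active node exists
    private final List<Candidate> watchers = new ArrayList<>();

    /** Capability 1: try to create the lock znode; the first caller wins. */
    boolean tryCreate(Candidate c) {
      if (owner == null) {
        owner = c;
        return true;
      }
      if (owner != c && !watchers.contains(c)) {
        watchers.add(c); // capability 2: losers watch the znode for changes
      }
      return owner == c;
    }

    /** Capability 3: data stored by the active node, or null if there is none. */
    byte[] getActiveData() {
      return owner == null ? null : owner.data.clone();
    }

    /** Simulates the owner's session expiring: the znode goes away and watchers retry. */
    void expire(Candidate c) {
      if (owner != c) {
        return;
      }
      owner = null;
      c.active = false;
      List<Candidate> toNotify = new ArrayList<>(watchers);
      watchers.clear();
      for (Candidate w : toNotify) {
        w.joinElection();
      }
    }
  }

  /** One election participant. */
  static class Candidate {
    final byte[] data;
    final LockZnode lock;
    boolean active;

    Candidate(String name, LockZnode lock) {
      this.data = name.getBytes(StandardCharsets.UTF_8);
      this.lock = lock;
    }

    void joinElection() {
      active = lock.tryCreate(this);
    }
  }

  public static void main(String[] args) {
    LockZnode lock = new LockZnode();
    Candidate nn1 = new Candidate("nn1", lock);
    Candidate nn2 = new Candidate("nn2", lock);
    nn1.joinElection();
    nn2.joinElection();
    System.out.println("active: " + new String(lock.getActiveData(), StandardCharsets.UTF_8));
    lock.expire(nn1); // nn1's session is gone, so the watcher nn2 takes over
    System.out.println("active: " + new String(lock.getActiveData(), StandardCharsets.UTF_8));
  }
}
```

A real client would get the same effect from an ephemeral znode plus a watch, with the callbacks delivered asynchronously rather than inline as in this sketch.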

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-16 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187149#comment-13187149
 ] 

Bikas Saha commented on HDFS-2681:
--

bq. Can ActiveStandbyElector be made package-private?
Let me read up on all these package annotations and see what makes sense.

bq. Also, it can't really guard against split brain
The guarantee is based on setting the correct timeouts. Another instance can 
only become active after the session timeout. The session timeout is 
recommended to be at least 3X the zookeeper disconnect timeout. 
enterNeutralMode is called when the zookeeper client disconnects from the 
zookeeper server or when the zookeeper servers lose quorum. My understanding is 
that when there is a network disconnection, the zookeeper client will 
disconnect from the server and post a disconnect event. So if your TCP 
disconnect timeouts are not set insanely high ( session timeout) then 
enterSafeMode will be called before session timeout expires and someone else 
becomes a master. Does this clarify?

bq. should also be ALL_CAPS
It is public because it's a well-defined property of the class.
Is the ALLCAPS on static strings a convention? You mean the member name should 
be all caps or the value? I added the random UID to prevent accidental 
operation on this file by some admin. It does not hurt and it is safer than 
using just a nicely named file. Anyways, I changed it.

bq. Could use Arrays.copyOf()
I first used .clone(), then was pointed to System.arraycopy(), and am now 
pointed to Arrays.copyOf(). Could you please point me to any place that lists 
the pros and cons of the different array copying methods (of which there seem 
to be many)?

bq. Rename operationSuccess etc to isSuccessCode
I think the current names read OK with the if() statements.

bq. Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector.
I first wrote it that way. But there is a problem. 
Tester_constructor() -> super_constructor() -> Tester().getNewZookeeper() -> 
returns mock.
So I need to have the mock initialized before constructing the tester object. 
So I made the mock a static member. But then Java complained that inner classes 
cannot have static members.

bq. Some of the INFO level logs are probably better off at DEBUG level.
Could you please point me to some place which explains what to log at different 
log levels?

 Add ZK client for leader election
 -

 Key: HDFS-2681
 URL: https://issues.apache.org/jira/browse/HDFS-2681
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Suresh Srinivas
Assignee: Bikas Saha
 Fix For: HA branch (HDFS-1623)

 Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, Zookeeper based Leader 
 Election and Monitoring Library.pdf






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187168#comment-13187168
 ] 

Todd Lipcon commented on HDFS-2681:
---

bq. So if your TCP disconnect timeouts are not set insanely high ( session 
timeout) then enterSafeMode will be called before session timeout expires and 
someone else becomes a master.

This still isn't safe. For example, imagine the NN goes into a multi-minute 
GC pause just before writing an edit to its edit log. Since the GC pause is 
longer than the session timeout, some other NN will take over. Without active 
fencing, when the first NN wakes up, it will make that mutation to the edit log 
before it finds out about the ZK timeout.

It sounds contrived but we've had many instances of data loss bugs in HBase due 
to scenarios like this in the past. Multi-minute GC pauses are rare but do 
happen.
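
To make the GC-pause scenario concrete, here is a minimal sketch of one possible fencing scheme based on epoch numbers. The technique is picked only for illustration; nothing on this issue says which fencing mechanism the HA branch will use, and {{FencedEditLog}}, {{newEpoch}} and {{append}} are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class FencingDemo {
  /** Hypothetical shared edit log that rejects writers holding an old epoch. */
  static class FencedEditLog {
    private long currentEpoch = 0;
    private final List<String> edits = new ArrayList<>();

    /** Called by a node that has just become active; returns its fencing epoch. */
    synchronized long newEpoch() {
      return ++currentEpoch;
    }

    /** Appends the edit only if the writer still holds the newest epoch. */
    synchronized boolean append(long writerEpoch, String edit) {
      if (writerEpoch != currentEpoch) {
        return false; // stale writer: it was superseded while it was paused
      }
      edits.add(edit);
      return true;
    }

    /** Returns a copy of the accepted edits. */
    synchronized List<String> edits() {
      return new ArrayList<>(edits);
    }
  }

  public static void main(String[] args) {
    FencedEditLog log = new FencedEditLog();
    long nn1Epoch = log.newEpoch(); // NN1 becomes active
    // NN1 now stalls in a long GC pause; its ZK session expires meanwhile.
    long nn2Epoch = log.newEpoch(); // NN2 takes over
    System.out.println("NN2 write accepted: " + log.append(nn2Epoch, "mkdir /a"));
    // NN1 wakes up before it hears about the timeout and tries its pending edit.
    System.out.println("NN1 write accepted: " + log.append(nn1Epoch, "delete /b"));
  }
}
```

The point of the sketch is that the check lives in the shared resource, so it holds even when the paused node has not yet processed any election callback.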

bq. It is public because it's a well-defined property of the class.
But it implies that external consumers of this class may want to directly 
manipulate the znode -- which is exposing an implementation detail 
unnecessarily.

bq. Is the ALLCAPS on static strings a convention? You mean the member name 
should be all caps or the value?

Yes, it's a convention that constants should have all-caps names. See the Sun 
java coding conventions, which we more-or-less follow: 
http://www.oracle.com/technetwork/java/codeconventions-135099.html#367

bq. So I need to have mock initialized before constructing the tester object. 
So I made mock a static member. But then java complained that inner classes 
cannot have static members.
I'm not quite following - you already initialize the non-static {{mockZk}} in 
{{TestActiveStandbyElector.init()}}? Then if it's a non-static inner class, it 
can simply refer to the already-initialized member of its outer class.
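
A minimal sketch of that arrangement, with stand-in names: {{Elector}} plays the role of the class under test, and a plain Object stands in for the mock. It relies on javac wiring up the outer-instance reference before the superclass constructor runs, which is worth double-checking on the JDK in use.

```java
class InnerTesterDemo {
  /** Stand-in for the elector: its constructor calls an overridable factory method. */
  static class Elector {
    final Object zkClient;

    Elector() {
      zkClient = getNewZooKeeper();
    }

    protected Object getNewZooKeeper() {
      return new Object();
    }
  }

  // Non-static member of the test class, set before any Tester is constructed.
  Object mockZk = "mock-zk";

  /** Non-static inner class: the override reads the outer instance's mockZk. */
  class Tester extends Elector {
    @Override
    protected Object getNewZooKeeper() {
      return mockZk;
    }
  }

  public static void main(String[] args) {
    InnerTesterDemo test = new InnerTesterDemo();
    Elector elector = test.new Tester();
    System.out.println("uses the mock: " + (elector.zkClient == test.mockZk));
  }
}
```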

bq. Could you please point me to some place which explains what to log at 
different log levels?
I don't think we have any formal guidelines here. The basic assumptions I make 
are:
- ERROR: unrecoverable errors (eg some block is apparently lost, or a failover 
failed, etc)
- WARN: recoverable errors (eg failures that will be retried, blocks that have 
become under-replicated but can be repaired, etc)
- INFO: normal operations proceeding as expected, but interesting enough that 
operators will want to see it.
- DEBUG: information that will be useful to developers debugging unit tests or 
running small test clusters (unit tests generally enable these, but users 
generally don't). Also handy when you have a reproducible bug on the client - 
you can ask the user to enable DEBUG and re-run, for example.
- TRACE: super-detailed trace information that will only be enabled in rare 
circumstances. We don't use this much.
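
As a rough illustration of the levels above, here is a sketch that uses java.util.logging only so that it runs without extra jars. The mapping (ERROR to SEVERE, WARN to WARNING, DEBUG to FINE, TRACE to FINEST) and the messages are assumptions made for the example, not project guidelines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogLevelDemo {
  static final Logger LOG = Logger.getLogger("LogLevelDemo");

  /** Emits one message per level and returns those that pass the given threshold. */
  static List<String> emitAt(Level threshold) {
    final List<String> seen = new ArrayList<>();
    Handler collector = new Handler() {
      @Override
      public void publish(LogRecord record) {
        seen.add(record.getLevel() + ": " + record.getMessage());
      }

      @Override
      public void flush() {
      }

      @Override
      public void close() {
      }
    };
    LOG.setUseParentHandlers(false); // keep the demo output in our collector only
    LOG.setLevel(threshold);
    LOG.addHandler(collector);
    try {
      LOG.severe("Failover failed; manual intervention needed");   // ERROR
      LOG.warning("Connection to ZooKeeper lost; will retry");     // WARN
      LOG.info("Became active");                                   // INFO
      LOG.fine("CreateNode result: 2 for path: /blah/blah");       // DEBUG
      LOG.finest("Entering processResult");                        // TRACE
    } finally {
      LOG.removeHandler(collector);
    }
    return seen;
  }

  public static void main(String[] args) {
    // An operator-style configuration sees ERROR, WARN and INFO only.
    for (String line : emitAt(Level.INFO)) {
      System.out.println(line);
    }
  }
}
```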







[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-16 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187186#comment-13187186
 ] 

Bikas Saha commented on HDFS-2681:
--


About the GC pause scenario (and others like it): let's not mix up election 
with operation safety. What this library provides is a signal about whether one 
is a leader or not. By itself, that does not solve the problem of whether that 
signal was properly processed. E.g. a potential solution to the GC pause (or 
any hung-NN case) would be to not have the NN participate in leader election 
directly. A failover controller (whose design ensures zero or cheap GC pauses) 
could handle the leader election and terminate hung NNs when they are no longer 
the master. 

Let me address some of the comments in a subsequent patch. I need to learn a 
little more Java before I can do it to my liking.






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-15 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186672#comment-13186672
 ] 

Todd Lipcon commented on HDFS-2681:
---

- Can {{ActiveStandbyElector}} be made package-private? If not, it should get 
audience annotations (perhaps private, perhaps LimitedPrivate to HDFS? not sure)
-- Same with the inner interface


{code}
+ * or loss of Zookeeper quorum. Thus enterSafeMode can be used to guard
+ * against split-brain issues. In such situations it might be prudent to
+ * call becomeStandby too. However, such state change operations might be
+ * expensive and enterSafeMode can help guard against doing that for
+ * transient issues.
+ */
{code}
- I think the above references to {{enterSafeMode}} are supposed to be 
{{enterNeutralMode}}, right?
- Also, it can't really guard against split brain, because there is no 
guarantee on the timeliness of delivery of these messages. That is to say, the 
other participants in the election might receive {{becomeActive}} before this 
participant receives {{enterNeutralMode}}. So, I'm not sold on the necessity of 
this callback.



{code}
+void notifyFatalError();
{code}

Shouldn't this come with some kind of Exception argument or at least a String 
error message? Right now if we hit it, it won't be clear in the logs which of 
several cases caused it.
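
A minimal sketch of what that could look like, assuming a String argument. The interface is trimmed to the one method, and {{ElectorCallback}} and {{reportConnectionFailure}} are hypothetical names, not the ones in the patch.

```java
class FatalErrorCallbackDemo {
  /** Trimmed-down callback: the fatal-error notification carries a reason. */
  interface ElectorCallback {
    void notifyFatalError(String errorMessage);
  }

  /** Hypothetical helper showing how the elector could say why it gave up. */
  static void reportConnectionFailure(ElectorCallback callback, int attempts) {
    callback.notifyFatalError("Could not reach ZooKeeper after " + attempts + " attempts");
  }

  public static void main(String[] args) {
    // The application can now log the cause instead of a bare "fatal error".
    reportConnectionFailure(message -> System.err.println("FATAL: " + message), 4);
  }
}
```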


{code}
+  /**
+   * Name of the lock znode used by the library
+   */
+  public static final String lockFileName = 
+      "ActiveStandbyElectorLock-21EC2020-3AEA-1069-A2DD-08002B30309D";
{code}

- why is this public?
- should also be ALL_CAPS.
- what's with the random UUID in there? Assumedly this library would be 
configured to be rooted inside some base directory in the tree which would 
include the namespaceID, etc.


{code}
+   * Setting a very short session timeout may result in frequent transitions
+   * between active and standby states during issues like network outages.
{code}

Also should mention GC pauses here -- they're more frequent than network blips 
IME.

{code}
+   * @param zookeeperHostPort
+   *  ZooKeeper hostPort for all ZooKeeper servers
{code}

Comma-separated? Perhaps better to name it {{zookeeperHostPorts}} since there 
is more than one server in the quorum.

- typo: reference to callback *inteface* object


{code}
+appData = new byte[data.length];
+System.arraycopy(data, 0, appData, 0, data.length);
{code}
Could use Arrays.copyOf() here instead
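
For comparison, here is a minimal sketch of the three copying idioms that have come up on this issue. The class and method names are made up for illustration and are not from the patch; all three return an independent copy of the byte[].

```java
import java.util.Arrays;

class ArrayCopyDemo {
  /** Copy via clone(): shortest form, returns a new array of the same length. */
  static byte[] copyWithClone(byte[] data) {
    return data.clone();
  }

  /** Copy via System.arraycopy(): the caller allocates the destination itself. */
  static byte[] copyWithArraycopy(byte[] data) {
    byte[] copy = new byte[data.length];
    System.arraycopy(data, 0, copy, 0, data.length);
    return copy;
  }

  /** Copy via Arrays.copyOf() (Java 6 and later): allocates and copies in one call. */
  static byte[] copyWithCopyOf(byte[] data) {
    return Arrays.copyOf(data, data.length);
  }

  public static void main(String[] args) {
    byte[] data = {1, 2, 3};
    byte[] copy = copyWithCopyOf(data);
    data[0] = 9; // changing the caller's array must not affect the copy
    System.out.println(Arrays.toString(copy));
  }
}
```

Arrays.copyOf() can also truncate or zero-pad when the requested length differs from the source length, which the other two do not do on their own.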



- Rename {{operationSuccess}} etc to {{isSuccessCode}} -- I think that's a 
clearer naming.
- Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector. 
We generally discourage having multiple outer classes per Java file. You can 
then avoid making two mockZk objects, and count wouldn't have to be static, 
either. The whole class could be done as an anonymous class, inline, probably.
- Echo what Suresh said about catching exceptions in tests - should let it fall 
through and fail the test - that'll also make sure the exception that was 
triggered makes it all the way up to the test runner and recorded properly 
(handy when debugging in Eclipse for example)
- In a couple places, you catch an expected exception and then verify, but you 
should also add an {{Assert.fail("Didn't throw exception")}} in the {{try}} 
clause to make sure the exception was actually thrown.
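
The try/fail/catch shape can be sketched as below. To keep the example runnable without JUnit, a thrown AssertionError stands in for {{Assert.fail}}; the nested {{ActiveNotFoundException}} and the {{getActiveData}} helper are stand-ins written for this example, not the classes from the patch.

```java
class ExpectedExceptionDemo {
  /** Stand-in for the exception type mentioned on this issue. */
  static class ActiveNotFoundException extends Exception {
    ActiveNotFoundException(String message) {
      super(message);
    }
  }

  /** Hypothetical method under test: fails when no active node exists. */
  static byte[] getActiveData(byte[] activeData) throws ActiveNotFoundException {
    if (activeData == null) {
      throw new ActiveNotFoundException("no active node");
    }
    return activeData.clone();
  }

  /** Returns true only if the expected exception was really thrown. */
  static boolean failsWithoutActive() {
    try {
      getActiveData(null);
      // With JUnit this line would be Assert.fail("Didn't throw exception").
      throw new AssertionError("Didn't throw exception");
    } catch (ActiveNotFoundException expected) {
      // Verify details of the exception here.
      return "no active node".equals(expected.getMessage());
    }
  }

  public static void main(String[] args) {
    System.out.println("exception thrown as expected: " + failsWithoutActive());
  }
}
```

Without the line in the try block, the check would also pass when no exception is thrown at all.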


{code}
+ * active and the rest become standbys. </br> This election mechanism is
+ * efficient for small number of election candidates (order of 10's) because
{code}
Should say _only_ efficient to be clear



{code}
+ * {@link ActiveStandbyElectorCallback} to interact with the elector
+ * 
+ */
{code}
Extra blank lines inside javadoc comments should be removed



Some general notes/nits:

- Some of the INFO level logs are probably better off at DEBUG level. Or else, 
they should be expanded out to more operator-readable information (most ops 
will have no clue what "CreateNode result: 2 for path: /blah/blah" means).
- Some more DEBUG level logs could be added to the different cases, or even 
INFO level ones at the unexpected ones (like having to retry, or being 
Disconnected, etc). I don't think there's any harm in being fairly verbose 
about state change events that are expected to only happen during fail-overs, 
and in case it goes wrong we want to have all the details at hand. But, as 
above, they should be operator-understandable.
- Javadoc breaks should be {{<br/>}} rather than {{</br>}}.
- Constants should be ALL_CAPS -- eg {{LOG}} rather than {{Log}}
- Add a constant for NUM_RETRIES instead of hard-coded 3.
- Should never use {{e.printStackTrace}} -- instead, use {{LOG.error}} or 
{{LOG.warn}} with the second argument as the exception. This will print the 
trace, but also makes sure it goes to the right log.
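
The last three nits can be sketched together as follows. java.util.logging is used only to keep the example free of extra jars (the project's own logging API differs), and {{Operation}} and {{runWithRetries}} are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class RetryLoggingDemo {
  // Constants, including the logger, get ALL_CAPS names.
  static final Logger LOG = Logger.getLogger("RetryLoggingDemo");
  // Named constant instead of a hard-coded 3.
  static final int NUM_RETRIES = 3;

  /** Hypothetical operation that may fail; stands in for a ZooKeeper call. */
  interface Operation {
    void run() throws Exception;
  }

  /** Tries once plus NUM_RETRIES retries; returns the attempt that succeeded, or -1. */
  static int runWithRetries(Operation operation) {
    for (int attempt = 1; attempt <= NUM_RETRIES + 1; attempt++) {
      try {
        operation.run();
        return attempt;
      } catch (Exception e) {
        // Passing the exception to the logger prints the stack trace into the
        // configured log, unlike e.printStackTrace().
        LOG.log(Level.WARNING, "Attempt " + attempt + " failed", e);
      }
    }
    LOG.severe("Giving up after " + NUM_RETRIES + " retries");
    return -1;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    int attempt = runWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IllegalStateException("connection loss");
      }
    });
    System.out.println("succeeded on attempt " + attempt);
  }
}
```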


[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-13 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185757#comment-13185757
 ] 

Bikas Saha commented on HDFS-2681:
--

bq. General - How is multithreaded use for this library handled?
All public methods are synchronized.

bq. Add javadoc to the class on how to use the class. Also callback interface can 
be described in more detail on when the call back is made and perhaps some 
description of what is expected of the app. notifyError() particularly needs 
better documentation on when to expect this callback.
bq. Please document the enumerations.
Done in second patch.

bq. Constructor should check for null - at least for call back passed. Otherwise 
you will get null pointer exception.
Done.

bq. joinElection() you may want to copy the byte[] data passed or at least 
document that the data[] must not be changed by the caller.
Done.

bq. #getNewZooKeeper() seems unnecessary and can be removed. Creation of 
ZooKeeper() can be moved to createConnection() itself.
This is to pass in a mock zookeeper for testing.

bq. Make member variables that are initialized only once in the constructor final.
Done in second patch.

bq. activeData could be better name for appData.
All apps can pass in data (which may go into future per-app nodes). Only the 
active app's data makes it to the lock. So I think the name is good.

bq. Please check if all the params are documented in methods. For example 
constructor is missing one of the params in the doc. Same is true with 
exceptions thrown.
Done in second patch.

bq. quitElection() should not check zkClient non null, as terminateConnection 
already checks it.
Yeah. I forgot to remove that check after I refactored stuff into the reset() 
method.

bq. getActiveData() - how about not throwing KeeperException? Also 
ActiveNotFoundException should wrap the exception caught from ZK.
It's hard to differentiate exceptions inside KeeperException. There is not much 
the elector can do about them. The only commonly expected exception would be 
getting leader data when no leader exists, and that has been handled as part of 
the elector API via a new exception.


 Add ZK client for leader election
 -

 Key: HDFS-2681
 URL: https://issues.apache.org/jira/browse/HDFS-2681
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Suresh Srinivas
Assignee: Bikas Saha
 Fix For: HA branch (HDFS-1623)

 Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 Zookeeper based Leader Election and Monitoring Library.pdf






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-13 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186026#comment-13186026
 ] 

Suresh Srinivas commented on HDFS-2681:
---

Really nice job with the tests. Maybe you can also create a real ZK-based 
test. This could be done in another jira.

Here are the comments:
# Please use System.arraycopy() instead of byte[] clone.
# Split process into two different methods processZkEvent and processZnodeEvent?

# Test
# Change the method name to init(). Annotate it @Before. It will be 
automatically called before tests.
# Use @Test(expected=...) for tests that expect exception
# Add class level javadoc.
# #Init need not catch IOException. Just throw it. The test will fail.
# You can reduce several lines of code by using a static byte[] DATA;
# can you add test where joinElection() is called twice and the second call is 
NO-OP
# Many times where processResult is called back to back can be in for loop
# Why should 4 errors of connection loss result in fatalError?
# testStatNodeError already covers some part of 
testCreateNodeResultRetryBecomeActive
# Instead of catching InterruptedException, you can just throw it


 Add ZK client for leader election
 -

 Key: HDFS-2681
 URL: https://issues.apache.org/jira/browse/HDFS-2681
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Suresh Srinivas
Assignee: Bikas Saha
 Fix For: HA branch (HDFS-1623)

 Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
 HDFS-2681.HDFS-1623.patch, Zookeeper based Leader Election and Monitoring 
 Library.pdf






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-13 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186074#comment-13186074
 ] 

Bikas Saha commented on HDFS-2681:
--

Some more comments from chat session
bq. Change the method name to init(). Annotate it @Before. It will be 
automatically called before tests.
bq. Use @Test(expected=...) for tests that expect exception
Done.
bq. Add class level javadoc.
This is already in the second patch, if by this you mean comments before the 
class declaration.
bq. #Init need not catch IOException. Just throw it. The test will fail.
I know it won't throw the exception in the test. But it has to be handled to 
keep the compiler happy. So I handled it locally there instead of adding 
throws IOException to every method of the test.
bq. You can reduce several lines of code by using a static byte[] DATA;
Done.
bq. can you add test where joinElection() is called twice and the second call is 
NO-OP
bq. Many times where processResult is called back to back can be in for loop
This helps me walk the scenarios better than in a loop.
bq. Why should 4 errors of connection loss result in fatalError?
Because the elector has tried its best to connect to Zookeeper and failed. We 
can revisit this based on observed failures at a later time.
bq. testStatNodeError already covers some part of 
testCreateNodeResultRetryBecomeActive
Yes. That's because it is trying to walk through a logical scenario, so I let 
it be.
bq. Instead of catching InterruptedException, you can just throw it
Same code cleanliness as above. I know this exception will not get thrown in 
the test, so I want to make local changes to keep the compiler happy.

bq. Please use System.arraycopy() instead of byte[] clone.
Done.
bq. Split process into two different methods processZkEvent and processZnodeEvent?
The function is still small enough to let it be. Will do this later when more 
logic might get added if we do group participation. At that point 
processZnodeEvent itself will need division into lock znode and parent znode.

bq. can you add test where joinElection() is called twice and the second call is 
NO-OP
It was there in the processResult callback test but got changed to 
enterNeutralMode when I changed that test. Now I enhanced 
testCreateNodeResultBecomeActive() to check that there is no double master 
call and added another test to check that there is no double slave call for 
expected scenarios. Now all 3 states are covered.

Will upload the patch with all these changes.
Thanks






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-12 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185358#comment-13185358
 ] 

Suresh Srinivas commented on HDFS-2681:
---

Some comments for the first version of the patch:
# General - How is multithreaded use for this library handled?
# ActiveStandbyElector
#* Add javadoc to the class on how to use the class. Also callback interface 
can be described in more detail on when the call back is made and perhaps some 
description of what is expected of the app. notifyError() particularly needs 
better documentation on when to expect this callback.
#* Please document the enumerations.
#* Constructor should check for null - at least for call back passed. Otherwise 
you will get null pointer exception.
#* joinElection() you may want to copy the byte[] data passed or at least 
document that the data[] must not be changed by the caller.
#* #getNewZooKeeper() seems unnecessary and can be removed. Creation of 
ZooKeeper() can be moved to createConnection() itself.
#* Make member variables that are initialized only once in the constructor final.
#* activeData could be better name for appData.
#* Please check if all the params are documented in methods. For example 
constructor is missing one of the params in the doc. Same is true with 
exceptions thrown.
#* quitElection() should not check zkClient non null, as terminateConnection 
already checks it.
#* getActiveData() - how about not throwing KeeperException? Also 
ActiveNotFoundException should wrap the exception caught from ZK.

I have not completed the review. These are some preliminary comments.
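
Three of the points above (null check in the constructor, final for fields set once, and copying the byte[] passed to joinElection()) can be sketched as follows. This is an illustrative sketch, not the patch: {{Elector}}, {{Callback}} and {{getAppData}} are hypothetical stand-ins.

```java
import java.util.Arrays;

class ElectorArgsDemo {
  /** Hypothetical callback type, empty for the purposes of this sketch. */
  interface Callback {
  }

  static class Elector {
    // Assigned exactly once in the constructor, so it can be final.
    private final Callback callback;
    private byte[] appData;

    Elector(Callback callback) {
      if (callback == null) {
        // Fail early with a clear message instead of a later NullPointerException.
        throw new IllegalArgumentException("callback must not be null");
      }
      this.callback = callback;
    }

    /** Stores a private copy so later changes by the caller have no effect. */
    void joinElection(byte[] data) {
      if (data == null) {
        throw new IllegalArgumentException("data must not be null");
      }
      appData = Arrays.copyOf(data, data.length);
    }

    /** Returns a copy so callers cannot modify the stored data either. */
    byte[] getAppData() {
      return Arrays.copyOf(appData, appData.length);
    }
  }

  public static void main(String[] args) {
    Elector elector = new Elector(new Callback() { });
    byte[] data = {1, 2, 3};
    elector.joinElection(data);
    data[0] = 9; // the elector keeps its own copy
    System.out.println(Arrays.toString(elector.getAppData()));
  }
}
```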






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-03 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179175#comment-13179175
 ] 

Suresh Srinivas commented on HDFS-2681:
---

Todd, the goal of this jira is to write a library that could be used by the 
failover controller.

 Add ZK client for leader election
 -

 Key: HDFS-2681
 URL: https://issues.apache.org/jira/browse/HDFS-2681
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: HA branch (HDFS-1623)






[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-03 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179177#comment-13179177
 ] 

Suresh Srinivas commented on HDFS-2681:
---

I found that ZK-1080 adds the support that I intended to add here. Will take a 
look at it as well.





[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2012-01-03 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179180#comment-13179180
 ] 

Aaron T. Myers commented on HDFS-2681:
--

Hey Suresh, you might also check out ZOOKEEPER-1095, as ZK-1080 hasn't been 
committed yet.





[jira] [Commented] (HDFS-2681) Add ZK client for leader election

2011-12-14 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169561#comment-13169561
 ] 

Todd Lipcon commented on HDFS-2681:
---

Hey Suresh. I've been thinking about the design for the ZK-based failover 
controller but have yet to post a design. Let me write something up and post it 
to HDFS-2185 today. It sounds like we should coordinate work.

 Add ZK client for leader election
 -

 Key: HDFS-2681
 URL: https://issues.apache.org/jira/browse/HDFS-2681
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HA branch (HDFS-1623)
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
