[ 
https://issues.apache.org/jira/browse/HDFS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13186672#comment-13186672
 ] 

Todd Lipcon commented on HDFS-2681:
-----------------------------------

- Can {{ActiveStandbyElector}} be made package-private? If not, it should get 
audience annotations (perhaps private, perhaps LimitedPrivate to HDFS? not sure)
-- Same with the inner interface

----
{code}
+     * or loss of Zookeeper quorum. Thus enterSafeMode can be used to guard
+     * against split-brain issues. In such situations it might be prudent to
+     * call becomeStandby too. However, such state change operations might be
+     * expensive and enterSafeMode can help guard against doing that for
+     * transient issues.
+     */
{code}
- I think the above references to {{enterSafeMode}} are supposed to be 
{{enterNeutralMode}}, right?
- Also, it can't really guard against split brain, because there is no 
guarantee on the timeliness of delivery of these messages. That is to say, the 
other participants in the election might receive {{becomeActive}} before this 
participant receives {{enterNeutralMode}}. So, I'm not sold on the necessity of 
this callback.

----

{code}
+    void notifyFatalError();
{code}

Shouldn't this come with some kind of Exception argument or at least a String 
error message? Right now if we hit it, it won't be clear in the logs which of 
several cases caused it.

----
{code}
+  /**
+   * Name of the lock znode used by the library
+   */
+  public static final String lockFileName = 
"ActiveStandbyElectorLock-21EC2020-3AEA-1069-A2DD-08002B30309D";
{code}

- why is this public?
- should also be ALL_CAPS.
- what's with the random UUID in there? Assumedly this library would be 
configured to be rooted inside some base directory in the tree which would 
include the namespaceID, etc.

----
{code}
+   * Setting a very short session timeout may result in frequent transitions
+   * between active and standby states during issues like network outages.
{code}

Also should mention GC pauses here -- they're more frequent than network blips 
IME.
----
{code}
+   * @param zookeeperHostPort
+   *          ZooKeeper hostPort for all ZooKeeper servers
{code}

Comma-separated? Perhaps better to name it {{zookeeperHostPorts}} since there 
is more than one server in the quorum.

- typo: "reference to callback *inteface* object"

----
{code}
+    appData = new byte[data.length];
+    System.arraycopy(data, 0, appData, 0, data.length);
{code}
Could use Arrays.copyOf() here instead

----

- Rename {{operationSuccess}} etc to {{isSuccessCode}} -- I think that's a 
clearer naming.
- Make ActiveStandbyElectorTester an inner class of TestActiveStandbyElector. 
We generally discourage having multiple "outer" classes per Java file. You can 
then avoid making two "mockZk" objects, and "count" wouldn't have to be static, 
either. The whole class could be done as an anonymous class, inline, probably.
- Echo what Suresh said about catching exceptions in tests - should let it fall 
through and fail the test - that'll also make sure the exception that was 
triggered makes it all the way up to the test runner and recorded properly 
(handy when debugging in Eclipse for example)
- In a couple places, you catch an expected exception and then verify, but you 
should also add an {{Assert.fail("Didn't throw exception")}} in the {{try}} 
clause to make sure the exception was actually thrown.

----
{code}
+ * active and the rest become standbys. </br> This election mechanism is
+ * efficient for small number of election candidates (order of 10's) because
{code}
Should say "_only_ efficient" to be clear

----

{code}
+ * {@link ActiveStandbyElectorCallback} to interact with the elector
+ * 
+ */
{code}
Extra blank lines inside javadoc comments should be removed

----

Some general notes/nits:

- Some of the INFO level logs are probably better off at DEBUG level. Or else, 
they should be expanded out to more operator-readable information (most ops 
will have no clue what "CreateNode result: 2 for path: /blah/blah" means.
- Some more DEBUG level logs could be added to the different cases, or even 
INFO level ones at the unexpected ones (like having to retry, or being 
Disconnected, etc). I don't think there's any harm in being fairly verbose 
about state change events that are expected to only happen during fail-overs, 
and in case it goes wrong we want to have all the details at hand. But, as 
above, they should be operator-understandable.
- Javadoc breaks should be {{<br/>}} rather than {{</br>}}.
- Constants should be ALL_CAPS -- eg {{LOG}} rather than {{Log}
- Add a constant for NUM_RETRIES instead of hard-coded 3.
- Should never use {{e.printStackTrace}} -- instead, use {{LOG.error}} or 
{{LOG.warn}} with the second argument as the exception. This will print the 
trace, but also makes sure it goes to the right log.
                
> Add ZK client for leader election
> ---------------------------------
>
>                 Key: HDFS-2681
>                 URL: https://issues.apache.org/jira/browse/HDFS-2681
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Suresh Srinivas
>            Assignee: Bikas Saha
>             Fix For: HA branch (HDFS-1623)
>
>         Attachments: HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, 
> HDFS-2681.HDFS-1623.patch, HDFS-2681.HDFS-1623.patch, Zookeeper based Leader 
> Election and Monitoring Library.pdf
>
>
> ZKClient needs to support the following capabilities:
> # Ability to create a znode for co-ordinating leader election.
> # Ability to monitor and receive call backs when active znode status changes.
> # Ability to get information about the active node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to