[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

Vinayakumar B (JIRA) Mon, 16 Jun 2014 23:20:26 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033472#comment-14033472
 ]


Vinayakumar B commented on HDFS-6507:
-------------------------------------

Thanks [~wuzesheng] for working on this. 
Here are some of the comments on your latest patch.

1.
{code}+      inSafeMode = nn.setSafeMode(SafeModeAction.SAFEMODE_GET, 
false);{code}
{{inSafeMode}} value inside {{waitExitSafeMode(..)}} will not be reflected in 
the below print statement.
{code}+        System.out.println("Safe mode is " + (inSafeMode ? "ON" : 
"OFF"));{code}
It will always print the initial state. I feel {{waitExitSafeMode(..)}} can 
return the latest value and same can be assigned to {{inSafeMode}} in 
{{setSafeMode(..)}} method.
ex: {code}+        boolean inSafeMode = haNn.setSafeMode(action, false);
+        if (waitExitSafe) {
+          inSafeMode = waitExitSafeMode(haNn, inSafeMode);
         }
-        inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET);
+        System.out.println("Safe mode is " + (inSafeMode ? "ON" : 
"OFF"));{code}

2. In case of HA, it will be better if we can print the NameNode address while 
printing the status of safemode. 
ex:
{code}
+        System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF") + " 
in <namenode address>");}{code}

3. {code}+        System.out.println("Save namespace successfully");{code}
  Message could be "Saved namespace successfully in <namenode address>"

4. again, same as #3, all messages for HA can include namenode address,

5. In {{metaSafe(..)}} following dfs.getUri() should be replaced with actual 
namenode address in case of HA
{code}+      for (ClientProtocol haNn : namenodes) {
+        haNn.metaSave(pathname);
+        System.out.println("Created metasave file " + pathname + " in the log 
" +
+            "directory of namenode " + dfs.getUri());
+      }{code}

6. Message could be changed as below
{code}+        System.out.println("Refresh service acl successful for <namenode 
address>");"{code}

7. Message could be changed as below
{code}+        System.out.println("Refresh user to groups mapping successful 
for <namenode address>");{code}

> Improve DFSAdmin to support HA cluster better
> ---------------------------------------------
>
>                 Key: HDFS-6507
>                 URL: https://issues.apache.org/jira/browse/HDFS-6507
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>         Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch
>
>
> Currently, the commands supported in DFSAdmin can be classified into three 
> categories according to the protocol used:
> 1. ClientProtocol
> Commands in this category generally implement by calling the corresponding 
> function of the DFSClient class, and will call the corresponding remote 
> implementation function at the NN side finally. At the NN side, all these 
> operations are classified into five categories: UNCHECKED, READ, WRITE, 
> CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only 
> allows UNCHECKED operations. In the current implementation of DFSClient, it 
> will connect one NN first, if the first NN is not Active and the operation is 
> not allowed, it will failover to the second NN. So here comes the problem, 
> some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, 
> refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as 
> UNCHECKED operations, and when executing these commands in the DFSAdmin 
> command line, they will be sent to a definite NN, no matter it is Active or 
> Standby. This may result in two problems: 
> a. If the first tried NN is standby, and the operation takes effect only on 
> Standby NN, which is not the expected result.
> b. If the operation needs to take effect on both NN, but it takes effect on 
> only one NN. In the future, when there is a NN failover, there may have 
> problems.
> Here I propose the following improvements:
> a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL 
> operations, we should classify it clearly.
> b. If the command can not be classified as one of the above four operations, 
> or if the command needs to take effect on both NN, we should send the request 
> to both Active and Standby NNs.
> 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, 
> RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, 
> RefreshCallQueueProtocol
> Commands in this category, including refreshServiceAcl, 
> refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and 
> refreshCallQueue, are implemented by creating a corresponding RPC proxy and 
> sending the request to remote NN. In the current implementation, these 
> requests will be sent to a definite NN, no matter it is Active or Standby. 
> Here I propose that we sent these requests to both NNs.
> 3. ClientDatanodeProtocol
> Commands in this category are handled correctly, no need to improve.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (HDFS-6507) Improve DFSAdmin to support HA cluster better

Reply via email to