[ https://issues.apache.org/jira/browse/HDFS-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033472#comment-14033472 ]
Vinayakumar B commented on HDFS-6507: ------------------------------------- Thanks [~wuzesheng] for working on this. Here are some of the comments on your latest patch. 1. {code}+ inSafeMode = nn.setSafeMode(SafeModeAction.SAFEMODE_GET, false);{code} {{inSafeMode}} value inside {{waitExitSafeMode(..)}} will not be reflected in the below print statement. {code}+ System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF"));{code} It will always print the initial state. I feel {{waitExitSafeMode(..)}} can return the latest value and same can be assigned to {{inSafeMode}} in {{setSafeMode(..)}} method. ex: {code}+ boolean inSafeMode = haNn.setSafeMode(action, false); + if (waitExitSafe) { + inSafeMode = waitExitSafeMode(haNn, inSafeMode); } - inSafeMode = dfs.setSafeMode(SafeModeAction.SAFEMODE_GET); + System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF"));{code} 2. In case of HA, it will be better if we can print the NameNode address while printing the status of safemode. ex: {code} + System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF") + " in <namenode address>");}{code} 3. {code}+ System.out.println("Save namespace successfully");{code} Message could be "Saved namespace successfully in <namenode address>" 4. again, same as #3, all messages for HA can include namenode address, 5. In {{metaSafe(..)}} following dfs.getUri() should be replaced with actual namenode address in case of HA {code}+ for (ClientProtocol haNn : namenodes) { + haNn.metaSave(pathname); + System.out.println("Created metasave file " + pathname + " in the log " + + "directory of namenode " + dfs.getUri()); + }{code} 6. Message could be changed as below {code}+ System.out.println("Refresh service acl successful for <namenode address>");"{code} 7. Message could be changed as below {code}+ System.out.println("Refresh user to groups mapping successful for <namenode address>");{code} > Improve DFSAdmin to support HA cluster better > --------------------------------------------- > > Key: HDFS-6507 > URL: https://issues.apache.org/jira/browse/HDFS-6507 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools > Affects Versions: 2.4.0 > Reporter: Zesheng Wu > Assignee: Zesheng Wu > Attachments: HDFS-6507.1.patch, HDFS-6507.2.patch > > > Currently, the commands supported in DFSAdmin can be classified into three > categories according to the protocol used: > 1. ClientProtocol > Commands in this category generally implement by calling the corresponding > function of the DFSClient class, and will call the corresponding remote > implementation function at the NN side finally. At the NN side, all these > operations are classified into five categories: UNCHECKED, READ, WRITE, > CHECKPOINT, JOURNAL. Active NN will allow all operations, and Standby NN only > allows UNCHECKED operations. In the current implementation of DFSClient, it > will connect one NN first, if the first NN is not Active and the operation is > not allowed, it will failover to the second NN. So here comes the problem, > some of the commands(setSafeMode, saveNameSpace, restoreFailedStorage, > refreshNodes, setBalancerBandwidth, metaSave) in DFSAdmin are classified as > UNCHECKED operations, and when executing these commands in the DFSAdmin > command line, they will be sent to a definite NN, no matter it is Active or > Standby. This may result in two problems: > a. If the first tried NN is standby, and the operation takes effect only on > Standby NN, which is not the expected result. > b. If the operation needs to take effect on both NN, but it takes effect on > only one NN. In the future, when there is a NN failover, there may have > problems. > Here I propose the following improvements: > a. If the command can be classified as one of READ/WRITE/CHECKPOINT/JOURNAL > operations, we should classify it clearly. > b. If the command can not be classified as one of the above four operations, > or if the command needs to take effect on both NN, we should send the request > to both Active and Standby NNs. > 2. Refresh protocols: RefreshAuthorizationPolicyProtocol, > RefreshUserMappingsProtocol, RefreshUserMappingsProtocol, > RefreshCallQueueProtocol > Commands in this category, including refreshServiceAcl, > refreshUserToGroupMapping, refreshSuperUserGroupsConfiguration and > refreshCallQueue, are implemented by creating a corresponding RPC proxy and > sending the request to remote NN. In the current implementation, these > requests will be sent to a definite NN, no matter it is Active or Standby. > Here I propose that we sent these requests to both NNs. > 3. ClientDatanodeProtocol > Commands in this category are handled correctly, no need to improve. -- This message was sent by Atlassian JIRA (v6.2#6252)