[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901848#comment-13901848 ]
Yongjun Zhang commented on HDFS-4239:
-------------------------------------

Hi Jimmy,

Thanks for the good work. I went through patch v4. It looks good to me. I only have a few comments, mostly cosmetic things, and I may be wrong myself.

1. In DataNode.java:

    private void checkSuperuserPrivilege(String method) throws IOException {
      if (checkKerberosAuthMethod(method)) {
        ...
      }
    }

The above function checks superuser privilege only when Kerberos authentication is enabled. This seems not restrictive enough to me. However, I saw existing code in the same file that does the same, such as:

    private void checkBlockLocalPathAccess() throws IOException {
      checkKerberosAuthMethod("getBlockLocalPathInfo()");
      ...
    }

So I'm actually not sure. Please correct me if I'm wrong. For example, I found other existing code that checks superuser privilege, like

    ./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
      public void checkSuperuserPrivilege()

which seems to do things differently.

2. In DataNode.java:

    /** Ensure that authentication method is kerberos */
    boolean checkKerberosAuthMethod(String msg) throws IOException {

Suggest changing (both the comment and the interface) to something like:

    /** Check whether the authentication method is kerberos; return true
     * if so and false otherwise */
    boolean isKerberosAuthMethodEnabled(...)

3. In BlockPoolSliceScanner.java:

    private static final String VERIFICATION_PREFIX = "dncp_block_verification.log";

You removed "private" from this declaration; I wonder if that's what you intended. It seems it should stay private.

4. In DataBlockScanner.java:

    void volumeMarkedDown(FsVolumeSpi vol) throws IOException {

I wonder whether we can change it to

    /**
     * Relocate verification logs for a volume that's marked down
     * ...
     */
    void relocateVerificationLogs(FsVolumeSpi volMarkedDown)

to make it clearer?

5.
In BlockPoolSliceScanner.java:

    void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
      if (verificationLog != null) {
        // block of code
      }
      // no code here
    }

If the block of code is large, it would be helpful to change it to

    void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
      if (verificationLog == null) {
        return;
      }
      // block of code
    }

This helps remove one level of indentation, making it easier to read.

Thanks.

> Means of telling the datanode to stop using a sick disk
> -------------------------------------------------------
>
>                 Key: HDFS-4239
>                 URL: https://issues.apache.org/jira/browse/HDFS-4239
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jimmy Xiang
>         Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode. If the datanode is carrying 6 or 12 disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- the rereplication of the downed datanode's data can be pretty disruptive, especially if the cluster is doing low-latency serving: e.g. hosting an hbase cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (you can't unmount the disk while it is in use). This latter is better in that only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to stop using a disk an operator has designated 'bad'? This would be like option #2 above minus the need to stop and restart the datanode. Ideally the disk would become unmountable after a while.
> Nice to have would be being able to tell the datanode to restart using a disk after it's been replaced.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
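The early-return refactoring suggested in comment 5 above can be sketched in isolation. Note this is an illustrative standalone sketch, not the actual BlockPoolSliceScanner code: the class name and the StringBuilder stand-in for the verification log are hypothetical, chosen only to make the before/after shapes runnable.

```java
import java.io.IOException;

// Hypothetical stand-in class illustrating the guard-clause (early-return)
// refactoring from comment 5; verificationLog here is just a StringBuilder,
// not the real verification log handle.
public class GuardClauseSketch {
  private final StringBuilder verificationLog;

  GuardClauseSketch(StringBuilder log) {
    this.verificationLog = log;
  }

  // Before: the whole body nests inside "if (verificationLog != null)".
  void relocateNested() throws IOException {
    if (verificationLog != null) {
      verificationLog.append("relocated\n"); // the (possibly large) block of code
    }
  }

  // After: bail out early when there is nothing to do, so the main
  // logic sits one level of indentation shallower.
  void relocateEarlyReturn() throws IOException {
    if (verificationLog == null) {
      return;
    }
    verificationLog.append("relocated\n"); // the (possibly large) block of code
  }

  public static void main(String[] args) throws IOException {
    StringBuilder log = new StringBuilder();
    new GuardClauseSketch(log).relocateEarlyReturn();  // appends one entry
    new GuardClauseSketch(null).relocateEarlyReturn(); // no-op, no NPE
    System.out.println(log.toString().trim());
  }
}
```

Both methods behave identically; the guard clause only changes the shape of the code, which is why the suggestion is purely cosmetic.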