[ https://issues.apache.org/jira/browse/HDFS-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901848#comment-13901848 ]

Yongjun Zhang commented on HDFS-4239:
-------------------------------------

Hi Jimmy,

Thanks for the good work. I went through patch v4 and it looks good to me. I
only have a few comments, mostly cosmetic, and I may be wrong myself.

1. In DataNode.java:
 private void checkSuperuserPrivilege(String method) throws IOException {
    if (checkKerberosAuthMethod(method)) {
      ...
    }
  }

The above function checks superuser privilege only when Kerberos authentication
is enabled. This doesn't seem restrictive enough to me. However, I saw that
existing code in the same file does the same thing, for example:

  private void checkBlockLocalPathAccess() throws IOException {
    checkKerberosAuthMethod("getBlockLocalPathInfo()");
    ...
  }

So I'm actually not sure; please correct me if I'm wrong. For example, I found
other existing code that checks superuser privilege, such as

./hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
  public void checkSuperuserPrivilege()

which seems to do things differently.
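
For reference, that NameNode-side check looks roughly like the sketch below.
I'm paraphrasing from memory, so details such as isPermissionEnabled and
FSPermissionChecker may differ; the point is that it verifies the caller is
the superuser regardless of the authentication method:

  // Rough sketch of FSNamesystem.checkSuperuserPrivilege(), not the
  // exact code: it delegates to the permission checker, which throws
  // AccessControlException when the caller is not the superuser,
  // whether or not Kerberos is the authentication method.
  public void checkSuperuserPrivilege() throws AccessControlException {
    if (isPermissionEnabled) {
      FSPermissionChecker pc = getPermissionChecker();
      pc.checkSuperuserPrivilege();
    }
  }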

2. In DataNode.java:

 /** Ensure that authentication method is kerberos */
 boolean checkKerberosAuthMethod(String msg) throws IOException {

I suggest changing both the comment and the method name to something like:
 /** Check whether the authentication method is kerberos; return true
   * if so and false otherwise.
   */
 boolean isKerberosAuthMethodEnabled(...)...
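
To be concrete, keeping the signature from the patch, it could look like the
sketch below. Only the name and javadoc are the actual suggestion; I'm
paraphrasing the existing body from memory, so the details may differ:

  /**
   * Check whether the authentication method is kerberos; return true
   * if so and false otherwise.
   */
  boolean isKerberosAuthMethodEnabled(String msg) throws IOException {
    // Paraphrased body: the real logic stays whatever
    // checkKerberosAuthMethod does today.
    UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
    if (ugi.getAuthenticationMethod() !=
        UserGroupInformation.AuthenticationMethod.KERBEROS) {
      LOG.info("Skipping " + msg + " since auth method is not kerberos");
      return false;
    }
    return true;
  }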
    

3. In BlockPoolSliceScanner.java:
  private static final String VERIFICATION_PREFIX =
      "dncp_block_verification.log";

  You removed "private" from this declaration; I wonder if that's what you
  intended. It seems it should stay private.

4. In DataBlockScanner.java:

  void volumeMarkedDown(FsVolumeSpi vol) throws IOException {

I wonder whether we can change it to
  /**
   * Relocate verification logs for a volume that's marked down.
   * ...
   */
  void relocateVerificationLogs(FsVolumeSpi volMarkedDown) ...

to make it clearer.

5. In BlockPoolSliceScanner.java:
  void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
     if (verificationLog != null) {
       // block of code
     }
     // no code here
   }

If the block of code is large, it would be helpful to change it to:
  void relocateVerificationLogs(FsVolumeSpi vol) throws IOException {
    if (verificationLog == null) {
      return;
    }
    // block of code
  }

This removes one level of indentation, making the code easier to read.

Thanks.



> Means of telling the datanode to stop using a sick disk
> -------------------------------------------------------
>
>                 Key: HDFS-4239
>                 URL: https://issues.apache.org/jira/browse/HDFS-4239
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jimmy Xiang
>         Attachments: hdfs-4239.patch, hdfs-4239_v2.patch, hdfs-4239_v3.patch, 
> hdfs-4239_v4.patch, hdfs-4239_v5.patch
>
>
> If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
> occasionally, or just exhibiting high latency -- your choices are:
> 1. Decommission the total datanode.  If the datanode is carrying 6 or 12 
> disks of data, especially on a cluster that is smallish -- 5 to 20 nodes -- 
> the rereplication of the downed datanode's data can be pretty disruptive, 
> especially if the cluster is doing low latency serving: e.g. hosting an hbase 
> cluster.
> 2. Stop the datanode, unmount the bad disk, and restart the datanode (You 
> can't unmount the disk while it is in use).  This latter is better in that 
> only the bad disk's data is rereplicated, not all datanode data.
> Is it possible to do better, say, send the datanode a signal to tell it to 
> stop using a disk an operator has designated 'bad'?  This would be like option #2 
> above minus the need to stop and restart the datanode.  Ideally the disk 
> would become unmountable after a while.
> Nice to have would be being able to tell the datanode to resume using a disk 
> after it's been replaced.


