stack created HDFS-4239:
---------------------------

             Summary: Means of telling the datanode to stop using a sick disk
                 Key: HDFS-4239
                 URL: https://issues.apache.org/jira/browse/HDFS-4239
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: stack


If a disk has been deemed 'sick' -- i.e. not dead but wounded, failing 
occasionally, or just exhibiting high latency -- your choices are:

1. Decommission the entire datanode.  If the datanode is carrying 6 or 12 disks
of data and the cluster is smallish -- 5 to 20 nodes -- the rereplication of
the downed datanode's data can be pretty disruptive, especially if the cluster
is doing low-latency serving, e.g. hosting an HBase cluster.

2. Stop the datanode, unmount the bad disk, and restart the datanode (you can't
unmount the disk while the datanode is using it).  This option is better in
that only the bad disk's data is rereplicated, not all of the datanode's data.
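
To put rough numbers on the difference between the two options, here is a
back-of-the-envelope sketch.  The disk count and per-disk usage are
hypothetical figures chosen for illustration, and the helper name
rereplicated_tb is made up for this example; it also assumes data is spread
evenly across the datanode's disks.

```python
def rereplicated_tb(disks_removed: int, used_tb_per_disk: float) -> float:
    """Data the cluster must rereplicate when `disks_removed` disks go away."""
    return disks_removed * used_tb_per_disk


# Hypothetical figures, not taken from the issue: 12 disks, 2 TB used on each.
DISKS_PER_NODE = 12
USED_TB_PER_DISK = 2.0

option_1 = rereplicated_tb(DISKS_PER_NODE, USED_TB_PER_DISK)  # whole datanode
option_2 = rereplicated_tb(1, USED_TB_PER_DISK)               # bad disk only

print(f"option 1: {option_1} TB, option 2: {option_2} TB")
```

Under these assumptions option 1 moves twelve times as much data as option 2,
which is why losing a whole datanode hurts so much more on a small cluster.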

Is it possible to do better, say, by sending the datanode a signal telling it
to stop using a disk an operator has designated 'bad'?  This would be like
option #2 above minus the need to stop and restart the datanode.  Ideally the
datanode would release the disk after a while so that it could be unmounted.

A nice-to-have would be the ability to tell the datanode to start using a disk
again after it's been replaced.





