star created HDFS-15299:
---------------------------

             Summary: Add an option to enable reporting and removing flaky disks
                 Key: HDFS-15299
                 URL: https://issues.apache.org/jira/browse/HDFS-15299
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: star
            Assignee: star


In our production environment, where many disks are more than 8 years old,
many DataNodes (DNs) are treated as dead because some of their disks are
partially broken. The NameNode (NN) then re-replicates their blocks elsewhere
in the cluster, which introduces heavy disk load. To reduce the impact of
flaky disks, we'd like to extend the existing disk failure tolerance
mechanism to cover partial disk failures.

As described in HDFS-10777, the du command can still throw an exception on a
heavily loaded disk, so simply removing a flaky disk is brittle: the disk may
recover later. However, this case is rare in our production environment. Could
we therefore add an option to enable partial disk failure tolerance for users
who have many partially broken disks and care more about cluster stability?
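A minimal sketch of how such an opt-in option might behave, written in Python
for brevity. Everything here is hypothetical: VolumeSet, check_usage,
tolerate_flaky_disks, max_failed_volumes and TooManyFailedVolumes are
illustrative names, not HDFS APIs, and the du probe is passed in as a plain
callable. With the option off, a failing probe is re-raised unchanged; with it
on, the volume is reported as failed and taken out of service, and the
DataNode gives up only once more volumes have failed than the threshold allows
(modeled loosely on dfs.datanode.failed.volumes.tolerated).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class TooManyFailedVolumes(Exception):
    """Raised when more volumes have failed than the configured tolerance."""


@dataclass
class VolumeSet:
    # Hypothetical stand-in for a DataNode's data directories (not the HDFS API).
    active: List[str]
    failed: List[str] = field(default_factory=list)
    # Hypothetical opt-in flag for partial disk failure tolerance; off by default.
    tolerate_flaky_disks: bool = False
    # Hypothetical threshold, modeled loosely on dfs.datanode.failed.volumes.tolerated.
    max_failed_volumes: int = 0

    def check_usage(self, volume: str, du: Callable[[str], int]) -> Optional[int]:
        """Run a du-style probe on one volume and apply the flaky-disk policy.

        Returns the reported usage, or None if the volume was just removed.
        """
        try:
            return du(volume)
        except OSError:
            if not self.tolerate_flaky_disks:
                # Option disabled: surface the error to the caller unchanged.
                raise
            # Option enabled: report the volume as failed and stop using it.
            self.active.remove(volume)
            self.failed.append(volume)
            if len(self.failed) > self.max_failed_volumes:
                raise TooManyFailedVolumes(f"{len(self.failed)} volumes failed")
            return None
```

Keeping the option off by default leaves the current behavior untouched for
clusters where a du error is more likely a transient overload than a dying
disk, which is the concern raised in HDFS-10777.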

We will replace those old disks in the future, but until then we expect to
keep running the HDFS cluster on these servers for a long time.

Comments are appreciated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
