[ https://issues.apache.org/jira/browse/HDFS-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13022799#comment-13022799 ]
Koji Noguchi commented on HDFS-1848:
------------------------------------

> In our tests with disk failures, we have verified that if the root/critical
> volume fails, the Datanode can't even start.

That is the problem. Even when these critical volumes go bad, datanodes keep
on running but fail at restart. With clusters running for months, we can end
up with multiple datanodes refusing to restart, leading to data loss.

> Datanodes should shutdown when a critical volume fails
> ------------------------------------------------------
>
>                 Key: HDFS-1848
>                 URL: https://issues.apache.org/jira/browse/HDFS-1848
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Eli Collins
>             Fix For: 0.23.0
>
> A DN should shut down when a critical volume (e.g. the volume that hosts the
> OS, logs, pid, tmp dir, etc.) fails. The admin should be able to specify
> which volumes are critical, e.g. they might specify the volume that lives on
> the boot disk. A failure in one of these volumes would not be subject to the
> failed-volume threshold (HDFS-1161) or result in host decommissioning
> (HDFS-1847), as the decommissioning process would likely fail.
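As a side note on the proposed behavior, below is a minimal, standalone sketch of what "shut down when a critical volume fails" could look like: the admin lists the critical volume paths, a background task probes each one with a small write, and the process exits on the first failure instead of counting it against the failed-volume threshold (HDFS-1161). The class name, probe method, check interval, and example paths are all made up for illustration; this is not the actual DataNode code or configuration.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not real DataNode code: the admin supplies the
 * critical volumes (e.g. the boot/OS disk, the log and pid directories);
 * if any of them fails a basic write probe, the process shuts down rather
 * than treating it as one more tolerated data-volume failure.
 */
public class CriticalVolumeMonitor {

  private final List<Path> criticalVolumes;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public CriticalVolumeMonitor(List<Path> criticalVolumes) {
    this.criticalVolumes = criticalVolumes;
  }

  /** A volume is healthy if we can create and delete a small file on it. */
  private boolean isHealthy(Path volume) {
    try {
      Path probe = Files.createTempFile(volume, ".volume-probe", null);
      Files.delete(probe);
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  /** Probe each critical volume; shut down on the first failure. */
  private void checkVolumes() {
    for (Path volume : criticalVolumes) {
      if (!isHealthy(volume)) {
        System.err.println("Critical volume failed: " + volume + ", shutting down");
        // A real daemon would stop its services cleanly; the sketch just exits.
        System.exit(1);
      }
    }
  }

  /** Start periodic checks (interval chosen arbitrarily for the sketch). */
  public void start() {
    scheduler.scheduleAtFixedRate(this::checkVolumes, 0, 60, TimeUnit.SECONDS);
  }

  public static void main(String[] args) {
    // In a real deployment this list would come from an admin-set config key;
    // the system-property paths below only stand in for the critical volumes
    // so the example runs anywhere.
    List<Path> volumes = Arrays.asList(
        Paths.get(System.getProperty("java.io.tmpdir")),  // stands in for the OS/tmp volume
        Paths.get(System.getProperty("user.dir")));       // stands in for the log/pid volume
    new CriticalVolumeMonitor(volumes).start();
  }
}
{code}

One caveat: a temp-file write probe only works against directories the daemon can write to, so in practice the check would target directories the process already owns (logs, pid, tmp) on each admin-specified critical disk rather than the mount point itself.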