[ https://issues.apache.org/jira/browse/HADOOP-15450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arpit Agarwal updated HADOOP-15450:
-----------------------------------
    Description:
Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
# When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the drive is marked bad and replication happens. This makes the cluster-wide space problem worse. If the number of "failed" drives exceeds the DFIP limit, the datanode shuts down.
# There are non-HDFS users of DiskChecker, who use it proactively, not just on failures. This was fine before, but now it incurs heavy I/O due to the introduction of fsync() in the code.

  was:
Fix disk checker issues reported by [~kihwal] in HADOOP-13738:
1. When space is low, the OS returns ENOSPC. Instead of simply stopping writes, the drive is marked bad and replication happens. This makes the cluster-wide space problem worse. If the number of "failed" drives exceeds the DFIP limit, the datanode shuts down.
1. There are non-HDFS users of DiskChecker, who use it proactively, not just on failures. This was fine before, but now it incurs heavy I/O due to the introduction of fsync() in the code.

> Avoid fsync storm triggered by DiskChecker and handle disk full situation
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-15450
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15450
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Major
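
The two requested behaviours can be illustrated with a minimal sketch. This is not the actual DiskChecker code or the patch for this issue. The class name DiskCheckSketch, the Result enum, and both methods are hypothetical names introduced for illustration. Java does not expose errno directly, so matching the Linux error text "No space left on device" is a platform-dependent heuristic and an assumption of this sketch.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical sketch: NOT Hadoop's DiskChecker, only an illustration of the idea.
final class DiskCheckSketch {

    // A full disk is reported separately so callers need not treat it as a bad drive.
    enum Result { HEALTHY, DISK_FULL, FAILED }

    // Assumption: on Linux, ENOSPC usually surfaces as an IOException whose
    // message contains this text. Other platforms may word it differently.
    static boolean looksLikeDiskFull(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("No space left on device");
    }

    // Writes one byte to a probe file in dir. The fsync is performed only when
    // the caller asks for it, so proactive callers can skip the expensive flush.
    static Result checkDir(File dir, boolean sync) {
        File probe = null;
        try {
            probe = File.createTempFile("probe", ".tmp", dir);
            try (FileOutputStream out = new FileOutputStream(probe)) {
                out.write(1);
                if (sync) {
                    out.getFD().sync(); // force the byte to the device
                }
            }
            return Result.HEALTHY;
        } catch (IOException e) {
            // Distinguish "no space" from other I/O errors.
            return looksLikeDiskFull(e) ? Result.DISK_FULL : Result.FAILED;
        } finally {
            if (probe != null) {
                probe.delete(); // best-effort cleanup of the probe file
            }
        }
    }
}
```

Under these assumptions, a caller such as a datanode could stop writing to a volume that reports DISK_FULL without counting it toward the failed-volume limit, and callers that check disks proactively could pass sync=false to avoid the extra I/O.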