[ https://issues.apache.org/jira/browse/HDFS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Li Bo updated HDFS-9826:
------------------------
    Description:
Currently, the NameNode prepares recovery work as soon as it finds an under-replicated block group. This is inefficient and takes resources away from other operations. It would be better to postpone the recovery work for a period of time if only one internal block is corrupted, considering the following points from papers such as [1][2]:
1. Transient errors in which no data are lost account for more than 90% of data center failures, owing to network partitions, software problems, or non-disk hardware faults.
2. Although erasure codes tolerate multiple simultaneous failures, single failures represent 99.75% of recoveries.
Different clusters may be in different states, so we should allow users to configure the time for postponing recoveries. Proper configuration will avoid a large proportion of unnecessary recoveries. When multiple internal blocks in a block group are corrupted, we prepare the recovery work immediately, because this case is very rare and we don’t want to increase the risk of losing data.
[1] Availability in globally distributed storage systems
http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
[2] Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads
http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf

  was:
Currently, the NameNode prepares recovery work as soon as it finds an under-replicated block group. This is inefficient and takes resources away from other operations. It would be better to postpone the recovery work for a period of time if only one internal block is corrupted, considering the following points from papers such as [1][2]:
1. Transient errors in which no data are lost account for more than 90% of data center failures, owing to network partitions, software problems, or non-disk hardware faults.
2. Although erasure codes tolerate multiple simultaneous failures, single failures represent 99.75% of recoveries.
Different clusters may be in different states, so we should allow users to configure the time for postponing recoveries. Proper configuration will avoid a large proportion of unnecessary recoveries. When multiple internal blocks in a block group are corrupted, we do the recovery work immediately, because this case is very rare and we don’t want to increase the risk of losing data.
[1] Availability in globally distributed storage systems
http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
[2] Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads
http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf

> Erasure Coding: Postpone the recovery work for a configurable time period
> --------------------------------------------------------------------------
>
>                 Key: HDFS-9826
>                 URL: https://issues.apache.org/jira/browse/HDFS-9826
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>
> Currently, the NameNode prepares recovery work as soon as it finds an
> under-replicated block group. This is inefficient and takes resources away
> from other operations. It would be better to postpone the recovery work for
> a period of time if only one internal block is corrupted, considering the
> following points from papers such as [1][2]:
> 1. Transient errors in which no data are lost account for more than 90% of
> data center failures, owing to network partitions, software problems, or
> non-disk hardware faults.
> 2. Although erasure codes tolerate multiple simultaneous failures, single
> failures represent 99.75% of recoveries.
> Different clusters may be in different states, so we should allow users to
> configure the time for postponing recoveries. Proper configuration will
> avoid a large proportion of unnecessary recoveries. When multiple internal
> blocks in a block group are corrupted, we prepare the recovery work
> immediately, because this case is very rare and we don’t want to increase
> the risk of losing data.
> [1] Availability in globally distributed storage systems
> http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
> [2] Rethinking erasure codes for cloud file systems: minimizing I/O for
> recovery and degraded reads
> http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
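
The policy proposed in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not NameNode code: the class `RecoveryScheduler`, its method `should_recover`, and the `postpone_seconds` setting are hypothetical names invented for illustration, and the issue does not specify any configuration key or implementation. The model postpones recovery while exactly one internal block of a group is missing or corrupted, recovers immediately when more than one is affected, and forgets a group once it is healthy again (for example after a transient failure clears).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RecoveryScheduler:
    """Toy model of the proposed postponement policy (hypothetical names)."""

    # Assumed user-configurable delay before a single-block failure is recovered.
    postpone_seconds: float
    # Time at which each block group was first seen with a failed internal block.
    _first_seen: Dict[str, float] = field(default_factory=dict, init=False)

    def should_recover(self, group_id: str, failed_blocks: int, now: float) -> bool:
        """Return True if recovery work should be prepared for this group now."""
        if failed_blocks <= 0:
            # The group is healthy again: drop any pending timer.
            self._first_seen.pop(group_id, None)
            return False
        if failed_blocks > 1:
            # Multiple internal blocks affected: do not risk data loss, recover now.
            return True
        # Exactly one internal block affected: wait until the delay has elapsed.
        first = self._first_seen.setdefault(group_id, now)
        return now - first >= self.postpone_seconds
```

With a delay of 600 seconds, a group that loses one internal block at time 0 is not recovered at time 599 but is at time 600, while a group that loses two blocks is recovered at once. If the block reappears before the delay expires, the timer is discarded and no recovery work is done, which is the saving the description aims for.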