[ https://issues.apache.org/jira/browse/HDFS-9826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wang updated HDFS-9826:
------------------------------
    Resolution: Not A Problem
        Status: Resolved  (was: Patch Available)

Resolving per above; feel free to reopen if you'd like to resume this work.

> Erasure Coding: Postpone the recovery work for a configurable time period
> --------------------------------------------------------------------------
>
>                 Key: HDFS-9826
>                 URL: https://issues.apache.org/jira/browse/HDFS-9826
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: HDFS-9826-001.patch, HDFS-9826-002.patch
>
>
> Currently, the NameNode prepares recovery work when it finds an
> under-replicated block group. This is inefficient and takes resources away
> from other operations. It would be better to postpone the recovery work for
> a period of time if only one internal block is corrupted, considering the
> following points from papers such as [1][2]:
> 1. Transient errors in which no data are lost account for more than 90% of
> data center failures, owing to network partitions, software problems, or
> non-disk hardware faults.
> 2. Although erasure codes tolerate multiple simultaneous failures, single
> failures represent 99.75% of recoveries.
> Clusters differ, so users should be able to configure how long recovery is
> postponed. A proper setting will avoid a large proportion of unnecessary
> recoveries. When multiple internal blocks in a block group are found to be
> corrupted, we prepare the recovery work immediately, because this case is
> very rare and we don't want to increase the risk of losing data.
>
> [1] Availability in globally distributed storage systems
> http://static.usenix.org/events/osdi10/tech/full_papers/Ford.pdf
> [2] Rethinking erasure codes for cloud file systems: minimizing I/O for
> recovery and degraded reads
> http://static.usenix.org/events/fast/tech/full_papers/Khan.pdf
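
The postponement rule proposed in the description can be sketched roughly as follows. This is only an illustration of the decision logic, not the code in the attached patches: the class name DelayedEcRecoveryPolicy, the method shouldRecoverNow, and the postponeMillis setting are all hypothetical, and the sketch assumes the caller already knows how many internal blocks of a block group are missing or corrupted and when the problem was first seen.

```java
/**
 * Hypothetical sketch of the policy described in HDFS-9826.
 * None of these names come from the HDFS code base or the attached patches.
 */
final class DelayedEcRecoveryPolicy {
    /** Assumed user-configurable delay for single-block failures, in ms. */
    private final long postponeMillis;

    DelayedEcRecoveryPolicy(long postponeMillis) {
        if (postponeMillis < 0) {
            throw new IllegalArgumentException("postponeMillis must be >= 0");
        }
        this.postponeMillis = postponeMillis;
    }

    /**
     * @param badInternalBlocks   missing or corrupted internal blocks in the group
     * @param firstDetectedMillis time the problem was first noticed
     * @param nowMillis           current time
     * @return true if recovery work should be scheduled now
     */
    boolean shouldRecoverNow(int badInternalBlocks,
                             long firstDetectedMillis,
                             long nowMillis) {
        if (badInternalBlocks <= 0) {
            return false; // healthy block group, nothing to recover
        }
        if (badInternalBlocks >= 2) {
            return true;  // multiple failures: recover immediately
        }
        // Exactly one failure: wait out the delay in case it is transient.
        return nowMillis - firstDetectedMillis >= postponeMillis;
    }
}
```

With a 60-second delay, for example, a block group that has had one bad internal block for 30 seconds would be left alone, while a group with two bad internal blocks would be scheduled at once. A real implementation would also need to re-check the block group when the delay expires, since a transient failure may have cleared by then.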