[ https://issues.apache.org/jira/browse/HDFS-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910906#comment-16910906 ]
Chen Zhang commented on HDFS-13709:
-----------------------------------

Thanks [~jojochuang] for reviewing and merging this patch. I'll provide a branch-2 patch later. By the way, I have a few questions about backporting:
# In which cases do we need to backport a patch to branch-2? Usually only bug fixes and some critical improvements?
# Some people open a new Jira to backport to branch-2, while others attach a new patch to the same Jira. Which is the better practice?

> Report bad block to NN when transfer block encounter EIO exception
> ------------------------------------------------------------------
>
>                 Key: HDFS-13709
>                 URL: https://issues.apache.org/jira/browse/HDFS-13709
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: Chen Zhang
>            Assignee: Chen Zhang
>            Priority: Major
>             Fix For: 3.3.0, 3.2.1, 3.1.3
>
>         Attachments: HDFS-13709.002.patch, HDFS-13709.003.patch, HDFS-13709.004.patch, HDFS-13709.005.patch, HDFS-13709.patch
>
>
> In our online cluster, the BlockPoolSliceScanner is turned off, and sometimes a bad disk track can cause data loss.
> For example, suppose there are 3 replicas on 3 machines A/B/C. If a bad track occurs under A's replica data and B and C later crash at the same time, the NN will try to replicate the data from A but fail. The block is now corrupt but no one knows, because the NN thinks there is at least 1 healthy replica and keeps trying to replicate it.
> When a replica with data on a bad track is read, the OS returns an EIO error. If the DN reports the bad block as soon as it gets an EIO, we can detect this case ASAP and try to avoid data loss.
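
The idea in the issue description (report the replica as bad as soon as a block transfer hits EIO) can be sketched roughly as below. This is only an illustrative sketch, not the code from the attached patches: `EioReportSketch`, `BadBlockReporter`, `ReplicaSender`, `isEio`, and `transferBlock` are hypothetical names, and it assumes the OS-level EIO surfaces in Java as an `IOException` whose message contains "Input/output error", which is the usual wording on Linux but may differ on other platforms.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class EioReportSketch {

    /** Hypothetical stand-in for the DN-to-NN bad block report; not a Hadoop API. */
    interface BadBlockReporter {
        void reportBadBlock(String blockId);
    }

    /** Hypothetical stand-in for reading a local replica and sending it to a peer. */
    interface ReplicaSender {
        void send(String blockId) throws IOException;
    }

    /** Assumption: EIO shows up as an IOException with this message text. */
    static boolean isEio(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Input/output error");
    }

    /**
     * Tries to transfer a block. If the read fails with EIO, the replica is
     * reported as bad right away instead of leaving the failure unreported.
     * Returns true only if the transfer succeeded.
     */
    static boolean transferBlock(String blockId, ReplicaSender sender,
                                 BadBlockReporter reporter) {
        try {
            sender.send(blockId);
            return true;
        } catch (IOException e) {
            if (isEio(e)) {
                reporter.reportBadBlock(blockId);
            }
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> reported = new ArrayList<>();
        // Simulate a replica whose data sits on a bad track.
        boolean ok = transferBlock("blk_1",
            id -> { throw new IOException("Input/output error"); },
            reported::add);
        System.out.println(ok + " " + reported);
    }
}
```

Other I/O failures (for example a network timeout) are deliberately not reported in this sketch, since they do not indicate that the local replica itself is damaged.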