[ 
https://issues.apache.org/jira/browse/HDFS-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377268#comment-15377268
 ] 

Wei-Chiu Chuang commented on HDFS-10627:
----------------------------------------

[~shahrs87] this piece of code can be useful during a block transfer triggered 
by a prior pipeline recovery.
In that case, if the receiver detects corruption in the replica, it 
immediately resets the connection.

If the block transfer is not initiated due to pipeline recovery, the receiver 
also notifies the NameNode that the source's replica is corrupt (this is 
actually not accurate, because the corruption may be due to other issues, which 
is what causes the bug described in HDFS-6804).

In short, I think this code is still necessary, because block transfer during 
pipeline recovery lacks a feedback mechanism.

> Volume Scanner mark a block as "suspect" even if the block sender encounters 
> 'Broken pipe' or 'Connection reset by peer' exception
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10627
>                 URL: https://issues.apache.org/jira/browse/HDFS-10627
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 2.7.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>
> In the BlockSender code,
> {code:title=BlockSender.java|borderStyle=solid}
>         if (!ioem.startsWith("Broken pipe")
>             && !ioem.startsWith("Connection reset")) {
>           LOG.error("BlockSender.sendChunks() exception: ", e);
>         }
>         datanode.getBlockScanner().markSuspectBlock(
>               volumeRef.getVolume().getStorageID(),
>               block);
> {code}
> Before HDFS-7686, the block was marked as suspect only if the exception 
> message did not start with "Broken pipe" or "Connection reset".
> But after HDFS-7686, the block is marked as suspect irrespective of the 
> exception message.
> On one of our datanodes, it took approximately a whole day (22 hours) to go 
> through all the suspect blocks to scan one corrupt block.
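
The distinction described above, where a block is marked as suspect only when the 
exception message does not start with "Broken pipe" or "Connection reset", can be 
sketched roughly as follows. This is a minimal, standalone illustration and not 
the actual Hadoop code: the class SuspectBlockPolicy and its methods are 
hypothetical names, and the null check on the message is an added assumption.

```java
import java.io.IOException;

// Hypothetical helper, not part of Hadoop: decides whether an IOException
// seen while sending a block should cause the block to be marked as suspect.
class SuspectBlockPolicy {

    // True when the message looks like the peer simply went away, which by
    // itself says nothing about the health of the local replica.
    static boolean isPeerDisconnect(IOException e) {
        String msg = e.getMessage();
        // Assumption: a missing message is not treated as a disconnect.
        if (msg == null) {
            return false;
        }
        return msg.startsWith("Broken pipe")
            || msg.startsWith("Connection reset");
    }

    // Mark the block as suspect only for exceptions that are not plain
    // peer disconnects.
    static boolean shouldMarkSuspect(IOException e) {
        return !isPeerDisconnect(e);
    }

    public static void main(String[] args) {
        IOException[] samples = {
            new IOException("Broken pipe"),
            new IOException("Connection reset by peer"),
            new IOException("Input/output error"),
        };
        for (IOException e : samples) {
            System.out.println(e.getMessage() + " -> markSuspect="
                + shouldMarkSuspect(e));
        }
    }
}
```

Under a policy like this, a call such as markSuspectBlock(...) would sit inside 
the guarded branch, so a client that disconnects mid-read would not queue the 
block for rescanning.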



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
