[ https://issues.apache.org/jira/browse/HDFS-10714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403322#comment-15403322 ]
Brahma Reddy Battula commented on HDFS-10714:
---------------------------------------------

Thinking of solutions along these lines:

1) Remove both DNs in the checksum-error case, i.e. DN2 and DN3.
2) Remove DN3 first and record DN2 as a suspect node. If the next pipeline still fails with a checksum error, then DN2 can be removed as well, since it was already suspected.

I think the 2nd solution would be safer. Any thoughts on this? cc [~kanaka]/[~vinayrpet]

> Issue in handling checksum errors in write pipeline when fault DN is
> LAST_IN_PIPELINE
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10714
>                 URL: https://issues.apache.org/jira/browse/HDFS-10714
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>
> We came across an issue where a write failed even though 7 DNs were
> available, due to a network fault at one datanode which is LAST_IN_PIPELINE.
> It is similar to HDFS-6937.
> Scenario: (DN3 has a N/W fault and min replication = 2).
> Write pipeline:
> DN1->DN2->DN3 => DN3 gives an ERROR_CHECKSUM ack, so DN2 is marked as bad.
> DN1->DN4->DN3 => DN3 gives an ERROR_CHECKSUM ack, so DN4 is marked as bad.
> ....
> And so on (each time DN3 is LAST_IN_PIPELINE)... This continues until there
> are no more datanodes left to construct the pipeline.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
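The suspect-node strategy proposed in option 2 above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the actual HDFS DataStreamer pipeline-recovery code; the class and method names (`SuspectNodeDemo`, `onChecksumError`) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of option 2: on an ERROR_CHECKSUM ack from the last
// node in the pipeline, drop only that last node and record its upstream
// neighbour as a "suspect"; the suspect is removed only if the rebuilt
// pipeline fails with a checksum error involving it again.
public class SuspectNodeDemo {
    // Upstream node of the most recent checksum failure; null if none recorded.
    static String suspect = null;

    // Called when the last node in the pipeline returns an ERROR_CHECKSUM ack.
    // Returns the nodes to exclude before rebuilding the pipeline.
    static List<String> onChecksumError(String lastNode, String upstreamNode) {
        List<String> toRemove = new ArrayList<>();
        if (upstreamNode.equals(suspect)) {
            // Second checksum failure with the same upstream node: the
            // suspect is now treated as bad too (it may be corrupting the
            // data it forwards downstream).
            toRemove.add(upstreamNode);
            suspect = null;
        } else {
            // First failure: only the reporting last node is removed, and
            // its upstream neighbour is merely recorded as a suspect.
            suspect = upstreamNode;
        }
        toRemove.add(lastNode);
        return toRemove;
    }

    public static void main(String[] args) {
        // DN1->DN2->DN3: DN3 acks ERROR_CHECKSUM; only DN3 is dropped.
        System.out.println(onChecksumError("DN3", "DN2")); // [DN3]
        // Rebuilt pipeline DN1->DN2->DN5 fails again: suspect DN2 goes too.
        System.out.println(onChecksumError("DN5", "DN2")); // [DN2, DN5]
    }
}
```

Unlike today's behaviour in the reported scenario, a genuinely faulty LAST_IN_PIPELINE node is removed on the first failure instead of its innocent upstream neighbour, so the pipeline is not drained of healthy datanodes.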