[ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125490#comment-14125490 ]

> Under replicated block after the pipeline recovery.
> ---------------------------------------------------
>
>                 Key: HDFS-2932
>                 URL: https://issues.apache.org/jira/browse/HDFS-2932
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 0.24.0
>            Reporter: J.Andreina
>             Fix For: 0.24.0
>
> Started 1 NN and DN1, DN2, DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for block-id-1005 was in progress, DN3 was brought down.
> After the pipeline recovery happened, the block's stamp changed to block_id_1006 on DN1 and DN2.
> After the write was over, DN3 was brought up and the fsck command was issued.
> The following message was displayed:
> "block-id_1006 is under replicated. Target replicas is 3 but found 2 replicas."

Srikanth Upputuri commented on HDFS-2932:
-----------------------------------------

Further analysing the two cases detailed by Vinay:

*Case 1*. I think the fix given for HDFS-3493 will solve this case, as the corrupt replica (the result of the pipeline failure) will eventually be invalidated, in spite of the fact that total replicas = replication factor. Please confirm.

*Case 2*. If a write-pipeline-failed replica from a restarted DN arrives before the stored block is 'completed', it will not be marked as corrupt. Later, when the NN computes the replication work, it is not aware that a corrupt replica exists on DN3, so it will keep scheduling replication from, say, DN2 to DN3 without success until the next block report from DN3 is processed.

{code}
// BlockManager#checkReplicaCorrupt
case RBW:
case RWR:
  if (!storedBlock.isComplete()) {
    return null; // not corrupt
  }
{code}

There are two exclusive time windows in which such a replica can be reported: the DN restarts and the replica is reported before the client has finished writing the block, i.e. the block is not yet 'committed'; or the DN restarts and the replica is reported after 'commit' but before 'complete'.

The solution is to detect and capture a write-pipeline-failed replica as early as possible. A first fix may be to change the check from 'isComplete' to 'isCommitted'. This will capture write-pipeline-failed replicas reported just after 'commit' and before 'complete' and mark them as corrupt.

Then, to capture write-pipeline-failed replicas reported before 'commit', I am investigating whether this can be solved by marking them as corrupt as part of commit. There already exists a check that finds any mis-stamped replicas during commit, but we only remove them from the blocksMap. In addition, can we not mark such replicas as corrupt?

{code}
// BlockInfoUnderConstruction#commitBlock
// Sort out invalid replicas.
setGenerationStampAndVerifyReplicas(block.getGenerationStamp());
{code}

Any thoughts/suggestions? Two simplified sketches of the proposals above follow.
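To make the first change concrete, here is a minimal, self-contained sketch. It is not the real BlockManager code: the enum and method names are simplified stand-ins, an 'isCommitted'-style state test is assumed rather than taken from the current source, and the generation-stamp comparison stands in for the fuller checks the real method performs.

{code}
// Simplified stand-in for the replica-corruption decision discussed above.
public class ReplicaCorruptCheckSketch {

  // Mirrors the block construction states referenced in this comment.
  enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

  // Current behaviour: an RBW/RWR replica is never marked corrupt
  // until the stored block is COMPLETE.
  static boolean corruptUnderCurrentCheck(BlockUCState storedState,
                                          long storedGS, long reportedGS) {
    if (storedState != BlockUCState.COMPLETE) {
      return false; // "not corrupt" -- the window this comment describes
    }
    return reportedGS != storedGS;
  }

  // Proposed behaviour: tighten the guard to COMMITTED so a
  // write-pipeline-failed replica reported between 'commit' and
  // 'complete' is caught immediately.
  static boolean corruptUnderProposedCheck(BlockUCState storedState,
                                           long storedGS, long reportedGS) {
    if (storedState == BlockUCState.UNDER_CONSTRUCTION) {
      return false; // before commit we still cannot decide
    }
    return reportedGS != storedGS;
  }

  public static void main(String[] args) {
    // DN3 restarts and reports an RWR replica carrying the pre-recovery
    // stamp 1005 while the stored block (stamp 1006) is committed but
    // not yet complete.
    long storedGS = 1006, reportedGS = 1005;
    System.out.println("current : corrupt="
        + corruptUnderCurrentCheck(BlockUCState.COMMITTED, storedGS, reportedGS));
    System.out.println("proposed: corrupt="
        + corruptUnderProposedCheck(BlockUCState.COMMITTED, storedGS, reportedGS));
  }
}
{code}

With the stored block COMMITTED and DN3 reporting the stale stamp 1005, the current check prints corrupt=false (the replica is missed until the next block report), while the proposed check prints corrupt=true.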
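Similarly, a sketch of the second idea: marking mis-stamped replicas corrupt at commit time instead of only dropping them. The 'corruptReplicas' set and the commit loop below are illustrative stand-ins, not the real data structures; today setGenerationStampAndVerifyReplicas only removes the stale replicas from the blocksMap.

{code}
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for marking mis-stamped replicas corrupt at commit.
public class CommitBlockSketch {

  record Replica(String datanode, long genStamp) {}

  final List<Replica> expectedReplicas = new ArrayList<>();
  final Set<Replica> corruptReplicas = new HashSet<>();

  // On commit, a replica whose stamp does not match the committed stamp
  // is removed (what commit effectively does today) AND recorded as
  // corrupt (the proposed addition), so the NN can act on it at once.
  void commitBlock(long committedGenStamp) {
    for (Replica r : new ArrayList<>(expectedReplicas)) {
      if (r.genStamp() != committedGenStamp) {
        expectedReplicas.remove(r);
        corruptReplicas.add(r);
      }
    }
  }

  public static void main(String[] args) {
    CommitBlockSketch block = new CommitBlockSketch();
    block.expectedReplicas.add(new Replica("DN1", 1006));
    block.expectedReplicas.add(new Replica("DN2", 1006));
    // DN3's replica arrived before commit, still carrying stamp 1005.
    block.expectedReplicas.add(new Replica("DN3", 1005));

    block.commitBlock(1006);
    System.out.println("live    : " + block.expectedReplicas);
    System.out.println("corrupt : " + block.corruptReplicas);
  }
}
{code}

If the corrupt set were consulted when computing replication work, the NN would stop scheduling DN2-to-DN3 transfers that can never succeed and could instead invalidate DN3's stale replica directly.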