[
https://issues.apache.org/jira/browse/HDFS-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187170#comment-13187170
]
Todd Lipcon commented on HDFS-2798:
---
You can reproduce this fairly reliably by adding a sleep call in BlockReceiver
as follows:
{code}
    case PIPELINE_SETUP_APPEND:
      replicaInfo = datanode.data.append(block, newGs, minBytesRcvd);
      // Injected delay: append() has already renamed the replica's metadata
      // file to the new genstamp, but the block has not yet been removed
      // from the scanner queues below. Sleeping here widens that window.
      try {
        Thread.sleep(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      if (datanode.blockScanner != null) { // remove from block scanner
        datanode.blockScanner.deleteBlock(block.getBlockPoolId(),
            block.getLocalBlock());
      }
      block.setGenerationStamp(newGs);
      break;
{code}
TestFileAppend2 will then time out.
> Append may race with datanode block scanner, causing replica to be incorrectly marked corrupt
> ----------------------------------------------------------------------------------------------
>
> Key: HDFS-2798
> URL: https://issues.apache.org/jira/browse/HDFS-2798
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.22.0, 0.23.0
> Reporter: Todd Lipcon
> Priority: Critical
>
> When a pipeline is setup for append, the block's metadata file is renamed
> before the block is removed from the datanode block scanner queues. This can
> cause a race condition where the block scanner incorrectly marks the block as
> corrupt, since it tries to scan the file corresponding to the old genstamp.
> This causes TestFileAppend2 to time out in extremely rare circumstances - the
> corrupt replica prevents the writer thread from completing the file.
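> The ordering problem described above can be illustrated with a toy model. This is a hypothetical sketch, not HDFS code: the class, field, and method names (`AppendScanRace`, `onDisk`, `scanQueue`, `appendBuggy`, `appendFixed`) are all invented for illustration. The scanner queue records each block under the genstamp it expects on disk; if the append path renames the replica to a new genstamp before dequeuing the block, a scan in that window compares stale state and flags the replica corrupt, while dequeuing first avoids the race.

```java
import java.util.concurrent.ConcurrentHashMap;

// Toy model of HDFS-2798 (hypothetical names, not actual HDFS classes).
public class AppendScanRace {
  // blockId -> genstamp of the replica's metadata file on disk
  static final ConcurrentHashMap<Long, Long> onDisk = new ConcurrentHashMap<>();
  // blockId -> genstamp the scanner expects to find when it scans
  static final ConcurrentHashMap<Long, Long> scanQueue = new ConcurrentHashMap<>();

  // Returns true if the scan passes; false means "replica marked corrupt".
  static boolean scan(long blockId) {
    Long queuedGs = scanQueue.get(blockId);
    if (queuedGs == null) {
      return true; // block not queued: scanner has nothing to check
    }
    // A stale queued genstamp no longer matches the renamed file.
    return queuedGs.equals(onDisk.get(blockId));
  }

  // Buggy ordering: rename to the new genstamp first, dequeue after.
  // A scan landing between the two steps sees a mismatch.
  static boolean appendBuggy(long blockId, long newGs) {
    onDisk.put(blockId, newGs);      // metadata file renamed by append
    boolean verdict = scan(blockId); // scanner fires in the race window
    scanQueue.remove(blockId);
    return verdict;
  }

  // Fixed ordering: dequeue from the scanner before renaming,
  // so a concurrent scan finds nothing to check.
  static boolean appendFixed(long blockId, long newGs) {
    scanQueue.remove(blockId);
    boolean verdict = scan(blockId);
    onDisk.put(blockId, newGs);
    return verdict;
  }

  public static void main(String[] args) {
    onDisk.put(1L, 100L);
    scanQueue.put(1L, 100L);
    System.out.println("buggy order passes scan: " + appendBuggy(1L, 101L)); // false

    onDisk.put(2L, 100L);
    scanQueue.put(2L, 100L);
    System.out.println("fixed order passes scan: " + appendFixed(2L, 101L)); // true
  }
}
```

> In the real datanode the "scan" runs on a separate scanner thread, so the race window is timing-dependent; the injected sleep in the comment above simply makes the window wide enough that TestFileAppend2 hits it reliably.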
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira