[jira] [Commented] (HDFS-2798) Append may race with datanode block scanner, causing replica to be incorrectly marked corrupt

2012-09-25 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13463380#comment-13463380
 ] 

Brandon Li commented on HDFS-2798:
--

I think this problem has been fixed by HDFS-2525. 
I also ran TestFileAppend2 with the sleep call added in BlockReceiver, and it passed.

> Append may race with datanode block scanner, causing replica to be 
> incorrectly marked corrupt
> -
>
> Key: HDFS-2798
> URL: https://issues.apache.org/jira/browse/HDFS-2798
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: data-node
> Affects Versions: 0.22.0, 0.23.0
> Reporter: Todd Lipcon
> Assignee: Brandon Li
> Priority: Critical
>
> When a pipeline is set up for append, the block's metadata file is renamed 
> before the block is removed from the datanode block scanner queues. This can 
> cause a race condition where the block scanner incorrectly marks the block as 
> corrupt, since it tries to scan the file corresponding to the old genstamp.
> This causes TestFileAppend2 to time out in extremely rare circumstances: the 
> corrupt replica prevents the writer thread from completing the file.
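
To make the ordering problem in the description concrete, here is a minimal single-threaded sketch. It is a toy model, not the datanode code: the class, the method names and the meta file naming are all assumptions made up for illustration. It only shows why a scanner pass that lands between "rename meta file" and "remove from scanner queue" sees an entry with no matching file. The dequeue-first ordering is included for contrast and is not claimed to be how HDFS-2525 addressed the issue.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: every name here is hypothetical, not an HDFS datanode class.
class AppendScannerRaceSketch {

    // Assumed naming: the meta file name carries the generation stamp.
    static String metaName(long blockId, long genStamp) {
        return "blk_" + blockId + "_" + genStamp + ".meta";
    }

    /**
     * Simulates one append with a scanner pass landing between the two steps.
     * Returns true if the scanner would have flagged the replica as corrupt.
     */
    static boolean appendMarksCorrupt(boolean dequeueBeforeRename) {
        Set<String> metaFilesOnDisk = new HashSet<>();
        Set<String> scannerQueue = new HashSet<>();

        long blockId = 1L, oldGs = 1001L, newGs = 1002L;
        String oldMeta = metaName(blockId, oldGs);
        metaFilesOnDisk.add(oldMeta);
        scannerQueue.add(oldMeta);

        if (dequeueBeforeRename) {
            scannerQueue.remove(oldMeta);
        }

        // "Rename" the meta file to the new generation stamp.
        metaFilesOnDisk.remove(oldMeta);
        metaFilesOnDisk.add(metaName(blockId, newGs));

        // Scanner pass in the window: a queued entry with no file looks corrupt.
        boolean markedCorrupt = false;
        for (String expected : scannerQueue) {
            if (!metaFilesOnDisk.contains(expected)) {
                markedCorrupt = true;
            }
        }

        if (!dequeueBeforeRename) {
            scannerQueue.remove(oldMeta); // too late, the scan already ran
        }
        return markedCorrupt;
    }

    public static void main(String[] args) {
        System.out.println("rename first:  " + appendMarksCorrupt(false));
        System.out.println("dequeue first: " + appendMarksCorrupt(true));
    }
}
```

In this model the rename-first ordering reports the replica as corrupt and the dequeue-first ordering does not.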

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2798) Append may race with datanode block scanner, causing replica to be incorrectly marked corrupt

2012-01-16 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13187170#comment-13187170
 ] 

Todd Lipcon commented on HDFS-2798:
---

You can reproduce this fairly reliably by adding a sleep call in BlockReceiver 
as follows:

{code}
case PIPELINE_SETUP_APPEND:
  replicaInfo = datanode.data.append(block, newGs, minBytesRcvd);
  try {
    Thread.sleep(1000);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
  }
  if (datanode.blockScanner != null) { // remove from block scanner
    datanode.blockScanner.deleteBlock(block.getBlockPoolId(),
        block.getLocalBlock());
  }
  block.setGenerationStamp(newGs);
  break;
{code}
TestFileAppend2 will then time out.
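
The sleep only widens the window, so the failure is still timing-dependent. As a rough sketch of how the same interleaving could be forced deterministically in a standalone test, the hypothetical example below swaps the sleep for a pair of latches. None of these names exist in HDFS; the two flags merely stand in for "meta file renamed" and "block still in the scanner queue".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical standalone sketch; it does not touch any HDFS class.
class DelayInjectionSketch {

    /** Returns true if the scanner thread ran inside the race window. */
    static boolean scannerSawStaleEntry() {
        AtomicBoolean metaRenamed = new AtomicBoolean(false);
        AtomicBoolean stillQueued = new AtomicBoolean(true);
        AtomicBoolean sawStale = new AtomicBoolean(false);
        CountDownLatch renamed = new CountDownLatch(1);
        CountDownLatch scanned = new CountDownLatch(1);

        Thread scanner = new Thread(() -> {
            try {
                renamed.await(); // wait until the "rename" has happened
                if (metaRenamed.get() && stillQueued.get()) {
                    sawStale.set(true); // queued entry points at the old genstamp
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                scanned.countDown();
            }
        });
        scanner.start();

        try {
            metaRenamed.set(true);  // step 1: meta file renamed
            renamed.countDown();
            scanned.await();        // stands in for Thread.sleep(1000)
            stillQueued.set(false); // step 2: removed from the scanner queue
            scanner.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while forcing interleaving", e);
        }
        return sawStale.get();
    }

    public static void main(String[] args) {
        System.out.println("scanner saw stale entry: " + scannerSawStaleEntry());
    }
}
```

The latches pin the scanner between the two steps on every run, which is the interleaving the one-second sleep makes likely but does not guarantee.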

