[ 
https://issues.apache.org/jira/browse/HDFS-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052397#comment-15052397
 ] 

Mingliang Liu commented on HDFS-9535:
-------------------------------------

Thanks for the insightful discussion.

When the client closes the file, the last block may be committed before any 
live replica has been reported. In this case {{hasMinStorage()}} is false, so 
the last block is not added to pending replicas. When an incremental block 
report (IBR) is later received, the last block is completed (via 
{{addStoredBlock()}}). The next time the client retries completing the file, 
{{commitOrCompleteLastBlock()}} simply returns false (see the beginning of the 
code snippet) instead of completing the block again. Because the code 
introduced by [HDFS-1172] is never reached on this path, it fails to stop the 
replication work from being scheduled, and the unit test fails.
{code:title=code snippet of BlockManager#commitOrCompleteLastBlock()}
    if(lastBlock.isComplete())
      return false; // already completed (e.g. by syncBlock)

    final boolean b = commitBlock(lastBlock, commitBlock);
    if (hasMinStorage(lastBlock)) {
      if (b && !bc.isStriped()) {
        addExpectedReplicasToPending(lastBlock);
      }
      completeBlock(lastBlock, false);
    }
{code}
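
To make the sequence above concrete, here is a minimal toy model of it. This is 
only a sketch and not the actual BlockManager code: the names ToyBlock, 
ToyBlockManager, State and the pending list are made up for illustration, the 
striped-file check is omitted, and completeBlock() is reduced to a state change.

```java
import java.util.ArrayList;
import java.util.List;

public class LateBlockReportSketch {
    // Simplified block states; the real code has more states and fields.
    enum State { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    // Hypothetical stand-in for the last block of the file.
    static final class ToyBlock {
        State state = State.UNDER_CONSTRUCTION;
        int liveReplicas = 0;
    }

    // Hypothetical stand-in for the parts of BlockManager discussed above.
    static final class ToyBlockManager {
        final int minReplication;
        // Stand-in for the pending replicas bookkeeping.
        final List<ToyBlock> pending = new ArrayList<>();

        ToyBlockManager(int minReplication) {
            this.minReplication = minReplication;
        }

        boolean hasMinStorage(ToyBlock b) {
            return b.liveReplicas >= minReplication;
        }

        // Mirrors the shape of the quoted snippet (minus the striped check).
        boolean commitOrCompleteLastBlock(ToyBlock b) {
            if (b.state == State.COMPLETE) {
                return false; // already completed, nothing else happens
            }
            boolean committedNow = false;
            if (b.state == State.UNDER_CONSTRUCTION) {
                b.state = State.COMMITTED;
                committedNow = true;
            }
            if (hasMinStorage(b)) {
                if (committedNow) {
                    pending.add(b); // the only place pending is updated
                }
                b.state = State.COMPLETE;
            }
            return committedNow;
        }

        // A late block report: record the replica and complete the block
        // if it was already committed. Pending is not touched here.
        void addStoredBlock(ToyBlock b) {
            b.liveReplicas++;
            if (b.state == State.COMMITTED && hasMinStorage(b)) {
                b.state = State.COMPLETE;
            }
        }
    }

    public static void main(String[] args) {
        ToyBlockManager bm = new ToyBlockManager(1);
        ToyBlock last = new ToyBlock();

        bm.commitOrCompleteLastBlock(last);   // close(): no replica reported yet
        System.out.println("after close:  " + last.state);

        bm.addStoredBlock(last);              // the late block report arrives
        System.out.println("after report: " + last.state);

        boolean retry = bm.commitOrCompleteLastBlock(last); // client retries
        System.out.println("retry returned " + retry
            + ", pending entries: " + bm.pending.size());
    }
}
```

In this model the block ends up COMPLETE while the pending list stays empty, 
which corresponds to the path where the [HDFS-1172] logic is skipped.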

I think we should correct the unit test before changing any logic in 
{{commitOrCompleteLastBlock()}}.

> Fix TestReplication#testNoExtraReplicationWhenBlockReceivedIsLate
> -----------------------------------------------------------------
>
>                 Key: HDFS-9535
>                 URL: https://issues.apache.org/jira/browse/HDFS-9535
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Jing Zhao
>            Assignee: Mingliang Liu
>            Priority: Minor
>
> TestReplication#testNoExtraReplicationWhenBlockReceivedIsLate failed in 
> several Jenkins runs (e.g., 
> https://builds.apache.org/job/PreCommit-HDFS-Build/13818/testReport/). The 
> failure is on the last {{assertNoReplicationWasPerformed}} check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
