[ 
https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360903#comment-14360903
 ] 

Konstantin Shvachko commented on HDFS-7886:
-------------------------------------------

Hey guys, thanks for the reviews.
I cannot remove {{triggerBlockReports()}}, though. I just reran the test on my 
laptop without triggering, and it failed. This is actually where I spent most 
of the time. The problem is in {{commitBlockSync()}}; I was about to file a 
jira for it, but will explain here first.
In short, {{commitBlockSync()}} does not remove the unconfirmed locations from 
the block, that is, it does not reduce the locations to {{newTargets}}.
In the test, truncate recovery happens _while_ the DNs are restarting. If 
recovery is handled _after_ the initial block reports from the restarting DNs, 
the recovery will have only one new target, the node that was not restarted, 
but {{commitBlockSync()}} will not remove the other two. So 
{{waitReplication()}} will incorrectly show 3 replicas, but 
{{cluster.getBlockFile().length}} on the restarted node will be the old length 
4, while it should be 3. So I had to trigger block reports after the recovery, 
which removes the two invalid replicas from the NN; then replication is 
triggered, and the test passes.
Now, if truncate recovery happens _before_ the initial block reports from the 
restarting nodes, then everything is fine and {{triggerBlockReports()}} is 
redundant. When TestFileTruncate succeeds, look for block 
{{blk_1073742100}}; you should see that {{initReplicaRecovery}} for it happens 
before {{processReport}} and succeeds on all three nodes, while in the failure 
case {{initReplicaRecovery}} throws exceptions on two DNs out of three.
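
To make the location problem concrete, here is a toy Java model of the two 
behaviors. It is only a sketch and not the actual NameNode code: 
{{BlockRecord}}, {{commitKeepingLocations()}}, {{commitReducingToNewTargets()}} 
and the node names dn1..dn3 are hypothetical, made up for illustration. The 
lengths 4 and 3 are the ones from the test.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class CommitBlockSyncSketch {

    /** Toy stand-in for the NN's record of one block; not an HDFS class. */
    static final class BlockRecord {
        long length;
        final Set<String> locations = new LinkedHashSet<>();

        BlockRecord(long length, Collection<String> initialLocations) {
            this.length = length;
            this.locations.addAll(initialLocations);
        }
    }

    /** Behavior as described above: the length is updated, old locations stay. */
    static void commitKeepingLocations(BlockRecord block, long newLength,
                                       Set<String> newTargets) {
        block.length = newLength;
        block.locations.addAll(newTargets);
    }

    /** Hypothetical alternative: reduce the locations to the confirmed newTargets. */
    static void commitReducingToNewTargets(BlockRecord block, long newLength,
                                           Set<String> newTargets) {
        block.length = newLength;
        block.locations.clear();
        block.locations.addAll(newTargets);
    }

    public static void main(String[] args) {
        // dn1 and dn2 were restarted and still hold the old replica of length 4;
        // only dn3 took part in the truncate recovery to length 3.
        Set<String> newTargets = Collections.singleton("dn3");

        BlockRecord kept = new BlockRecord(4, Arrays.asList("dn1", "dn2", "dn3"));
        commitKeepingLocations(kept, 3, newTargets);
        System.out.println("keeping:  " + kept.locations.size() + " locations");

        BlockRecord reduced = new BlockRecord(4, Arrays.asList("dn1", "dn2", "dn3"));
        commitReducingToNewTargets(reduced, 3, newTargets);
        System.out.println("reducing: " + reduced.locations.size() + " locations");
    }
}
```

In this model the first variant still reports three locations after the 
commit, which is why a replication check passes even though two of the 
replicas are stale. The second variant leaves one location, so 
re-replication would be needed right away.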

> TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
> ------------------------------------------------------------------------
>
>                 Key: HDFS-7886
>                 URL: https://issues.apache.org/jira/browse/HDFS-7886
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.0
>            Reporter: Yi Liu
>            Assignee: Plamen Jeliazkov
>            Priority: Minor
>         Attachments: HDFS-7886-01.patch, HDFS-7886.patch
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
