[ https://issues.apache.org/jira/browse/HDFS-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360903#comment-14360903 ]
Konstantin Shvachko commented on HDFS-7886:
-------------------------------------------

Hey guys, thanks for the reviews.

Cannot remove {{triggerBlockReports()}}, though. I just reran the test on my laptop without triggering and it failed. This is actually where I spent most of the time.

The problem is in {{commitBlockSync()}}. I was about to file a jira for it, but will explain here first. In short, {{commitBlockSync()}} does not remove from the block the locations that were not confirmed; that is, it does not reduce them to {{newTargets}}.

In the test, truncate recovery is happening _while_ DNs are restarting. If recovery is handled _after_ the initial block reports from the restarting DNs, the recovery will have only one new target, the node that was not restarted, but {{commitBlockSync()}} will not remove the other two. So {{waitReplication()}} will incorrectly show 3 replicas, but {{cluster.getBlockFile().length}} on the restarted node will be the old length 4, while it should be 3. So I had to trigger block reports after the recovery, which removes the two invalid replicas from the NN; then replication is triggered, and the test passes.

Now, if truncate recovery happens _before_ the initial block reports from the restarting nodes, then everything is fine and {{triggerBlockReports()}} is redundant.

When you see TestFileTruncate succeed, look for block {{blk_1073742100}}: you should see that {{initReplicaRecovery}} for it happens before {{processReport}} and succeeds on all three nodes. In the failure case, {{initReplicaRecovery}} throws exceptions on two DNs out of three.
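To make the location bookkeeping described above concrete, here is a minimal Java sketch. It is not the NameNode code: the class {{CommitBlockSyncSketch}}, both methods, and the node names {{dn1}}..{{dn3}} are hypothetical, and a block's locations are modelled as a plain set of strings. It only contrasts keeping unconfirmed locations with reducing them to {{newTargets}}.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only; names are hypothetical and not taken from HDFS.
public class CommitBlockSyncSketch {

    // Behaviour described in the comment: confirmed targets are recorded,
    // but locations that recovery did not confirm are left in place.
    public static Set<String> commitKeepingStale(Set<String> locations, List<String> newTargets) {
        Set<String> result = new LinkedHashSet<>(locations);
        result.addAll(newTargets);
        return result;
    }

    // Behaviour the comment argues for: locations are reduced to newTargets.
    public static Set<String> commitReducingToTargets(Set<String> locations, List<String> newTargets) {
        Set<String> result = new LinkedHashSet<>(locations);
        result.retainAll(newTargets);
        return result;
    }

    public static void main(String[] args) {
        // Three replicas known to the NN; dn2 and dn3 were restarted (assumed names).
        Set<String> locations = new LinkedHashSet<>(Arrays.asList("dn1", "dn2", "dn3"));
        // Recovery succeeded only on the node that was not restarted.
        List<String> newTargets = Arrays.asList("dn1");

        System.out.println("keeping stale: " + commitKeepingStale(locations, newTargets));
        System.out.println("reduced:       " + commitReducingToTargets(locations, newTargets));
    }
}
```

In this model the first variant still reports three locations, which corresponds to {{waitReplication()}} seeing 3 replicas, while the second leaves only the confirmed node so that re-replication would be needed.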
> TestFileTruncate#testTruncateWithDataNodesRestart runs timeout sometimes
> ------------------------------------------------------------------------
>
>                 Key: HDFS-7886
>                 URL: https://issues.apache.org/jira/browse/HDFS-7886
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.7.0
>            Reporter: Yi Liu
>            Assignee: Plamen Jeliazkov
>            Priority: Minor
>         Attachments: HDFS-7886-01.patch, HDFS-7886.patch
>
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9730//testReport/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)