[ 
https://issues.apache.org/jira/browse/HDFS-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Hong Wang updated HDFS-4815:
---------------------------------

    Description: 
In TestRBWBlockInvalidation, the original code is:
while (!isCorruptReported) {
        if (countReplicas(namesystem, blk).corruptReplicas() > 0) {
          isCorruptReported = true;
        }
        Thread.sleep(100);
}
assertEquals("There should be 1 replica in the corruptReplicasMap", 1,
          countReplicas(namesystem, blk).corruptReplicas());

Once the loop detects a corrupt replica, it sets isCorruptReported, sleeps for 
another 100 ms, and only then exits. After that, assertEquals() calls 
countReplicas() a second time. Sometimes the test fails with the following error:
java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap 
expected:<1> but was:<0>

The second call to countReplicas() in assertEquals() apparently sees a different 
corruptReplicas value, because BlockManager recovered the corrupt block while 
the test was sleeping.

The patch makes the following changes:
1) Once a corrupt replica is detected, break out of the loop without calling 
sleep(). Do the same for the liveReplicas check.
2) Do not call countReplicas() a second time in assertEquals() to re-check 
corruptReplicas and liveReplicas.
3) The test case sometimes times out, so shorten the block report interval.
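
The polling change in items 1) and 2) can be sketched in isolation as follows. This is a minimal illustration, not the code in HDFS-4815.patch: the class PollOnceSketch, the method awaitPositive, and the IntSupplier that stands in for countReplicas(namesystem, blk).corruptReplicas() are all hypothetical names, and the timeout parameter is an added assumption so the loop cannot spin forever.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class PollOnceSketch {

    /**
     * Polls the counter until it reports a positive value or the timeout
     * expires. Returns the value that ended the loop, or 0 on timeout or
     * interruption. The caller asserts on this value instead of reading
     * the counter a second time.
     */
    public static int awaitPositive(IntSupplier counter, long pollMillis, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            int observed = counter.getAsInt();   // one read per iteration
            if (observed > 0) {
                return observed;                 // leave at once, no sleep after detection
            }
            if (System.currentTimeMillis() >= deadline) {
                return 0;                        // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the corrupt replica count: it reports 1
        // only on the third read and 0 on every other read, mimicking a
        // replica that is repaired shortly after it is seen as corrupt.
        AtomicInteger reads = new AtomicInteger();
        IntSupplier corruptReplicas = () -> reads.incrementAndGet() == 3 ? 1 : 0;

        int observed = awaitPositive(corruptReplicas, 10, 1000);
        System.out.println("observed corrupt replicas: " + observed);
    }
}
```

In this sketch a second read of the supplier after the loop would return 0, which is the same kind of mismatch the AssertionError above reports. Asserting on the returned value avoids that window.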


  was:
In TestRBWBlockInvalidation, the original code is:
while (!isCorruptReported) {
        if (countReplicas(namesystem, blk).corruptReplicas() > 0) {
          isCorruptReported = true;
        }
        Thread.sleep(100);
}
assertEquals("There should be 1 replica in the corruptReplicasMap", 1,
          countReplicas(namesystem, blk).corruptReplicas());

Once the program detects there exists one corruptReplica, it will break the 
while loop. After that, it call countReplicas() again in assertEquals(). But 
sometimes I met the following error:
java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap 
expected:<1> but was:<0>

It's obviously that second function call countReplicas() in assertEquals(), the 
corruptReplicas value has been changed since program go to sleep and 
BlockManger recovered the corrupt block during this sleep time.  

So what I do is:
1) once detecting there exists one corruptReplica, break the loop and don't 
call sleep(), the same as liveReplicas
2) don't double check the countReplicas & liveReplicas in assertEquals()
3) sometime I meet the problem of testcase timeout, so I speed up the block 
report interval


    
> Double call countReplicas() to fetch corruptReplicas and liveReplicas is not 
> needed
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-4815
>                 URL: https://issues.apache.org/jira/browse/HDFS-4815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Tian Hong Wang
>            Assignee: Tian Hong Wang
>              Labels: patch
>         Attachments: HDFS-4815.patch
>
>
> In TestRBWBlockInvalidation, the original code is:
> while (!isCorruptReported) {
>         if (countReplicas(namesystem, blk).corruptReplicas() > 0) {
>           isCorruptReported = true;
>         }
>         Thread.sleep(100);
> }
> assertEquals("There should be 1 replica in the corruptReplicasMap", 1,
>           countReplicas(namesystem, blk).corruptReplicas());
> Once the loop detects a corrupt replica, it sets isCorruptReported, sleeps for 
> another 100 ms, and only then exits. After that, assertEquals() calls 
> countReplicas() a second time. Sometimes the test fails with the following error:
> java.lang.AssertionError: There should be 1 replica in the corruptReplicasMap 
> expected:<1> but was:<0>
> The second call to countReplicas() in assertEquals() apparently sees a different 
> corruptReplicas value, because BlockManager recovered the corrupt block while 
> the test was sleeping.
> The patch makes the following changes:
> 1) Once a corrupt replica is detected, break out of the loop without calling 
> sleep(). Do the same for the liveReplicas check.
> 2) Do not call countReplicas() a second time in assertEquals() to re-check 
> corruptReplicas and liveReplicas.
> 3) The test case sometimes times out, so shorten the block report interval.

