[ 
https://issues.apache.org/jira/browse/HBASE-15192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126599#comment-15126599
 ] 

Ted Yu commented on HBASE-15192:
--------------------------------

Since the test fails if merge references are not cleaned, we can call 
admin.runCatalogScan() more than once if needed.
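
A minimal sketch of the retry idea, not the code from the patch. The IntSupplier stands in for admin.runCatalogScan(), which is assumed here to return the number of cleaned entries; the class name, helper name, attempt limit, and pause are hypothetical.

```java
import java.util.function.IntSupplier;

public class CatalogScanRetry {

    /**
     * Invokes the scan up to maxAttempts times and returns the first positive
     * cleaned count, or 0 if every attempt cleaned nothing.
     */
    public static int scanUntilCleaned(IntSupplier scan, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int cleaned = scan.getAsInt();
            if (cleaned > 0) {
                return cleaned;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    return 0;
                }
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for admin.runCatalogScan(): the first two
        // scans clean nothing, the third cleans one merged region.
        int[] calls = {0};
        IntSupplier fakeScan = () -> ++calls[0] < 3 ? 0 : 1;

        int cleaned = scanUntilCleaned(fakeScan, 5, 10L);
        System.out.println("cleaned=" + cleaned + " after " + calls[0] + " scans");
    }
}
```

In a test, the assertion on the cleaned count would then be made against the value returned by the helper rather than against a single scan.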

runCatalogScan() is the only method exposed by CatalogJanitor; otherwise we 
could poll CatalogJanitor for the value of mergeCleaned and pass the test once 
mergeCleaned reaches 1.

Patch v2 passes 30 iterations of test runs. Previously the test failed within 
the first 5 iterations.

> TestRegionMergeTransactionOnCluster#testCleanMergeReference is flaky
> --------------------------------------------------------------------
>
>                 Key: HBASE-15192
>                 URL: https://issues.apache.org/jira/browse/HBASE-15192
>             Project: HBase
>          Issue Type: Test
>            Reporter: Ted Yu
>            Assignee: Ted Yu
>            Priority: Minor
>         Attachments: HBASE-15192.v1.patch
>
>
> TestRegionMergeTransactionOnCluster#testCleanMergeReference fails 
> intermittently due to failed assertion on cleaned merge region count:
> {code}
> testCleanMergeReference(org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster)
>   Time elapsed: 64.183 sec  <<< FAILURE!
> java.lang.AssertionError: null
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at org.apache.hadoop.hbase.regionserver.TestRegionMergeTransactionOnCluster.testCleanMergeReference(TestRegionMergeTransactionOnCluster.java:284)
> {code}
> Before calling CatalogJanitor#scan(), the test does:
> {code}
>       int newcount1 = 0;
>       while (System.currentTimeMillis() < timeout) {
>         for(HColumnDescriptor colFamily : columnFamilies) {
>           newcount1 += hrfs.getStoreFiles(colFamily.getName()).size();
>         }
>         if(newcount1 <= 1) {
>           break;
>         }
>         Thread.sleep(50);
>       }
> {code}
> newcount1 is not reset at the beginning of each loop iteration, so it keeps 
> accumulating across iterations.
> This means that if the check for newcount1 <= 1 doesn't pass in the first 
> iteration, it won't pass in any subsequent iteration.
> After the timeout is exhausted, admin.runCatalogScan() is called. However, 
> there is a chance that CatalogJanitor#scan() has already been called by the 
> Chore (during the wait period), leaving the cleaned count at 0 and failing 
> the test.
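
The counter problem described in the issue can be illustrated with a small standalone sketch in which the count is reset on every pass. This is an assumption about what a corrected loop could look like, not the change made in the attached patch. Each IntSupplier is a hypothetical stand-in for hrfs.getStoreFiles(colFamily.getName()).size(), and the timeout is passed as a relative duration rather than the absolute timestamp used in the test.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;

public class StoreFileWait {

    /**
     * Polls until the total store file count across all families is at most 1
     * or the deadline passes, and returns the last observed total.
     */
    public static int waitForStoreFileCount(List<IntSupplier> familyCounts,
                                            long timeoutMillis, long pauseMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        int count;
        do {
            // Reset on every pass so earlier observations do not accumulate.
            count = 0;
            for (IntSupplier familyCount : familyCounts) {
                count += familyCount.getAsInt();
            }
            if (count <= 1) {
                break;
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        } while (System.currentTimeMillis() < deadline);
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical column family whose store file count shrinks 3, 2, 1
        // as compaction proceeds.
        int[] polls = {0};
        IntSupplier shrinking = () -> Math.max(1, 3 - polls[0]++);

        int last = waitForStoreFileCount(Arrays.asList(shrinking), 5000L, 50L);
        System.out.println("last count=" + last + " after " + polls[0] + " polls");
    }
}
```

With an accumulating counter, the same sequence 3, 2, 1 would yield running totals of 3, 5, 6, so the <= 1 check could never succeed after the first pass.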



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
