[ https://issues.apache.org/jira/browse/HBASE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542726#comment-13542726 ]
chunhui shen commented on HBASE-7299:
-------------------------------------

I have analysed the logs of trunk build #3686 and found the cause.

1. Both testBatchWithPut and testFlushCommitsWithAbort abort regionserver 0.

2. Before each test, we ensure that 2 regionservers are alive:
{code}
@Before
public void before() throws IOException {
  LOG.info("before");
  if (UTIL.ensureSomeRegionServersAvailable(slaves)) {
    // Distribute regions
    UTIL.getMiniHBaseCluster().getMaster().balance();
  }
  LOG.info("before done");
}
{code}

3. In trunk build #3686, testFlushCommitsWithAbort ran after testBatchWithPut:
{code}
2013-01-02 12:28:33,183 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testBatchWithPut
...
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsWithAbort
{code}

4. testFlushCommitsWithAbort aborts regionserver 0, which was already aborted by testBatchWithPut, so we see the following log:
{code}
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsWithAbort
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] client.TestMultiParallel(77): before
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] hbase.LocalHBaseCluster(243): Not alive RegionServer:0;juno.apache.org,40265,1357129678691
2013-01-02 12:30:08,410 INFO [pool-1-thread-1] client.TestMultiParallel(82): before done
2013-01-02 12:30:08,410 INFO [Thread-709] client.TestMultiParallel(226): test=testFlushCommitsWithAbort
...
2013-01-02 12:30:09,059 INFO [Thread-709] hbase.LocalHBaseCluster(243): Not alive RegionServer:0;juno.apache.org,40265,1357129678691
2013-01-02 12:30:09,059 INFO [Thread-709] client.TestMultiParallel(277): Count=1, Alive=juno.apache.org,40198,1357129678744
2013-01-02 12:30:09,059 INFO [Thread-709] client.TestMultiParallel(277): Count=2, Alive=juno.apache.org,51431,1357129753348
{code}

5. From the above, it is clear that there are 3 regionservers in total, 2 of which are alive, but testFlushCommitsWithAbort assumes there are only 2 regionservers in total.

Uploading addendum2 to fix the test case bug.

> TestMultiParallel fails intermittently in trunk builds
> ------------------------------------------------------
>
> Key: HBASE-7299
> URL: https://issues.apache.org/jira/browse/HBASE-7299
> Project: HBase
> Issue Type: Bug
> Reporter: Ted Yu
> Assignee: chunhui shen
> Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 7299.addendum, 7299-v4.txt, HBASE-7299.patch,
> HBASE-7299v2.patch, HBASE-7299v3.patch
>
>
> From trunk build #3598:
> {code}
> testFlushCommitsNoAbort(org.apache.hadoop.hbase.client.TestMultiParallel): Count of regions=8
> {code}
> It failed in 3595 as well:
> {code}
> java.lang.AssertionError: Server count=2, abort=true expected:<1> but was:<2>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:472)
> at org.apache.hadoop.hbase.client.TestMultiParallel.doTestFlushCommits(TestMultiParallel.java:267)
> at org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsWithAbort(TestMultiParallel.java:226)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira