[ https://issues.apache.org/jira/browse/HBASE-7299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13541312#comment-13541312 ]
chunhui shen commented on HBASE-7299:
-------------------------------------

[~ted_yu]
I have looked at the log again, and I think the failure is caused by region balancing.

First, look at the order in which the tests ran:
{code}
2012-12-31 03:11:48,688 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testActiveThreadsCount
2012-12-31 03:11:49,247 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testBatchWithGet
2012-12-31 03:11:50,151 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testBadFam
2012-12-31 03:11:50,169 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsNoAbort
2012-12-31 03:11:50,825 INFO [pool-1-thread-1] hbase.ResourceChecker(147): before: client.TestMultiParallel#testFlushCommitsWithAbort
{code}
Therefore, we only need to care about what happened before 2012-12-31 03:11:50,825.

Then, I grepped all the opened-region logs:
{code}
2012-12-31 03:11:46,309 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0] handler.OpenRegionHandler(149): Opened multi_test_table,,1356923505778.5e876dba9be19501a1eb65bf3a169e52. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,164 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,bbb,1356923506859.7c3f09396e7314de6f5a757b010b6497. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,202 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0] handler.OpenRegionHandler(149): Opened multi_test_table,ccc,1356923506862.2a80b82e2d6c3152e3f12bc91e1cc621. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,303 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-1] handler.OpenRegionHandler(149): Opened multi_test_table,fff,1356923506868.63ffa8986cd30ff5314b4c2a70cf846a. on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,329 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,ddd,1356923506864.744510f09d963e39dd9c0b6e3119dc10. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,370 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-2] handler.OpenRegionHandler(149): Opened multi_test_table,iii,1356923506875.d09ca7b9b80b6cde560772598a240d0e. on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,400 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0] handler.OpenRegionHandler(149): Opened multi_test_table,eee,1356923506866.6a1697e740f121d009c3085e0cccd18d. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,439 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-0] handler.OpenRegionHandler(149): Opened multi_test_table,jjj,1356923506878.f25b9086263fb7a4f983524c708503b6. on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,465 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1] handler.OpenRegionHandler(149): Opened multi_test_table,,1356923506856.2db538d9e2005dba4e28746d51cf3831. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,482 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,ggg,1356923506871.7adeba3045bdbb0f4e499b221d2ffc87. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,598 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-1] handler.OpenRegionHandler(149): Opened multi_test_table,nnn,1356923506888.9cc1e013ebfba7da8e00e4963c2d111a. on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,603 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1] handler.OpenRegionHandler(149): Opened multi_test_table,kkk,1356923506880.a2a3e39af3fa95eb1a3979998b075bb6. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,634 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-2] handler.OpenRegionHandler(149): Opened multi_test_table,ppp,1356923506893.915969809cfe733d325591b7c27bd088. on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,643 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,lll,1356923506883.0e6b1c9b373cecb0c74380b78d1cc492. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,701 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-0] handler.OpenRegionHandler(149): Opened multi_test_table,rrr,1356923506899.67925003b24f6408e7ee6ef2360a77f6. on server:asf001.sp2.ygridcore.net,45800,1356923500558
2012-12-31 03:11:47,717 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1] handler.OpenRegionHandler(149): Opened multi_test_table,mmm,1356923506886.d0a07239a287e74e7706e4b9a0c9f491. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,745 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,ooo,1356923506891.524c6a4fb529fbb5b86e0865ac0131f5. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,867 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,sss,1356923506901.af5693d7dc46541210d7c26cf4e4c1a0. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,936 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0] handler.OpenRegionHandler(149): Opened multi_test_table,hhh,1356923506873.12dff64cde2a448c9d5b7adecfabfaaa. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:47,957 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,ttt,1356923506904.c121cfbfb3e248f820d4729e4452ff14. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,012 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,vvv,1356923506908.797a80f1a86a9256a833e4cd48554185. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,076 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,www,1356923506911.79129a00e6718ae7ca478e3dde854524. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,185 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1] handler.OpenRegionHandler(149): Opened multi_test_table,qqq,1356923506896.cf9a88d3961133afeaaeabdf5a9cffc3. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,411 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-0] handler.OpenRegionHandler(149): Opened multi_test_table,uuu,1356923506906.62d60488f81f0e0edce10369200b1543. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,556 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-2] handler.OpenRegionHandler(149): Opened multi_test_table,xxx,1356923506913.bf54cd9fae68060237f700e0c7acc6b4. on server:asf001.sp2.ygridcore.net,38198,1356923500609
2012-12-31 03:11:48,626 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,38198,1356923500609-1] handler.OpenRegionHandler(149): Opened multi_test_table,yyy,1356923506915.7b41bd006a2c832842a67b85f1837c68. on server:asf001.sp2.ygridcore.net,38198,1356923500609
{code}
These regions are created by:
{code}
@BeforeClass
public static void beforeClass() throws Exception {
  ...
  UTIL.createMultiRegions(t, Bytes.toBytes(FAMILY));
  ...
}
{code}
From the above, we can see that server asf001.sp2.ygridcore.net,38198,1356923500609 serves 20 regions, while asf001.sp2.ygridcore.net,45800,1356923500558 serves only 6 regions.

The cause seems clear:
{code}
for (JVMClusterUtil.RegionServerThread t: liveRSs) {
  int regions = ProtobufUtil.getOnlineRegions(t.getRegionServer()).size();
  Assert.assertTrue("Count of regions=" + regions, regions > 10);
}
{code}
I don't know why we assert that each regionserver has more than 10 regions.

The failure log says "java.lang.AssertionError: Count of regions=7", so there is one more region on asf001.sp2.ygridcore.net,45800,1356923500558:
{code}
2012-12-31 03:11:44,306 DEBUG [RS_OPEN_REGION-asf001.sp2.ygridcore.net,45800,1356923500558-0] handler.OpenRegionHandler(149): Opened -ROOT-,,0.70236052 on server:asf001.sp2.ygridcore.net,45800,1356923500558
{code}
Yes, it's the -ROOT- region.
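To make the per-server tally above easy to reproduce, here is a minimal sketch in plain Java that counts "Opened ... on server:..." lines per server. The class name RegionCountFromLog and both helper methods are hypothetical and not part of HBase; the regular expression assumes the OpenRegionHandler message format shown in the logs above. It only counts open events, so it approximates the number of online regions as long as no region was closed or moved in the same window.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: tallies region-open log lines per server.
class RegionCountFromLog {
    // Assumed message format: "Opened <region name> on server:<host,port,startcode>"
    private static final Pattern OPENED =
        Pattern.compile("Opened (\\S+) on server:(\\S+)");

    // Returns server name -> number of "Opened" lines seen for that server.
    static Map<String, Integer> countOpenedRegions(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = OPENED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Servers whose count is NOT strictly greater than the threshold,
    // i.e. the ones for which an assertion "regions > threshold" would fail.
    static List<String> serversNotAbove(Map<String, Integer> counts, int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() <= threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Read log lines from standard input, one log entry per line.
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        Map<String, Integer> counts = countOpenedRegions(lines);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // 10 is the threshold used by the assertion quoted above.
        System.out.println("Servers that would fail 'regions > 10': "
            + serversNotAbove(counts, 10));
    }
}
```

Fed the multi_test_table lines quoted above, this should give 20 for the 38198 server and 6 for the 45800 server (7 once the -ROOT- line is included), so only the 45800 server falls below the threshold of 10.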
Also, we can see the balance logs later:
{code}
2012-12-31 03:11:58,883 INFO [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,mmm,1356923506886.d0a07239a287e74e7706e4b9a0c9f491., src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
2012-12-31 03:11:58,890 INFO [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,,1356923505778.5e876dba9be19501a1eb65bf3a169e52., src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
2012-12-31 03:11:58,949 INFO [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,bbb,1356923506859.7c3f09396e7314de6f5a757b010b6497., src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
2012-12-31 03:11:58,967 INFO [pool-1-thread-1] master.HMaster(1325): balance hri=multi_test_table,eee,1356923506866.6a1697e740f121d009c3085e0cccd18d., src=asf001.sp2.ygridcore.net,38198,1356923500609, dest=asf001.sp2.ygridcore.net,59241,1356923517635
{code}
So I think the cause is that the regions were not yet balanced across the servers when the assertion ran, and I don't think it is necessary to assert that each regionserver has more than 10 regions.

By the way, I found that we abort regionserver 0 in TestMultiParallel#testBatchWithPut, but we also abort regionserver 0 in TestMultiParallel#testFlushCommitsWithAbort(). That seems confusing.

> TestMultiParallel fails intermittently in trunk builds
> ------------------------------------------------------
>
>                 Key: HBASE-7299
>                 URL: https://issues.apache.org/jira/browse/HBASE-7299
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: chunhui shen
>            Priority: Critical
>             Fix For: 0.96.0
>
>         Attachments: 7299-v4.txt, HBASE-7299.patch, HBASE-7299v2.patch, HBASE-7299v3.patch
>
>
> From trunk build #3598:
> {code}
> testFlushCommitsNoAbort(org.apache.hadoop.hbase.client.TestMultiParallel): Count of regions=8
> {code}
> It failed in 3595 as well:
> {code}
> java.lang.AssertionError: Server count=2, abort=true expected:<1> but was:<2>
> 	at org.junit.Assert.fail(Assert.java:93)
> 	at org.junit.Assert.failNotEquals(Assert.java:647)
> 	at org.junit.Assert.assertEquals(Assert.java:128)
> 	at org.junit.Assert.assertEquals(Assert.java:472)
> 	at org.apache.hadoop.hbase.client.TestMultiParallel.doTestFlushCommits(TestMultiParallel.java:267)
> 	at org.apache.hadoop.hbase.client.TestMultiParallel.testFlushCommitsWithAbort(TestMultiParallel.java:226)
> {code}