[ https://issues.apache.org/jira/browse/HBASE-19639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404679#comment-16404679 ]
Anoop Sam John commented on HBASE-19639:
----------------------------------------

I somehow ended up adding the comment below into HBASE-20188, but what I am trying to say makes more sense here. In these tests, do we flush to SSD or HDD, boss? In writes we normally see these compares as the hottest path; it used to be that way. Now I can see the issue you face is the exception because the memstore size is 4x larger than the flush size. As you said, yes, the flush seems NOT speedy enough. I can think of the following:

1. With the compacting memstore, the flush op as such takes more time. With the default memstore it is just a matter of iterating over one map and writing out the cells, but now we have to read from multiple segments in a heap-merge way, so there are more compares. The flush op was triggered at flush size as usual, but by the time the region reached 4x that, the flush had still not completed. This can be one reason.

2. The writes to the CSLM as such became faster. With the default memstore, when we are at flush size and the flush op has started, writes are still happening into the same CSLM; we allow that anyway. But by then the CSLM already holds so many cells that writes might be a bit more delayed, and so the pace of incoming writes might be low enough for the flush to complete in time. With the compacting memstore it becomes a fresh CSLM again, and so writes are very fast.

Also, how about your global memstore size limit? I guess this might be a very large number. Normally in tests what we see is this barrier being breached, and so forced flushes with writes blocked, because there are enough regions in the RS and writes go to all regions; any single region crossing the 4x mark is less likely compared to a global barrier breach. When I did tests I would normally set this barrier to 2 * regions# * flush size.

We very much need the flush to be faster. If not already on SSD, any chance of an SSD-based test? This is one reason why I am a fan of that JMS issue of a flush-to-SSD policy.

> ITBLL can't go big because RegionTooBusyException...
> Above memstore limit
> -------------------------------------------------------------------------
>
> Key: HBASE-19639
> URL: https://issues.apache.org/jira/browse/HBASE-19639
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 2.0.0
>
> Running ITBLLs, the basic link generator keeps failing because I run into
> exceptions like below:
> {code}
> 2017-12-26 19:23:45,284 INFO [main] org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator: Persisting current.length=1000000, count=1000000, id=Job: job_1513025868268_0062 Task: attempt_1513025868268_0062_m_000006_2, current=\x8B\xDB25\xA7*\x9A\xF5\xDEx\x83\xDF\xDC?\x94\x92, i=1000000
> 2017-12-26 19:24:18,982 INFO [htable-pool3-t6] org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, table=IntegrationTestBigLinkedList, attempt=10/11 failed=524ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b., server=ve0538.halxg.cloudera.com,16020,1514343549993, memstoreSize=538084641, blockingMemStoreSize=536870912
>   at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, retrying after=10050ms, replay=524ops
> 2017-12-26 19:24:29,061 INFO [htable-pool3-t6] org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, table=IntegrationTestBigLinkedList, attempt=11/11 failed=524ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b., server=ve0538.halxg.cloudera.com,16020,1514343549993, memstoreSize=538084641, blockingMemStoreSize=536870912
>   at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, retrying after=10033ms, replay=524ops
> 2017-12-26 19:24:37,183 INFO [ReadOnlyZKClient] org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x015051a0 no activities for 60000 ms, close active connection. Will reconnect next time when there are new requests.
> 2017-12-26 19:24:39,122 WARN [htable-pool3-t6] org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, table=IntegrationTestBigLinkedList, attempt=12/11 failed=524ops, last exception: org.apache.hadoop.hbase.RegionTooBusyException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b., server=ve0538.halxg.cloudera.com,16020,1514343549993, memstoreSize=538084641, blockingMemStoreSize=536870912
>   at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>   at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>   at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>   at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> ...
> {code}
> Fails task over and over. With server-killing monkeys.
> 24Gs which should be more than enough.
> Had just finished a big compaction.
> Whats shutting us out? Why taking so long to flush? We seen stuck at limit so job fails.
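The extra flush cost described in point 1 of the comment above comes from the k-way merge: a default memstore flushes one sorted map with a plain linear scan, while a compacting memstore must heap-merge several sorted segments, paying O(log k) compares per cell. A minimal, self-contained sketch of that merge shape (not HBase code; the segment contents are made up):

```python
import heapq

# A compacting memstore holds several immutable sorted segments plus the
# active one; the flusher must merge them all in sorted order. Python's
# heapq.merge performs the same kind of heap-based k-way merge.
segments = [
    ["row1/a", "row4/a", "row7/a"],   # hypothetical immutable segment 1
    ["row2/a", "row5/a", "row8/a"],   # hypothetical immutable segment 2
    ["row3/a", "row6/a", "row9/a"],   # hypothetical active segment
]

# One heap compare per cell emitted, versus zero cross-source compares
# when iterating a single map in the default memstore.
merged = list(heapq.merge(*segments))
assert merged == sorted(c for seg in segments for c in seg)
```

With many segments, those per-cell compares add up during the flush, which is consistent with the flush still being in progress when the region hits the 4x blocking threshold.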
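For reference, the numbers in the quoted log line up with the usual defaults. A back-of-envelope check, assuming the default hbase.hregion.memstore.flush.size of 128 MiB and the default hbase.hregion.memstore.block.multiplier of 4 (hedged: verify against the cluster's actual configuration), plus the global-barrier rule of thumb from the comment:

```python
MiB = 1024 * 1024

# Assumed defaults: hbase.hregion.memstore.flush.size = 128 MiB,
# hbase.hregion.memstore.block.multiplier = 4.
flush_size = 128 * MiB
block_multiplier = 4

# Per-region blocking threshold; matches blockingMemStoreSize in the log.
blocking_size = flush_size * block_multiplier
assert blocking_size == 536870912

# The logged memstoreSize had already edged just past that 4x threshold,
# which is what raises RegionTooBusyException in HRegion.checkResources.
memstore_size = 538084641
assert memstore_size > blocking_size

# Rule of thumb from the comment for sizing the global memstore barrier:
# 2 * number-of-regions * flush size.
def global_barrier(num_regions):
    return 2 * num_regions * flush_size
```

So the region was only about 1.2 MB over the blocking line; the question is why the flush, triggered back at 1x flush size, had not freed space by then.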
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)