[ https://issues.apache.org/jira/browse/HBASE-19639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404679#comment-16404679 ]

Anoop Sam John commented on HBASE-19639:
----------------------------------------

I somehow ended up adding the below comment to HBASE-20188, but what I am trying 
to say makes more sense here.

In the tests, are we flushing to SSD or HDD, boss?
In write workloads we normally see these compares show up as the hottest path; it 
used to be this way. The issue you face now, as I can see, is the exception thrown 
because the memstore size is 4x larger than the flush size. As you said, yes, the 
flush seems NOT speedy enough. I can think of the following:
1. With the compacting memstore, the flush op itself takes more time. With the 
default memstore it is just a matter of iterating over one map and writing out the 
cells, but now we have to read from multiple segments in a heap way, and so there 
are more compares there (see the sketch after this list). The flush op was still 
triggered at flush size, but by the time the memstore reached 4x, the flush had 
not completed. This can be one reason.
2. The writes to the CSLM as such became faster. With the default memstore, when 
we are at flush size and have started the flush op, writes still go into the CSLM; 
we allow that anyway. But the CSLM is already holding so many cells that those 
writes are a bit more delayed, so their pace may be slow enough for the flush to 
complete. With the compacting memstore it becomes a fresh CSLM again, and so 
writes are very fast.
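
To illustrate point 1, below is a minimal, self-contained sketch (not the actual 
CompactingMemStore / flush code, just an illustration with made-up names) of why 
reading "in a heap way" across several sorted segments costs more compares than 
iterating a single map: every emitted key pays for heap poll/offer compares, while 
the default-memstore flush is one plain ordered iteration.

{code:java}
// Minimal sketch only (not the actual HBase CompactingMemStore code): merging
// several already-sorted in-memory segments "in a heap way". Each emitted key
// costs heap poll/offer compares; the default memstore flush is instead a plain
// iteration over one ConcurrentSkipListMap with no extra compares.
import java.util.*;

public class SegmentMergeSketch {

  /** Wraps a segment iterator so the heap can order cursors by their next key. */
  static final class SegmentCursor {
    private final Iterator<String> it;
    private String head;
    SegmentCursor(Iterator<String> it) { this.it = it; this.head = it.next(); }
    String peek() { return head; }
    boolean exhausted() { return head == null; }
    String advance() {
      String k = head;
      head = it.hasNext() ? it.next() : null;
      return k;
    }
  }

  /** Emits all keys across segments in sorted order, the way a multi-segment flush must. */
  static List<String> flushSegments(List<NavigableSet<String>> segments) {
    PriorityQueue<SegmentCursor> heap =
        new PriorityQueue<>(Comparator.comparing(SegmentCursor::peek));
    for (NavigableSet<String> seg : segments) {
      if (!seg.isEmpty()) {
        heap.offer(new SegmentCursor(seg.iterator()));   // compares on insert
      }
    }
    List<String> flushed = new ArrayList<>();
    while (!heap.isEmpty()) {
      SegmentCursor top = heap.poll();                   // compares on every poll
      flushed.add(top.advance());
      if (!top.exhausted()) {
        heap.offer(top);                                 // and on every re-insert
      }
    }
    return flushed;
  }

  public static void main(String[] args) {
    // Three tiny sorted "segments" standing in for in-memory flushed segments.
    List<NavigableSet<String>> segments = Arrays.asList(
        new TreeSet<>(Arrays.asList("a", "d", "g")),
        new TreeSet<>(Arrays.asList("b", "e")),
        new TreeSet<>(Arrays.asList("c", "f")));
    System.out.println(flushSegments(segments));         // [a, b, c, d, e, f, g]
  }
}
{code}

The point is only that each flushed cell now costs O(log #segments) extra compares, 
versus zero extra compares when flushing a single CSLM.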
What about your global memstore size limit? I guess this might be a very large 
number. Normally in tests what we see is this barrier being breached, and so 
forced flushes with blocked writes, because there are enough regions on the RS and 
writes go to all of them. So any single region crossing its 4x mark is less likely 
than this global barrier breach. When I ran tests I would normally set this 
barrier to 2 * regions# * flush size.
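
To make that heuristic concrete (the numbers below are made up for illustration, 
not taken from this issue): with the default 128 MB flush size, the 4x per-region 
blocking limit works out to exactly the 536870912 bytes (blockingMemStoreSize) 
seen in the log below, while a 2 * regions# * flush size global barrier is far 
larger.

{code:java}
// Hypothetical figures only, to make the 2 * regions# * flush-size heuristic concrete.
public class GlobalBarrierSketch {
  public static void main(String[] args) {
    long flushSize = 128L * 1024 * 1024;  // hbase.hregion.memstore.flush.size default (128 MB)
    int regions = 100;                    // assumed region count on the RS, not from this issue

    long perRegionBlock = 4L * flushSize;          // 536870912 bytes, the blockingMemStoreSize below
    long globalBarrier = 2L * regions * flushSize; // ~25.6 GB with these assumed numbers

    System.out.println("per-region block = " + perRegionBlock
        + " bytes, global barrier = " + globalBarrier + " bytes");
  }
}
{code}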
We very much need the flush to be faster. If not already on SSD, any chance of an 
SSD-based test? This is one reason why I am a fan of that JMS issue about a 
flush-to-SSD policy.
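
If one just wants an SSD-based test without waiting for any HBase-side flush 
policy, one crude option (my assumption, not the mechanism from that issue) is to 
pin the HBase data directory to SSD at the HDFS level; this assumes the DataNodes 
actually have volumes tagged [SSD] and the default /hbase/data layout.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Crude approximation of an SSD-based test: put the whole HBase data directory on
// SSD via an HDFS storage policy. NOT the flush-to-SSD policy referenced above;
// the path and policy name here are assumptions.
public class PinHBaseDataToSsd {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    try (FileSystem fs = FileSystem.get(conf)) {
      // ALL_SSD is a built-in HDFS storage policy; "/hbase/data" assumes the default hbase.rootdir layout.
      fs.setStoragePolicy(new Path("/hbase/data"), "ALL_SSD");
    }
  }
}
{code}

New HFiles written under that path, including the ones produced by flushes, would 
then land on SSD volumes, which is enough for a rough SSD-vs-HDD flush comparison.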

> ITBLL can't go big because RegionTooBusyException... Above memstore limit
> -------------------------------------------------------------------------
>
>                 Key: HBASE-19639
>                 URL: https://issues.apache.org/jira/browse/HBASE-19639
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> Running ITBLLs, the basic link generator keeps failing because I run into 
> exceptions like below:
> {code}
> 2017-12-26 19:23:45,284 INFO [main] 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator: 
> Persisting current.length=1000000, count=1000000, id=Job: 
> job_1513025868268_0062 Task: attempt_1513025868268_0062_m_000006_2, 
> current=\x8B\xDB25\xA7*\x9A\xF5\xDEx\x83\xDF\xDC?\x94\x92, i=1000000
> 2017-12-26 19:24:18,982 INFO [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=10/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, 
> retrying after=10050ms, replay=524ops
> 2017-12-26 19:24:29,061 INFO [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=11/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, 
> retrying after=10033ms, replay=524ops
> 2017-12-26 19:24:37,183 INFO [ReadOnlyZKClient] 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x015051a0 no activities 
> for 60000 ms, close active connection. Will reconnect next time when there 
> are new requests.
> 2017-12-26 19:24:39,122 WARN [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=12/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> ...
> {code}
> The task fails over and over, with server-killing monkeys running.
> 24Gs, which should be more than enough.
> Had just finished a big compaction.
> What's shutting us out? Why is the flush taking so long? We seem stuck at the 
> limit, so the job fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
