[ 
https://issues.apache.org/jira/browse/HBASE-19639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405040#comment-16405040
 ] 

stack commented on HBASE-19639:
-------------------------------

[~eshcar] any chance you can attach here the entire log file of some RS? If 
possible from start-hbase point and then throughout the test. Thanks.

Will do. Let me get you a good one (Cluster doing another test at mo.... will 
be back).

[~anoop.hbase]
bq. In tests, we try flush to SDD or HDD boss?

HDD

bq. Now I can see the issue you face is the exception because memstore size is 
4x larger than flush size.

I don't follow. 4x is the default. You suggesting we do 2x?

bq. But with compacting memstore, it will become fresh CSLM again and so very 
fast writes.

Before in-memory compaction, we used to snapshot; move aside current CSLM 
taking no more writes to it and replace it with a new one.... so new writes 
would go fast again.

For your #1 and #2, we should be able to test in isolation? No? A compare of an 
old-school flush against new in-memory compaction flush?

Wonder if a difference between read form CSLM and heap read from a bunch of 
segments.

bq. How about your global memstore size limit.

This is default. I'm trying to run w/ all defaults. I will change default if we 
come up w/ something better ("...95% of 
hbase.regionserver.global.memstore.size..."). We don't block writes now. 
Instead we throw the RegionTooBusyException. We don't put anything in the logs. 
Let me fix this. Would be good to note when the Region is struggling... or at 
least let me see if a metric I could use.

bq. When I did tests normally will select this barrier as 2 * regions# * flush 
size. 

Hmm.. This last run of mine had it at 8 *. Let me set it do the default (4 * ).

bq.  If not on SSD, any chance for an SSD based tests?

Nope. No SSD in these chassis.









> ITBLL can't go big because RegionTooBusyException... Above memstore limit
> -------------------------------------------------------------------------
>
>                 Key: HBASE-19639
>                 URL: https://issues.apache.org/jira/browse/HBASE-19639
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> Running ITBLLs, the basic link generator keeps failing because I run into 
> exceptions like below:
> {code}
> 2017-12-26 19:23:45,284 INFO [main] 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator: 
> Persisting current.length=1000000, count=1000000, id=Job: 
> job_1513025868268_0062 Task: attempt_1513025868268_0062_m_000006_2, 
> current=\x8B\xDB25\xA7*\x9A\xF5\xDEx\x83\xDF\xDC?\x94\x92, i=1000000
> 2017-12-26 19:24:18,982 INFO [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=10/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, 
> retrying after=10050ms, replay=524ops
> 2017-12-26 19:24:29,061 INFO [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=11/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
>  on ve0538.halxg.cloudera.com,16020,1514343549993, tracking started null, 
> retrying after=10033ms, replay=524ops
> 2017-12-26 19:24:37,183 INFO [ReadOnlyZKClient] 
> org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient: 0x015051a0 no activities 
> for 60000 ms, close active connection. Will reconnect next time when there 
> are new requests.
> 2017-12-26 19:24:39,122 WARN [htable-pool3-t6] 
> org.apache.hadoop.hbase.client.AsyncRequestFutureImpl: #2, 
> table=IntegrationTestBigLinkedList, attempt=12/11 failed=524ops, last 
> exception: org.apache.hadoop.hbase.RegionTooBusyException: 
> org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, 
> regionName=IntegrationTestBigLinkedList,q\xC7\x1Cq\xC7\x1Cq\xC0,1514342757438.71ef1fbab1576588955f45796e95c08b.,
>  server=ve0538.halxg.cloudera.com,16020,1514343549993, 
> memstoreSize=538084641, blockingMemStoreSize=536870912
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:4178)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3799)
>       at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3739)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:975)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:894)
>       at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2587)
>       at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:41560)
>       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:404)
>       at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
>       at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> ...
> {code}
> Fails task over and over. With server-killing monkeys.
> 24Gs which should be more than enough.
> Had just finished a big compaction.
> Whats shutting us out? Why taking so long to flush? We seen stuck at limit so 
> job fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to