[ https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
huaxiang sun updated HBASE-19163:
---------------------------------
    Attachment: HBASE-19163.master.009.patch

> "Maximum lock count exceeded" from region server's batch processing
> -------------------------------------------------------------------
>
>                 Key: HBASE-19163
>                 URL: https://issues.apache.org/jira/browse/HBASE-19163
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-19163-master-v001.patch, HBASE-19163.master.001.patch, HBASE-19163.master.002.patch, HBASE-19163.master.004.patch, HBASE-19163.master.005.patch, HBASE-19163.master.006.patch, HBASE-19163.master.007.patch, HBASE-19163.master.008.patch, HBASE-19163.master.009.patch, HBASE-19163.master.009.patch, unittest-case.diff
>
> In one of our use cases, we found the following exception and replication was stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN [hconnection-0x28db294f-shared--pool4-t936] client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last exception: java.io.IOException: java.io.IOException: Maximum lock count exceeded
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
>     at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
>     at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
>     at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
>     at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
>     ... 3 more
> {code}
> While we are still examining the data pattern, it is clear that the batch contains too many mutations against the same row; this exceeds the 64k maximum shared (read) lock hold count of the row's ReentrantReadWriteLock, so an Error is thrown and the whole batch fails.
> There are two approaches to solve this issue:
> 1) When the batch contains multiple mutations against the same row, acquire the row lock once per row instead of once per mutation (a rough sketch follows below).
> 2) Catch the Error, process the mutations whose locks were acquired, and loop back for the rest.
> With HBASE-17924, approach 1 now seems easy to implement.
> Creating this JIRA; I will post updates/patches as the investigation moves forward.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
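For illustration only, here is a minimal, self-contained Java sketch of approach 1: deduplicating row lock acquisitions within a mini-batch so the shared lock is taken once per distinct row rather than once per mutation. The Mutation class, getRow(), the per-row lock map, and processBatch() below are hypothetical stand-ins, not the actual HRegion.getRowLock()/doMiniBatchMutation() code paths that the attached patches modify.

{code}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of approach 1: acquire the shared row lock once per
// distinct row in a batch, so a batch with many mutations for the same row
// cannot exceed the ReentrantReadWriteLock read-hold limit (~64k).
public class RowLockDedupSketch {

  // One ReentrantReadWriteLock per row, keyed here by the row as a String
  // for simplicity (HBase keys its row locks differently).
  private final Map<String, ReentrantReadWriteLock> rowLocks = new HashMap<>();

  static final class Mutation {
    final String row;
    Mutation(String row) { this.row = row; }
    String getRow() { return row; }
  }

  ReentrantReadWriteLock getLockForRow(String row) {
    return rowLocks.computeIfAbsent(row, r -> new ReentrantReadWriteLock());
  }

  void processBatch(List<Mutation> batch) {
    // Track which rows we already hold the shared lock for in this batch.
    Map<String, ReentrantReadWriteLock.ReadLock> acquired = new HashMap<>();
    try {
      for (Mutation m : batch) {
        // Acquire the read lock only the first time this row is seen,
        // instead of once per mutation.
        acquired.computeIfAbsent(m.getRow(), r -> {
          ReentrantReadWriteLock.ReadLock readLock = getLockForRow(r).readLock();
          readLock.lock();
          return readLock;
        });
        // ... apply the mutation under the shared row lock ...
      }
    } finally {
      // Release each acquired row lock exactly once.
      for (ReentrantReadWriteLock.ReadLock readLock : acquired.values()) {
        readLock.unlock();
      }
    }
  }
}
{code}

With this scheme the read-lock hold count per row is bounded by the number of distinct rows in the batch rather than the number of mutations, which is what keeps the batch under the 64k limit hit in the stack trace above.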