[jira] [Comment Edited] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing

huaxiang sun (JIRA) Mon, 20 Nov 2017 00:17:21 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258931#comment-16258931
 ]


huaxiang sun edited comment on HBASE-19163 at 11/20/17 8:16 AM:
----------------------------------------------------------------

Hi [[email protected]] and [~allan163], I uploaded a new patch, which just 
compares object reference for checking if the current row is same as the 
previous row. Can you review? It is in the review board as well, thanks.


was (Author: huaxiang):
Hi [[email protected]] and [~allan163], I uploaded a new patch, which just 
compares object reference compare for checking if the current row is same as 
the previous row. Can you review? It is in the review board as well, thanks.

> "Maximum lock count exceeded" from region server's batch processing
> -------------------------------------------------------------------
>
>                 Key: HBASE-19163
>                 URL: https://issues.apache.org/jira/browse/HBASE-19163
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3
>            Reporter: huaxiang sun
>            Assignee: huaxiang sun
>         Attachments: HBASE-19163-master-v001.patch, 
> HBASE-19163.master.001.patch, HBASE-19163.master.002.patch, 
> HBASE-19163.master.004.patch, HBASE-19163.master.005.patch, unittest-case.diff
>
>
> In one of use cases, we found the following exception and replication is 
> stuck.
> {code}
> 2017-10-25 19:41:17,199 WARN  [hconnection-0x28db294f-shared--pool4-t936] 
> client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last 
> exception: java.io.IOException: java.io.IOException: Maximum lock count 
> exceeded
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165)
> Caused by: java.lang.Error: Maximum lock count exceeded
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488)
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327)
>         at 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877)
>         at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715)
>         at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170)
>         ... 3 more
> {code}
> While we are still examining the data pattern, it is sure that there are too 
> many mutations in the batch against the same row, this exceeds the maximum 
> 64k shared lock count and it throws an error and failed the whole batch.
> There are two approaches to solve this issue.
> 1). Let's say there are mutations against the same row in the batch, we just 
> need to acquire the lock once for the same row vs to acquire the lock for 
> each mutation.
> 2). We catch the error and start to process whatever it gets and loop back.
> With HBASE-17924, approach 1 seems easy to implement now. 
> Create the jira and will post update/patch when investigation moving forward.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Comment Edited] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing

Reply via email to