[ https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306187#comment-15306187 ]

Heng Chen edited comment on HBASE-15900 at 5/30/16 4:09 AM:
------------------------------------------------------------

[~stack]
I have found something.
HStore.lock.readLock was held by the thread below, so compaction could not 
acquire lock.writeLock in HStore.replaceStoreFiles. As a result all compactions were 
blocked, and the memstore could not be flushed because too many store files had piled up.
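
The blocking chain looks like this (just a minimal sketch of the ReentrantReadWriteLock semantics involved, not HStore's real code): as long as the stuck scanner holds the read lock, the compaction thread parks on the write lock.

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class StoreLockSketch {
  // HStore guards its store files with a ReentrantReadWriteLock (non-fair by default).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Stands in for the scan path: takes the read lock, then gets stuck
  // (in our case inside IdLock.getLockEntry while seeking an HFile block).
  void slowScan() throws InterruptedException {
    lock.readLock().lock();
    try {
      TimeUnit.MINUTES.sleep(10); // simulate the scanner that never returns
    } finally {
      lock.readLock().unlock();
    }
  }

  // Stands in for HStore.replaceStoreFiles(): it needs the write lock, so it
  // parks until every reader has released the read lock; compaction cannot finish.
  void replaceStoreFilesLike() {
    lock.writeLock().lock();
    try {
      // swap compacted files into place ...
    } finally {
      lock.writeLock().unlock();
    }
  }
}
{code}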
 

Now I am trying to figure out why the scan was blocked in IdLock.getLockEntry; it 
happened many times. Maybe it is HBASE-14178.
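
(For anyone reading along: HFileReaderV2.readBlock takes a per-block lock so that only one thread loads a given block while the others wait. The sketch below is only a rough approximation of that per-id locking pattern, not the actual org.apache.hadoop.hbase.util.IdLock code.)

{code}
import java.util.concurrent.ConcurrentHashMap;

// Rough approximation of a per-id lock: callers locking the same id (e.g. an
// HFile block offset) serialize, callers with different ids run in parallel.
public class SimpleIdLock {
  public static final class Entry {
    final long id;
    Entry(long id) { this.id = id; }
  }

  private final ConcurrentHashMap<Long, Entry> map = new ConcurrentHashMap<>();

  public Entry getLockEntry(long id) throws InterruptedException {
    Entry mine = new Entry(id);
    for (;;) {
      Entry existing = map.putIfAbsent(id, mine);
      if (existing == null) {
        return mine;                 // we now own the lock for this id
      }
      synchronized (existing) {
        if (map.get(id) == existing) {
          existing.wait();           // someone else is loading this block; wait here
        }
      }
    }
  }

  public void releaseLockEntry(Entry entry) {
    map.remove(entry.id);
    synchronized (entry) {
      entry.notifyAll();             // wake everyone queued on this id
    }
  }
}
{code}

If the holder never releases its entry (or releases it very late), every other scanner that wants the same block waits exactly like Thread 43 below.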

There is another point I can't understand: the readLock was held by only one 
thread, so why were there so many threads waiting for the readLock?
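
My current guess (assuming HStore.lock is the default non-fair ReentrantReadWriteLock, which is what the stack above shows): once the compaction thread is queued for the write lock, every new read-lock request parks behind it so the writer is not starved. So one stuck reader plus one waiting writer is enough to pile up hundreds of read waiters. A small standalone demo of that behaviour:

{code}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadPileUpDemo {
  public static void main(String[] args) throws Exception {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default

    // 1. One "stuck scanner" grabs the read lock and sits on it.
    new Thread(() -> {
      lock.readLock().lock();
      try { TimeUnit.SECONDS.sleep(30); } catch (InterruptedException ignored) {}
    }, "stuck-scanner").start();
    TimeUnit.MILLISECONDS.sleep(100);

    // 2. A "compaction" thread queues for the write lock and parks.
    new Thread(() -> lock.writeLock().lock(), "compaction").start();
    TimeUnit.MILLISECONDS.sleep(100);

    // 3. New "handler" threads asking only for the read lock now park behind the writer.
    for (int i = 0; i < 5; i++) {
      new Thread(() -> lock.readLock().lock(), "handler-" + i).start();
    }
    TimeUnit.MILLISECONDS.sleep(100);

    System.out.println("read lock holders: " + lock.getReadLockCount()); // expected: 1
    System.out.println("queued threads   : " + lock.getQueueLength());   // expected: 6
  }
}
{code}

(The demo threads are deliberately left hanging; the point is only the queue length.)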

BTW, the scan operations in my cluster are issued only by Phoenix; I'm not sure 
whether that relates to the problem.

{code}
Thread 43 (B.defaultRpcServer.handler=4,queue=4,port=16020):
  State: WAITING
  Blocked count: 224987
  Waited count: 253413
  Waiting on org.apache.hadoop.hbase.util.IdLock$Entry@48148720
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:502)
    org.apache.hadoop.hbase.util.IdLock.getLockEntry(IdLock.java:81)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:397)
    org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
    org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:247)
    org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:156)
    org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:363)
    org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:217)
    org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2003)
    org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:5294)
    org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:2486)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2472)
    org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2454)
    org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2253)
    org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
    org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
    org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
{code}



> RS stuck in get lock of HStore
> ------------------------------
>
>                 Key: HBASE-15900
>                 URL: https://issues.apache.org/jira/browse/HBASE-15900
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.1, 1.3.0
>            Reporter: Heng Chen
>         Attachments: 9fe15a52_9fe15a52_save, 
> c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when I run an MR job. I saved the dump.txt 
> from this RS's web UI.
> Many threads are stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
>   State: WAITING
>   Blocked count: 477816
>   Waited count: 535255
>   Waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
>   Stack:
>     sun.misc.Unsafe.park(Native Method)
>     java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>     java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>     java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>     org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
>     org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
>     org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
>     org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>     org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
>     org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>     org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>     org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>     org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>     org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>     java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
