[ https://issues.apache.org/jira/browse/HBASE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005509#comment-13005509 ]
Ted Yu commented on HBASE-3622: ------------------------------- I noticed this: {code} protected BlockingQueue<Call> callQueue; // queued calls protected BlockingQueue<Call> priorityCallQueue; {code} I wonder if what values the following parameters carried in his cluster: hbase.regionserver.handler.count hbase.regionserver.metahandler.count > Deadlock in HBaseServer (JVM bug?) > ---------------------------------- > > Key: HBASE-3622 > URL: https://issues.apache.org/jira/browse/HBASE-3622 > Project: HBase > Issue Type: Bug > Affects Versions: 0.90.1 > Reporter: Jean-Daniel Cryans > Priority: Critical > Fix For: 0.92.0 > > Attachments: HBASE-3622.patch > > > On Dmitriy's cluster: > {code} > "IPC Reader 0 on port 60020" prio=10 tid=0x00002aacb4a82800 nid=0x3a72 > waiting on condition [0x00000000429ba000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00002aaabf5fa6d0> (a > java.util.concurrent.locks.ReentrantLock$NonfairSync) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114) > at > java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186) > at > java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262) > at > java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:103) > at > java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:267) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:985) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316) > - locked <0x00002aaabf580fb0> (a > org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > ... > "IPC Server handler 29 on 60020" daemon prio=10 tid=0x00002aacbc163800 > nid=0x3acc waiting on condition [0x00000000462f3000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00002aaabf5e3800> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025) > "IPC Server handler 28 on 60020" daemon prio=10 tid=0x00002aacbc161800 > nid=0x3acb waiting on condition [0x00000000461f2000] > java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00002aaabf5e3800> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) > at > java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) > at > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025 > ... > {code} > This region server stayed in this state for hours. The reader is waiting to > put and the handlers are waiting to take, and they wait on different lock > ids. It reminds me of the UseMembar thing about the JVM sometime missing to > notify waiters. In any case, that RS needed to be closed in order to get out > of that state. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira