[
https://issues.apache.org/jira/browse/HBASE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006662#comment-13006662
]
stack commented on HBASE-3622:
------------------------------
@Dmitry sent me more thread dumps. This time we're in a different location.
He sent me logs an hour apart. Its stuck in same place in both cases. We're
here:
{code}
1455 "IPC Server handler 2 on 60020" daemon prio=10 tid=0x0000000048d0d800
nid=0x7550 waiting on condition [0x0000000044b6e000]
1456 java.lang.Thread.State: WAITING (parking)
1457 at sun.misc.Unsafe.park(Native Method)
1458 - parking to wait for <0x00002aaac04a9d08> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
1459 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
1460 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
1461 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
1462 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
1463 at
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
1464 at
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
1465 at java.util.concurrent.DelayQueue.offer(DelayQueue.java:83)
1466 at java.util.concurrent.DelayQueue.add(DelayQueue.java:71)
1467 at
org.apache.hadoop.hbase.regionserver.Leases.renewLease(Leases.java:194)
{code}
A few threads are 'parking to wait for' 0x00002aaac04a9d08. This one is too:
{code}
599 "regionserver60020.leaseChecker" daemon prio=10 tid=0x0000000048db0800
nid=0x7548 waiting on condition [0x0000000044366000]
600 java.lang.Thread.State: WAITING (parking)
601 at sun.misc.Unsafe.park(Native Method)
602 - parking to wait for <0x00002aaac04a9d08> (a
java.util.concurrent.locks.ReentrantLock$NonfairSync)
603 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
604 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
605 at
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
606 at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1971)
607 at java.util.concurrent.DelayQueue.poll(DelayQueue.java:209)
608 at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:82)
{code}
These are the only mentions of 0x00002aaac04a9d08 in all thread dumps.
It sounds like this, http://bugs.sun.com/view_bug.do?bug_id=6822370. Though
the original report is on sun hw, later folks chime in w/ the issue on various
linux. Setting membar seems to be a workaround, or running with u18/u21 where
bug is purportedly fixed.
> Deadlock in HBaseServer (JVM bug?)
> ----------------------------------
>
> Key: HBASE-3622
> URL: https://issues.apache.org/jira/browse/HBASE-3622
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.90.1
> Reporter: Jean-Daniel Cryans
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-3622.patch
>
>
> On Dmitriy's cluster:
> {code}
> "IPC Reader 0 on port 60020" prio=10 tid=0x00002aacb4a82800 nid=0x3a72
> waiting on condition [0x00000000429ba000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00002aaabf5fa6d0> (a
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
> at
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
> at
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
> at
> java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:103)
> at
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:267)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:985)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
> - locked <0x00002aaabf580fb0> (a
> org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> ...
> "IPC Server handler 29 on 60020" daemon prio=10 tid=0x00002aacbc163800
> nid=0x3acc waiting on condition [0x00000000462f3000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00002aaabf5e3800> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
> "IPC Server handler 28 on 60020" daemon prio=10 tid=0x00002aacbc161800
> nid=0x3acb waiting on condition [0x00000000461f2000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00002aaabf5e3800> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
> at
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025
> ...
> {code}
> This region server stayed in this state for hours. The reader is waiting to
> put and the handlers are waiting to take, and they wait on different lock
> ids. It reminds me of the UseMembar thing about the JVM sometime missing to
> notify waiters. In any case, that RS needed to be closed in order to get out
> of that state.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira