[ 
https://issues.apache.org/jira/browse/HBASE-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006662#comment-13006662
 ] 

stack commented on HBASE-3622:
------------------------------

@Dmitry sent me more thread dumps.  This time we're stuck in a different 
location.  He sent me logs taken an hour apart.  It's stuck in the same place 
in both cases.  We're here:

{code}
"IPC Server handler 2 on 60020" daemon prio=10 tid=0x0000000048d0d800 nid=0x7550 waiting on condition [0x0000000044b6e000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00002aaac04a9d08> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
    at java.util.concurrent.DelayQueue.offer(DelayQueue.java:83)
    at java.util.concurrent.DelayQueue.add(DelayQueue.java:71)
    at org.apache.hadoop.hbase.regionserver.Leases.renewLease(Leases.java:194)
{code}

A few threads are 'parking to wait for' 0x00002aaac04a9d08.  This one is too:

{code}
"regionserver60020.leaseChecker" daemon prio=10 tid=0x0000000048db0800 nid=0x7548 waiting on condition [0x0000000044366000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00002aaac04a9d08> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1971)
    at java.util.concurrent.DelayQueue.poll(DelayQueue.java:209)
    at org.apache.hadoop.hbase.regionserver.Leases.run(Leases.java:82)
{code}
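
For anyone following along, here is a minimal sketch of why the handler and 
the lease checker end up parked on the same lock.  A DelayQueue guards both 
add() and the timed poll() with one internal ReentrantLock, and a timed poll 
has to reacquire that lock when its wait ends, which would match the 
awaitNanos -> acquireQueued frames above.  The class and method names below 
(LeaseQueueSketch, Lease, renew, nextExpired) are made up for illustration and 
are not the actual Leases code.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, NOT the HBase Leases class.
final class LeaseQueueSketch {
    /** Stand-in for a lease: becomes available once its deadline has passed. */
    static final class Lease implements Delayed {
        final String name;
        final long deadlineNanos;

        Lease(String name, long ttlMillis) {
            this.name = name;
            this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        }

        public long getDelay(TimeUnit unit) {
            return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        public int compareTo(Delayed other) {
            long diff = getDelay(TimeUnit.NANOSECONDS) - other.getDelay(TimeUnit.NANOSECONDS);
            return diff < 0 ? -1 : (diff > 0 ? 1 : 0);
        }
    }

    private final DelayQueue<Lease> queue = new DelayQueue<Lease>();

    /** Handler side: add() takes the queue's internal lock (the DelayQueue.add frame). */
    void renew(Lease lease) {
        queue.remove(lease);
        queue.add(lease);
    }

    /** Checker side: timed poll() takes the same lock and reacquires it after waiting. */
    Lease nextExpired(long timeoutMillis) {
        try {
            return queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    public static void main(String[] args) {
        LeaseQueueSketch leases = new LeaseQueueSketch();
        leases.renew(new Lease("scanner-1", 0));
        Lease expired = leases.nextExpired(1000);
        System.out.println(expired == null ? "nothing expired" : "expired: " + expired.name);
    }
}
```

If the lock's wakeup is lost, every caller of renew() and the checker itself 
would queue up behind a lock that nobody holds, which is what the dumps look 
like.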

These are the only mentions of 0x00002aaac04a9d08 in all thread dumps.
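
The check above can be automated with a small scan over a saved dump.  This is 
a rough sketch, assuming the dump is plain jstack-style text; ParkedOnLock, 
waiters and otherMentions are hypothetical names.  It lists the threads parked 
on a given address and counts any other lines that mention it.  As far as I 
know an owner of a ReentrantLock only shows up if the dump was taken with 
jstack -l (under "Locked ownable synchronizers"), so a count of zero is a hint 
rather than proof that nobody holds the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading thread dumps; not part of HBase or the JDK.
final class ParkedOnLock {
    // Thread header line: optional leading line number, then the quoted thread name.
    private static final Pattern THREAD_HEADER =
        Pattern.compile("^\\s*(?:\\d+\\s+)?\"([^\"]+)\"");

    /** Names of threads whose stack contains "parking to wait for <address>". */
    static List<String> waiters(String dump, String address) {
        List<String> names = new ArrayList<String>();
        String current = null;
        for (String line : dump.split("\\r?\\n")) {
            Matcher m = THREAD_HEADER.matcher(line);
            if (m.find()) {
                current = m.group(1);
            } else if (current != null
                    && line.contains("parking to wait for")
                    && line.contains("<" + address + ">")) {
                names.add(current);
            }
        }
        return names;
    }

    /** Number of lines that mention the address without being a "parking" line. */
    static int otherMentions(String dump, String address) {
        int count = 0;
        for (String line : dump.split("\\r?\\n")) {
            if (line.contains("<" + address + ">") && !line.contains("parking to wait for")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String dump = "\"handler\" daemon prio=10\n"
            + "    - parking to wait for  <0x01> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)\n";
        System.out.println(waiters(dump, "0x01") + " other mentions: " + otherMentions(dump, "0x01"));
    }
}
```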

It sounds like this: http://bugs.sun.com/view_bug.do?bug_id=6822370.  Though 
the original report is on Sun hardware, later commenters report the issue on 
various Linux systems.  Setting -XX:+UseMembar seems to be a workaround, as 
does running with u18/u21, where the bug is purportedly fixed.
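
To confirm what a given region server is actually running with, something like 
the following could be run on the same JVM.  It is only a sketch: MembarCheck 
and useMembarEnabled are hypothetical names, it assumes "membar" means the 
HotSpot -XX:+UseMembar option, and it only looks at flags passed on the 
command line, not at the JVM's built-in default.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper; reports the JVM version and whether UseMembar was passed.
final class MembarCheck {
    /** True if the argument list turns UseMembar on (the last occurrence is used). */
    static boolean useMembarEnabled(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+UseMembar")) {
                enabled = true;
            } else if (arg.equals("-XX:-UseMembar")) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with, e.g. those set via HBASE_OPTS.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("-XX:+UseMembar on command line: " + useMembarEnabled(jvmArgs));
    }
}
```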



> Deadlock in HBaseServer (JVM bug?)
> ----------------------------------
>
>                 Key: HBASE-3622
>                 URL: https://issues.apache.org/jira/browse/HBASE-3622
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: HBASE-3622.patch
>
>
> On Dmitriy's cluster:
> {code}
> "IPC Reader 0 on port 60020" prio=10 tid=0x00002aacb4a82800 nid=0x3a72 waiting on condition [0x00000000429ba000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00002aaabf5fa6d0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>         at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>         at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>         at java.util.concurrent.LinkedBlockingQueue.signalNotEmpty(LinkedBlockingQueue.java:103)
>         at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:267)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:985)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:946)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:522)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:316)
>         - locked <0x00002aaabf580fb0> (a org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> ...
> "IPC Server handler 29 on 60020" daemon prio=10 tid=0x00002aacbc163800 nid=0x3acc waiting on condition [0x00000000462f3000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00002aaabf5e3800> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
> "IPC Server handler 28 on 60020" daemon prio=10 tid=0x00002aacbc161800 nid=0x3acb waiting on condition [0x00000000461f2000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for  <0x00002aaabf5e3800> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1025)
> ...
> {code}
> This region server stayed in this state for hours. The reader is waiting to 
> put and the handlers are waiting to take, and they wait on different lock 
> ids. It reminds me of the UseMembar issue, where the JVM sometimes fails to 
> notify waiters. In any case, that RS had to be shut down in order to get out 
> of that state.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
