[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-23 Thread Jeff Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:

Priority: Critical  (was: Major)

> Deadlock in DAGAppMaster
> 
>
> Key: TEZ-2359
> URL: https://issues.apache.org/jira/browse/TEZ-2359
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Priority: Critical
>
> {code}
> Found one Java-level deadlock:
> =
> "Timer-1":
>   waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "Dispatcher thread: Central"
> "Dispatcher thread: Central":
>   waiting to lock monitor 0x7fb829866d18 (object 0x0007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
>   which is held by "DelayedContainerManager"
> "DelayedContainerManager":
>   waiting for ownable synchronizer 0x0007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "Dispatcher thread: Central"
> Java stack information for the threads listed above:
> ===
> "Timer-1":
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>   at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
>   - locked <0x0007cd0f2ff0> (a org.apache.tez.dag.app.DAGAppMaster)
>   at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
>   at java.util.TimerThread.mainLoop(Timer.java:555)
>   at java.util.TimerThread.run(Timer.java:505)
> "Dispatcher thread: Central":
>   at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
>   - waiting to lock <0x0007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>   at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
>   at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
>   at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
>   at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
>   at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>   at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>   - locked <0x0007cd1d0208> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>   at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
>   at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
>   at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
>   at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>   at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>   at java.lang.Thread.run(Thread.java:745)
> "DelayedContainerManager":
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>   at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>   at 

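The cycle reported in the dump above is between two threads: "Dispatcher thread: Central" owns the ReentrantReadWriteLock and waits for the YarnTaskSchedulerService monitor, while "DelayedContainerManager" owns that monitor and waits for the read side of the same lock. "Timer-1" is only queued behind the cycle. The following is a minimal sketch of that shape, not Tez code: the class name, thread names, and the `schedulerMonitor` object are hypothetical stand-ins, and it assumes a JVM that supports monitoring of ownable synchronizers (as the JVM that produced the dump evidently does).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockCycleSketch {

    /**
     * Builds a two-thread lock cycle and returns how many of the two sketch
     * threads the JVM reports as deadlocked (0 if none is seen within ~5 s).
     */
    static int reproduceAndCount() {
        // Hypothetical stand-ins for the read/write lock and the scheduler monitor.
        final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
        final Object schedulerMonitor = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Plays the dispatcher role: write lock first, then the monitor.
        Thread dispatcher = new Thread(() -> {
            rwLock.writeLock().lock();
            try {
                rendezvous(bothHoldFirstLock);
                synchronized (schedulerMonitor) {
                    // Never reached: the other thread owns the monitor.
                }
            } finally {
                rwLock.writeLock().unlock();
            }
        }, "dispatcher-sketch");

        // Plays the container manager role: monitor first, then the read lock.
        Thread containerManager = new Thread(() -> {
            synchronized (schedulerMonitor) {
                rendezvous(bothHoldFirstLock);
                rwLock.readLock().lock(); // parks: the writer never releases
                rwLock.readLock().unlock();
            }
        }, "container-manager-sketch");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        dispatcher.setDaemon(true);
        containerManager.setDaemon(true);
        dispatcher.start();
        containerManager.start();

        // Poll the same detector that jstack relies on.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (System.nanoTime() < deadline) {
            long[] ids = mx.findDeadlockedThreads();
            int ours = 0;
            if (ids != null) {
                for (long id : ids) {
                    if (id == dispatcher.getId() || id == containerManager.getId()) {
                        ours++;
                    }
                }
            }
            if (ours == 2) {
                return ours;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return ours;
            }
        }
        return 0;
    }

    /** Waits until both threads hold their first lock, so the cycle always forms. */
    private static void rendezvous(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked sketch threads: " + reproduceAndCount());
    }
}
```

In general, a cycle of this shape goes away when every code path takes the two locks in the same order, or when one of the calls is made without holding the other lock. Which of these applies to TEZ-2359 is not stated in this thread.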
[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-27 Thread Jeff Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:

Priority: Blocker  (was: Critical)

> Deadlock in DAGAppMaster
> 
>
> Key: TEZ-2359
> URL: https://issues.apache.org/jira/browse/TEZ-2359
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Priority: Blocker
>

[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

2015-04-27 Thread Jeff Zhang (JIRA)

 [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:

Target Version/s: 0.7.0

> Deadlock in DAGAppMaster
> 
>
> Key: TEZ-2359
> URL: https://issues.apache.org/jira/browse/TEZ-2359
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Priority: Blocker
>