[ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeff Zhang updated TEZ-2359:
----------------------------
    Priority: Critical  (was: Major)

> Deadlock in DAGAppMaster
> ------------------------
>
>                 Key: TEZ-2359
>                 URL: https://issues.apache.org/jira/browse/TEZ-2359
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Priority: Critical
>
> {code}
> Found one Java-level deadlock:
> =============================
> "Timer-1":
>   waiting for ownable synchronizer 0x00000007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "Dispatcher thread: Central"
> "Dispatcher thread: Central":
>   waiting to lock monitor 0x00007fb829866d18 (object 0x00000007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
>   which is held by "DelayedContainerManager"
> "DelayedContainerManager":
>   waiting for ownable synchronizer 0x00000007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "Dispatcher thread: Central"
>
> Java stack information for the threads listed above:
> ===================================================
> "Timer-1":
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x00000007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
>       at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
>       - locked <0x00000007cd0f2ff0> (a org.apache.tez.dag.app.DAGAppMaster)
>       at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
>       at java.util.TimerThread.mainLoop(Timer.java:555)
>       at java.util.TimerThread.run(Timer.java:505)
> "Dispatcher thread: Central":
>       at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
>       - waiting to lock <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>       at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
>       at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
>       at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
>       at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
>       at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>       at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>       at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>       - locked <0x00000007cd1d0208> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
>       at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
>       at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
>       at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
>       at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
>       at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
>       at java.lang.Thread.run(Thread.java:745)
> "DelayedContainerManager":
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for <0x00000007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
>       at org.apache.tez.dag.app.DAGAppMaster.getState(DAGAppMaster.java:531)
>       at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getAMState(DAGAppMaster.java:1522)
>       at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignDelayedContainer(YarnTaskSchedulerService.java:585)
>       - locked <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>       at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$600(YarnTaskSchedulerService.java:82)
>       at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:1877)
>       - locked <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
>
> Found 1 deadlock.
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
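The cycle in the dump above is a lock-order inversion between the `ReentrantReadWriteLock` in `DAGAppMaster` and the `YarnTaskSchedulerService` monitor. The dispatcher thread is reported as the owner of the read/write lock while it waits for the scheduler monitor. `DelayedContainerManager` holds that monitor while it waits for the read lock. The following is only a minimal sketch of that shape, not Tez code: the class `LockOrderSketch`, the thread names, and the variables `stateLock` and `schedulerMonitor` are hypothetical stand-ins. It assumes the dispatcher holds the lock in write mode, and that the JVM supports ownable-synchronizer monitoring (HotSpot does), so that `ThreadMXBean.findDeadlockedThreads()` can observe the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the lock shape in the dump; no Tez classes are used.
final class LockOrderSketch {

    // Starts two daemon threads that take the two locks in opposite order and
    // returns the ids of the threads the JVM reports as deadlocked
    // (an empty array if nothing is detected within about five seconds).
    static long[] reproduce() {
        final ReentrantReadWriteLock stateLock = new ReentrantReadWriteLock();
        final Object schedulerMonitor = new Object();
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Plays the dispatcher: write lock first, then the scheduler monitor.
        Thread dispatcher = new Thread(() -> {
            stateLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                synchronized (schedulerMonitor) {
                    // Not reached once the cycle has formed.
                }
            } finally {
                stateLock.writeLock().unlock();
            }
        }, "dispatcher-sketch");

        // Plays DelayedContainerManager: scheduler monitor first, then the read lock.
        Thread delayed = new Thread(() -> {
            synchronized (schedulerMonitor) {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                stateLock.readLock().lock();
                stateLock.readLock().unlock();
            }
        }, "delayed-sketch");

        // Daemon threads let the JVM exit even though they never finish.
        dispatcher.setDaemon(true);
        delayed.setDaemon(true);
        dispatcher.start();
        delayed.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = threads.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new long[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce().length);
    }
}
```

In this sketch, taking both locks in one consistent order on every path, or reading the state before entering the `synchronized` block, would break the cycle. Whether either approach suits the actual Tez code paths is not something the dump alone establishes.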