In the class AMRMClientAsyncImpl, the object <7c3041e28> is locked by the
heartbeat thread (which, like any heartbeat thread, runs in an effectively
infinite loop), and the method unregisterApplicationMaster is waiting to
lock the same object.
Only once unregisterApplicationMaster has acquired that lock can it tell
the heartbeat thread to exit, by setting the boolean flag keepRunning to
false.
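
To make the pattern concrete, here is a minimal model of it. This is only a
sketch and not the actual YARN code: the class HeartbeatLockDemo, its methods
and the sleep that stands in for a slow, retrying allocate() call are all
hypothetical. It shows that a thread calling the unregister method sits in
the BLOCKED state for as long as the heartbeat holds the lock.

```java
import java.util.concurrent.CountDownLatch;

// Simplified model of the locking pattern; all names here are hypothetical.
class HeartbeatLockDemo {
    private final Object unregisterHeartbeatLock = new Object();
    private volatile boolean keepRunning = true;
    private final CountDownLatch lockHeld = new CountDownLatch(1);

    // Models the heartbeat loop: the allocate() call (here just a sleep)
    // happens while the lock is held.
    Thread startHeartbeat(long allocateMillis) {
        Thread t = new Thread(() -> {
            while (keepRunning) {
                synchronized (unregisterHeartbeatLock) {
                    lockHeld.countDown();
                    sleepQuietly(allocateMillis);
                }
                sleepQuietly(10); // heartbeat interval, outside the lock
            }
        }, "heartbeat");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Models unregisterApplicationMaster(): the flag is only set after the
    // lock has been acquired.
    void unregister() {
        synchronized (unregisterHeartbeatLock) {
            keepRunning = false;
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Returns the state observed for the shutdown thread while the heartbeat
    // is still inside its simulated allocate() call.
    static Thread.State observeShutdownState(long allocateMillis) {
        HeartbeatLockDemo demo = new HeartbeatLockDemo();
        Thread heartbeat = demo.startHeartbeat(allocateMillis);
        try {
            demo.lockHeld.await(); // heartbeat now owns the lock
            Thread shutdown = new Thread(demo::unregister, "shutdown");
            shutdown.start();
            Thread.State state = shutdown.getState();
            long deadline = System.currentTimeMillis() + allocateMillis / 2;
            while (state != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(5);
                state = shutdown.getState();
            }
            shutdown.join();  // proceeds once the heartbeat releases the lock
            heartbeat.join(); // heartbeat exits after seeing the flag
            return state;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observeShutdownState(500));
    }
}
```

In this model the lock is released after a fixed delay, so the shutdown
thread eventually gets through. If the call made under the lock keeps
retrying, the shutdown thread stays blocked for that whole time.
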
The following is the thread dump for the deadlock:

"AMShutdownThread" daemon prio=5 tid=7f9a02921800 nid=0x115d68000 waiting for monitor entry [115d67000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.unregisterApplicationMaster(AMRMClientAsyncImpl.java:156)
    - waiting to lock <7c3041e28> (a java.lang.Object)
    at org.apache.tez.dag.app.rm.TaskScheduler.serviceStop(TaskScheduler.java:394)
    - locked <7c3006aa0> (a org.apache.tez.dag.app.rm.TaskScheduler)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c3038008> (a java.lang.Object)
    at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.serviceStop(TaskSchedulerEventHandler.java:357)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c2f71360> (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1518)
    at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1649)
    - locked <7c2f51790> (a org.apache.tez.dag.app.DAGAppMaster)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked <7c2fed728> (a java.lang.Object)
    at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHandler$AMShutdownRunnable.run(DAGAppMaster.java:607)
    at java.lang.Thread.run(Thread.java:695)

"AMRM Heartbeater thread" prio=5 tid=7f9a0c0e8800 nid=0x111e70000 waiting on condition [111e6f000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.util.ThreadUtil.sleepAtLeastIgnoreInterrupts(ThreadUtil.java:43)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:150)
    at com.sun.proxy.$Proxy9.allocate(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:246)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$HeartbeatThread.run(AMRMClientAsyncImpl.java:224)
    - locked <7c3041e28> (a java.lang.Object)

    public void unregisterApplicationMaster(FinalApplicationStatus appStatus,
        String appMessage, String appTrackingUrl) throws YarnException,
        IOException {
      synchronized (unregisterHeartbeatLock) {
        keepRunning = false;
        client.unregisterApplicationMaster(appStatus, appMessage,
            appTrackingUrl);
      }
    }

The line "keepRunning = false" should be outside the synchronized block.
I am not sure whether this should be regarded as a problem in YARN or in
Tez. The flag is private and cannot be accessed by the Tez implementation,
TezAMRMClientAsync.
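
Here is a rough sketch of the reordering I have in mind. It is not a patch
against AMRMClientAsyncImpl: the class UnregisterFixSketch, the method
heartbeatOnce and the counter standing in for the real
client.unregisterApplicationMaster(...) call are hypothetical, and I am
assuming the flag can be made volatile. Setting the flag before taking the
lock lets the heartbeat loop see it as soon as its current allocate call
returns. It does not interrupt an allocate call that is already in progress.

```java
// Sketch of the proposed change; class and method names are hypothetical.
class UnregisterFixSketch {
    private final Object unregisterHeartbeatLock = new Object();
    private volatile boolean keepRunning = true;
    private int unregisterCalls = 0; // guarded by unregisterHeartbeatLock

    // Proposed ordering: flip the flag first, then take the lock.
    void unregisterApplicationMaster() {
        keepRunning = false;
        synchronized (unregisterHeartbeatLock) {
            unregisterCalls++; // stand-in for client.unregisterApplicationMaster(...)
        }
    }

    // One heartbeat iteration; returns false when the loop should exit.
    boolean heartbeatOnce(Runnable allocate) {
        synchronized (unregisterHeartbeatLock) {
            if (!keepRunning) {
                return false;
            }
            allocate.run(); // stand-in for client.allocate(...)
        }
        return keepRunning;
    }

    // Runs one heartbeat whose allocate() overlaps with an unregister request
    // and reports whether the loop would stop and the unregister went through.
    static boolean stopsAfterCurrentAllocate() {
        UnregisterFixSketch sketch = new UnregisterFixSketch();
        Thread shutdown =
                new Thread(sketch::unregisterApplicationMaster, "shutdown");
        boolean keepGoing = sketch.heartbeatOnce(() -> {
            shutdown.start();
            // The flag flips even though this thread still holds the lock.
            while (sketch.keepRunning) {
                Thread.yield();
            }
        });
        try {
            shutdown.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (sketch.unregisterHeartbeatLock) {
            return !keepGoing && sketch.unregisterCalls == 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(stopsAfterCurrentAllocate()); // prints "true"
    }
}
```

With the original ordering (flag set inside the synchronized block), the
wait inside the simulated allocate call would never end, because the
shutdown thread could not set the flag without first getting the lock.
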
--
Cheers,
*Subroto Sanyal*