[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368377#comment-14368377 ]
Jeff Zhang commented on TEZ-2204:
---------------------------------

This may be an issue related to YARN-2917: Tez has its own AsyncDispatcher, which does not include the YARN-2917 patch. Here is the jstack:

{code}
"Thread-1" prio=5 tid=0x00007f9d13011800 nid=0xe507 in Object.wait() [0x0000000117559000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000007fed1c360> (a java.lang.Thread)
	at java.lang.Thread.join(Thread.java:1281)
	- locked <0x00000007fed1c360> (a java.lang.Thread)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.tez.common.AsyncDispatcher.serviceStop(AsyncDispatcher.java:162)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0x00000007fed61000> (a java.lang.Object)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1539)
	at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1674)
	- locked <0x00000007fed0dc50> (a org.apache.tez.dag.app.DAGAppMaster)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	- locked <0x00000007fed0de80> (a java.lang.Object)
	at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run(DAGAppMaster.java:1940)
	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

   Locked ownable synchronizers:
	- None

"App Shared Pool - #1" daemon prio=5 tid=0x00007f9d13e60800 nid=0xdd03 in Object.wait() [0x000000011714c000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x00000007ff1193b8> (a org.apache.hadoop.util.ShutdownHookManager$1)
	at java.lang.Thread.join(Thread.java:1281)
	- locked <0x00000007ff1193b8> (a org.apache.hadoop.util.ShutdownHookManager$1)
	at java.lang.Thread.join(Thread.java:1355)
	at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
	at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
	at java.lang.Shutdown.runHooks(Shutdown.java:123)
	at java.lang.Shutdown.sequence(Shutdown.java:167)
	at java.lang.Shutdown.exit(Shutdown.java:212)
	- locked <0x00000007ff111ec8> (a java.lang.Class for java.lang.Shutdown)
	at java.lang.Runtime.exit(Runtime.java:109)
	at java.lang.System.exit(System.java:962)
	at org.apache.tez.test.TestAMRecovery$ControlledImmediateStartVertexManager.onSourceTaskCompleted(TestAMRecovery.java:601)
	at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventSourceTaskCompleted.invoke(VertexManager.java:525)
	at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:580)
	- locked <0x00000007fb82fac8> (a org.apache.tez.dag.app.dag.impl.VertexManager)
	at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:575)
	at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:27)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x00000007fbc182d8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}

> TestAMRecovery increasingly flaky on jenkins builds.
> -----------------------------------------------------
>
>                 Key: TEZ-2204
>                 URL: https://issues.apache.org/jira/browse/TEZ-2204
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>
> In recent pre-commit builds and daily builds, there seem to have been some
> occurrences of TestAMRecovery failing or timing out.
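The jstack in the comment shows a shutdown-time wait chain. "App Shared Pool - #1" called System.exit() from TestAMRecovery and is now waiting in ApplicationShutdownHooks.runHooks() for the Hadoop hook thread ("Thread-1") to finish. That hook holds the DAGAppMaster monitor and is waiting in Thread.join() inside AsyncDispatcher.serviceStop(). The thread being joined does not appear in the excerpt, so the jstack does not show what that thread is blocked on.

The Java sketch below is a hypothetical, self-contained illustration of why an unbounded join() on a stop path can hang. A stopping thread holds a monitor and then joins a dispatcher thread whose handler needs that same monitor. This is not Tez or YARN code, and it is not the YARN-2917 fix. The class `JoinUnderLockDemo`, the method `dispatcherStuckAfterJoin`, and the use of a bounded `join(timeout)` to detect the hang are all assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only: not Tez's AsyncDispatcher and not the YARN-2917 patch.
class JoinUnderLockDemo {

    /**
     * Returns true if the "dispatcher" thread is still alive when the bounded join gives up,
     * i.e. an unbounded Thread.join() at that point would never return.
     */
    static boolean dispatcherStuckAfterJoin(boolean handlerNeedsLock, long joinTimeoutMs) {
        final Object serviceLock = new Object();                // stands in for a service's monitor
        final CountDownLatch stopHoldsLock = new CountDownLatch(1);

        Thread dispatcher = new Thread(() -> {
            try {
                stopHoldsLock.await();                          // handle the event only once stop owns the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (handlerNeedsLock) {
                synchronized (serviceLock) {                    // blocks while the stopping thread holds the lock
                    // handle the event
                }
            }
        }, "dispatcher");
        dispatcher.setDaemon(true);
        dispatcher.start();

        boolean stillAlive;
        try {
            synchronized (serviceLock) {                        // like a synchronized stop()/serviceStop()
                stopHoldsLock.countDown();
                dispatcher.join(joinTimeoutMs);                 // bounded; a plain join() would wait forever here
                stillAlive = dispatcher.isAlive();
            }
            dispatcher.join();                                  // lock released, so the handler can finish now
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping", e);
        }
        return stillAlive;
    }

    public static void main(String[] args) {
        System.out.println("handler needs lock, stuck: " + dispatcherStuckAfterJoin(true, 200));
        System.out.println("handler lock-free, stuck: " + dispatcherStuckAfterJoin(false, 5000));
    }
}
```

When `handlerNeedsLock` is true, the bounded join times out with the dispatcher still blocked on the monitor, and the method returns true. An unbounded join() at that point would hang, which is the kind of hang the jstack suggests. When it is false, the dispatcher exits on its own and the method returns false. Whether Tez's dispatcher thread actually needs the DAGAppMaster monitor in this failure is not established by the jstack above.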