[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368377#comment-14368377
 ] 

Jeff Zhang commented on TEZ-2204:
---------------------------------

This may be related to YARN-2917: Tez has its own AsyncDispatcher, but it 
does not include the YARN-2917 patch.

Here is the relevant jstack output:
{code}
"Thread-1" prio=5 tid=0x00007f9d13011800 nid=0xe507 in Object.wait() [0x0000000117559000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007fed1c360> (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1281)
        - locked <0x00000007fed1c360> (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1355)
        at org.apache.tez.common.AsyncDispatcher.serviceStop(AsyncDispatcher.java:162)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        - locked <0x00000007fed61000> (a java.lang.Object)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1539)
        at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1674)
        - locked <0x00000007fed0dc50> (a org.apache.tez.dag.app.DAGAppMaster)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        - locked <0x00000007fed0de80> (a java.lang.Object)
        at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run(DAGAppMaster.java:1940)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

   Locked ownable synchronizers:
        - None

"App Shared Pool - #1" daemon prio=5 tid=0x00007f9d13e60800 nid=0xdd03 in Object.wait() [0x000000011714c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007ff1193b8> (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1281)
        - locked <0x00000007ff1193b8> (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1355)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
        at java.lang.Shutdown.runHooks(Shutdown.java:123)
        at java.lang.Shutdown.sequence(Shutdown.java:167)
        at java.lang.Shutdown.exit(Shutdown.java:212)
        - locked <0x00000007ff111ec8> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:962)
        at org.apache.tez.test.TestAMRecovery$ControlledImmediateStartVertexManager.onSourceTaskCompleted(TestAMRecovery.java:601)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventSourceTaskCompleted.invoke(VertexManager.java:525)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:580)
        - locked <0x00000007fb82fac8> (a org.apache.tez.dag.app.dag.impl.VertexManager)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:575)
        at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:27)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x00000007fbc182d8> (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
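
To illustrate the pattern the trace suggests: the thread that calls 
System.exit() blocks until every shutdown hook has finished, while the hook 
(through AsyncDispatcher.serviceStop) blocks in Thread.join() on a thread 
that does not finish. The trace does not show what the dispatcher's 
event-handling thread is waiting on, so the sketch below is a simplification 
that assumes the two-thread case, where the hook joins the exiting thread 
directly. It does not call System.exit(); plain threads stand in for it, and 
all names (ShutdownJoinCycleSketch, joinCycleForms, the thread names) are 
hypothetical. Timed joins and interrupts are used only so that the demo 
terminates.

```java
import java.util.concurrent.CountDownLatch;

public class ShutdownJoinCycleSketch {

    /**
     * Builds a two-thread join cycle and reports whether the "hook" thread
     * is still blocked after timeoutMillis. Hypothetical helper, not Tez code.
     */
    static boolean joinCycleForms(long timeoutMillis) throws InterruptedException {
        final CountDownLatch hookReady = new CountDownLatch(1);
        final Thread[] hookRef = new Thread[1];

        // Stands in for the thread calling System.exit(): like the JVM's
        // shutdown sequence, it waits for the hook thread to finish.
        final Thread exiting = new Thread(() -> {
            try {
                hookReady.await();
                hookRef[0].join();
            } catch (InterruptedException e) {
                // Interrupted by the demo to break the cycle.
            }
        }, "exiting-thread");

        // Stands in for the shutdown hook: stopping the service joins a
        // thread that, in this simplified case, is the exiting thread itself.
        final Thread hook = new Thread(() -> {
            try {
                exiting.join();
            } catch (InterruptedException e) {
                // Interrupted by the demo to break the cycle.
            }
        }, "hook-thread");

        hookRef[0] = hook;
        exiting.start();
        hook.start();
        hookReady.countDown();

        // Give the hook a bounded amount of time to finish on its own.
        hook.join(timeoutMillis);
        boolean stuck = hook.isAlive();

        // Break the cycle so the demo can exit cleanly.
        exiting.interrupt();
        hook.interrupt();
        exiting.join();
        hook.join();
        return stuck;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("hook still blocked after timeout: " + joinCycleForms(200));
    }
}
```

In a real JVM shutdown there is no one left to interrupt either thread, which 
is why such a cycle shows up as a test timeout rather than a failure.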

> TestAMRecovery increasingly flaky on jenkins builds. 
> -----------------------------------------------------
>
>                 Key: TEZ-2204
>                 URL: https://issues.apache.org/jira/browse/TEZ-2204
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>
> In recent pre-commit builds and daily builds, there seem to have been some 
> occurrences of TestAMRecovery failing or timing out. 



