[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377310#comment-14377310 ]

Hitesh Shah commented on TEZ-2204:

Comments:
{code}
// don't handle events if DAGAppMaster is in the state of STOPPED,
// otherwise there may be dead-lock happen. TEZ-2204
if (DAGAppMaster.this.getServiceState() == STATE.STOPPED) {
  return;
}
{code}
Can you add a log message to identify what events are being received after the AM is stopped?

+1 after the above comment is addressed.

TestAMRecovery increasingly flaky on jenkins builds.

Key: TEZ-2204
URL: https://issues.apache.org/jira/browse/TEZ-2204
Project: Apache Tez
Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jeff Zhang
Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch, TEZ-2204-4.patch

In recent pre-commit builds and daily builds, there seem to have been some occurrences of TestAMRecovery failing or timing out.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
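The review request above (log which events arrive after the AM is stopped) could look roughly like the following. This is only a sketch, not the Tez patch: `GuardedHandler`, the two-value `State` enum, the string events, and the use of `java.util.logging` are all assumptions made for illustration, and the handler is exercised from a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

public class StoppedGuardSketch {

    /** Hypothetical two-state lifecycle; a real service has more states. */
    public enum State { STARTED, STOPPED }

    /** Hypothetical handler that logs and drops events once it has been stopped. */
    public static class GuardedHandler {
        private static final Logger LOG = Logger.getLogger(GuardedHandler.class.getName());

        private volatile State state = State.STARTED;
        private final List<String> handled = new ArrayList<>();
        private int dropped = 0;

        public void stop() {
            state = State.STOPPED;
        }

        public void handle(String event) {
            if (state == State.STOPPED) {
                // Record what was received after stop instead of returning silently.
                LOG.info("Ignoring event received after stop: " + event);
                dropped++;
                return;
            }
            handled.add(event);
        }

        public List<String> handled() {
            return handled;
        }

        public int dropped() {
            return dropped;
        }
    }

    public static void main(String[] args) {
        GuardedHandler handler = new GuardedHandler();
        handler.handle("EVENT_BEFORE_STOP"); // hypothetical event name
        handler.stop();
        handler.handle("EVENT_AFTER_STOP");  // logged and dropped by the guard
        System.out.println("handled=" + handler.handled() + " dropped=" + handler.dropped());
    }
}
```

Logging the dropped event keeps the early return cheap while still leaving a trace of which event types race with shutdown.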
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377018#comment-14377018 ]

Jeff Zhang commented on TEZ-2204:

Uploaded a new patch (excluding the findbugs warning). [~hitesh] [~bikassaha] Please help review it.
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377074#comment-14377074 ]

Hadoop QA commented on TEZ-2204:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12706785/TEZ-2204-4.patch
against master revision 6d0b10a.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/333//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/333//console

This message is automatically generated.
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371062#comment-14371062 ]

Jeff Zhang commented on TEZ-2204:

Uploaded a patch. [~hitesh] [~bikassaha] Please help review it.

Two potential deadlocks:
* Related to YARN-2917. Tez's AsyncDispatcher does not include that patch.
* Deadlock in DAGAppMaster between the methods DAGAppMaster::handle and DAGAppMaster::stopService. When stopService is called, it stops the AsyncDispatcher, and the AsyncDispatcher then drains its events, which may call DAGAppMaster::handle. Both handle() and stopService are declared synchronized, so the dispatcher thread can block on the monitor held by the stopping thread while the stopping thread waits for the dispatcher to finish.
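The second deadlock described above can be illustrated with a minimal, self-contained sketch. It is not Tez code: `MiniMaster`, its `handle` and `stop` methods, and the single-event dispatcher thread are hypothetical stand-ins. The timed `join` is used only so that the demonstration terminates; with an untimed `join()` the two threads would wait on each other indefinitely.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SynchronizedStopDeadlockSketch {

    /** Hypothetical stand-in for a master whose handle() and stop() are both synchronized. */
    public static class MiniMaster {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final Thread dispatcher = new Thread(this::dispatchOne, "sketch-dispatcher");
        private int handled = 0;

        public void start() {
            dispatcher.setDaemon(true);
            dispatcher.start();
        }

        /** Dispatcher body: deliver a single queued event, then exit. */
        private void dispatchOne() {
            try {
                handle(queue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        /** Needs the MiniMaster monitor, just like stop(). */
        public synchronized void handle(String event) {
            handled++;
        }

        /**
         * Holds the MiniMaster monitor while waiting for the dispatcher to finish.
         * Returns true if the dispatcher was still blocked when the timed join gave up.
         */
        public synchronized boolean stop(long joinTimeoutMillis) throws InterruptedException {
            queue.put("late-event");             // an event still pending at stop time
            dispatcher.join(joinTimeoutMillis);  // the dispatcher is blocked entering handle()
            return dispatcher.isAlive();
        }

        /** Waits for the dispatcher without holding the monitor. */
        public void awaitDispatcher() throws InterruptedException {
            dispatcher.join();
        }

        public synchronized int handled() {
            return handled;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MiniMaster master = new MiniMaster();
        master.start();
        boolean stuck = master.stop(500);
        System.out.println("dispatcher stuck during stop: " + stuck);
        master.awaitDispatcher(); // the monitor is free now, so the pending event goes through
        System.out.println("events handled after stop returned: " + master.handled());
    }
}
```

In this sketch the pending event is delivered only after `stop` releases the monitor, which is why skipping event handling once the service is stopped, or not holding the monitor while joining, are the usual ways out of this pattern.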
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371273#comment-14371273 ]

Jeff Zhang commented on TEZ-2204:

The Findbugs issue should be OK.
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14371255#comment-14371255 ]

Hadoop QA commented on TEZ-2204:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12705906/TEZ-2204-2.patch
against master revision 9b845f2.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/319//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/319//artifact/patchprocess/newPatchFindbugsWarningstez-common.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/319//console

This message is automatically generated.
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369378#comment-14369378 ]

Jeff Zhang commented on TEZ-2204:

Also found another deadlock in DAGAppMaster.
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14368377#comment-14368377 ]

Jeff Zhang commented on TEZ-2204:

This may be an issue related to YARN-2917: Tez has its own AsyncDispatcher, which does not include the YARN-2917 patch.

jstack output:
{code}
Thread-1 prio=5 tid=0x7f9d13011800 nid=0xe507 in Object.wait() [0x000117559000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x0007fed1c360 (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1281)
    - locked 0x0007fed1c360 (a java.lang.Thread)
    at java.lang.Thread.join(Thread.java:1355)
    at org.apache.tez.common.AsyncDispatcher.serviceStop(AsyncDispatcher.java:162)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked 0x0007fed61000 (a java.lang.Object)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1539)
    at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1674)
    - locked 0x0007fed0dc50 (a org.apache.tez.dag.app.DAGAppMaster)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    - locked 0x0007fed0de80 (a java.lang.Object)
    at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run(DAGAppMaster.java:1940)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

   Locked ownable synchronizers:
    - None

App Shared Pool - #1 daemon prio=5 tid=0x7f9d13e60800 nid=0xdd03 in Object.wait() [0x00011714c000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x0007ff1193b8 (a org.apache.hadoop.util.ShutdownHookManager$1)
    at java.lang.Thread.join(Thread.java:1281)
    - locked 0x0007ff1193b8 (a org.apache.hadoop.util.ShutdownHookManager$1)
    at java.lang.Thread.join(Thread.java:1355)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
    at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
    at java.lang.Shutdown.runHooks(Shutdown.java:123)
    at java.lang.Shutdown.sequence(Shutdown.java:167)
    at java.lang.Shutdown.exit(Shutdown.java:212)
    - locked 0x0007ff111ec8 (a java.lang.Class for java.lang.Shutdown)
    at java.lang.Runtime.exit(Runtime.java:109)
    at java.lang.System.exit(System.java:962)
    at org.apache.tez.test.TestAMRecovery$ControlledImmediateStartVertexManager.onSourceTaskCompleted(TestAMRecovery.java:601)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventSourceTaskCompleted.invoke(VertexManager.java:525)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:580)
    - locked 0x0007fb82fac8 (a org.apache.tez.dag.app.dag.impl.VertexManager)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:575)
    at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:27)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - 0x0007fbc182d8 (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}
[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.
[ https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14365908#comment-14365908 ]

Hitesh Shah commented on TEZ-2204:

Latest failure: https://builds.apache.org/job/Tez-Build/941