[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168587#comment-17168587 ]

Steve Loughran commented on TEZ-1661:
-

Just hit this problem in a hadoop-aws test run inside log4j. Funny that on the first page of Google results, up come my colleagues and other ASF people. Did anyone ever come up with a root cause for the hang?

> LocalTaskScheduler hangs when shutdown
> --
>
> Key: TEZ-1661
> URL: https://issues.apache.org/jira/browse/TEZ-1661
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.5.0
> Environment: Local Mode
> Reporter: Oleg Zhurakousky
> Assignee: Jeff Zhang
> Priority: Major
> Fix For: 0.7.0, 0.6.1
>
> Attachments: TEZ-1661-1.patch, TEZ-1661-2.patch
>
>
> LocalTaskScheduler hangs on 'take' from the 'taskRequestQueue' when
> TezClient shuts down (e.g., TezClient.stop).
> Below is jstack output observed when running in Tez local mode:
> {code}
> "Thread-53" prio=5 tid=0x7fc876d8f800 nid=0xac07 runnable [0x00011df9]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Throwable.fillInStackTrace(Native Method)
>         at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
>         - locked <0x0007b6ce60a0> (a java.lang.InterruptedException)
>         at java.lang.Throwable.<init>(Throwable.java:250)
>         at java.lang.Exception.<init>(Exception.java:54)
>         at java.lang.InterruptedException.<init>(InterruptedException.java:57)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>         at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>         at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:535)
>         at org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.processRequest(LocalTaskSchedulerService.java:310)
>         at org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.run(LocalTaskSchedulerService.java:304)
>         at java.lang.Thread.run(Thread.java:745)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
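When a test run hangs like this, a quick way to see what is keeping the JVM alive, without attaching jstack, is to list the live non-daemon threads just before the JVM is expected to exit. The sketch below is minimal and uses only the JDK's `Thread.getAllStackTraces()`. The `LingeringThreads` class is a hypothetical helper, and nothing in it is Tez-specific:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic helper; relies only on JDK APIs.
class LingeringThreads {

    /** Names of live, non-daemon threads other than the calling thread. */
    static List<String> nonDaemonThreadNames() {
        List<String> names = new ArrayList<>();
        Thread self = Thread.currentThread();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t != self && t.isAlive() && !t.isDaemon()) {
                names.add(t.getName());
            }
        }
        return names;
    }

    /** Prints each lingering non-daemon thread with up to maxFrames stack frames. */
    static void dump(int maxFrames) {
        Thread self = Thread.currentThread();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t == self || !t.isAlive() || t.isDaemon()) {
                continue;
            }
            System.out.println("non-daemon thread still alive: " + t.getName());
            StackTraceElement[] frames = e.getValue();
            for (int i = 0; i < Math.min(maxFrames, frames.length); i++) {
                System.out.println("    at " + frames[i]);
            }
        }
    }

    public static void main(String[] args) {
        // Call dump() at the end of main (or in a test teardown): any thread
        // it reports will keep the JVM running after main returns.
        dump(5);
    }
}
```

In the situation reported here, a thread like `Thread-53` above would be listed, with `PriorityBlockingQueue.take` appearing in its stack.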
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14286196#comment-14286196 ]

Oleg Zhurakousky commented on TEZ-1661:
---

No, I have not tried it with the patch, but if you say you tested it based on that example then I am fine. Thanks
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14282055#comment-14282055 ]

Jeff Zhang commented on TEZ-1661:
-

[~ozhurakousky] I ran it as an application (not JUnit) and saw the same jstack as you. I have also verified that the issue is addressed by this patch. Do you still have the issue even with the patch?

[~sseth] It should be reproducible. Did you remove the System.exit in WordCount?
{code}
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res); // remove it
{code}
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280117#comment-14280117 ]

Jeff Zhang commented on TEZ-1661:
-

committed to master
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280507#comment-14280507 ]

Oleg Zhurakousky commented on TEZ-1661:
---

Here is the code to reproduce it:
{code}
import org.apache.tez.client.TezClient;
import org.apache.tez.dag.api.TezConfiguration;

public static void main(String[] args) throws Exception {
  TezClient client = TezClient.create("foo", new TezConfiguration());
  client.start();
  client.stop();
  System.out.println("Done");
}
{code}
Make sure you run it as a Java application (main) and not as a JUnit test, since JUnit will essentially call System.exit and mask the hang.
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279497#comment-14279497 ]

Siddharth Seth commented on TEZ-1661:
-

[~zjffdu] - the patch is required; however, I don't think this thread blocks JVM shutdown, since it's a daemon. Is there a way to reproduce this?
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279593#comment-14279593 ]

Jeff Zhang commented on TEZ-1661:
-

[~sseth] It is not a daemon thread. I also verified this through jstack:
{code}
"Thread-33" prio=5 tid=0x7fb553266800 nid=0x6307 runnable [0x0001153e2000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Throwable.fillInStackTrace(Native Method)
        at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
        - locked <0x0007b05c6b40> (a java.lang.InterruptedException)
        at java.lang.Throwable.<init>(Throwable.java:250)
        at java.lang.Exception.<init>(Exception.java:54)
        at java.lang.InterruptedException.<init>(InterruptedException.java:57)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
        at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:535)
        at org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.processRequest(LocalTaskSchedulerService.java:322)
        at org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.run(LocalTaskSchedulerService.java:316)
        at java.lang.Thread.run(Thread.java:745)
{code}

bq. Is there a way to reproduce this?

Adding the following to TezExampleBase.createTezClient and removing the System.exit call in WordCount.java reproduces it:
{code}
tezConf.setBoolean(TezConfiguration.TEZ_LOCAL_MODE, true);
tezConf.set("fs.defaultFS", "file:///");
tezConf.setBoolean(
    TezRuntimeConfiguration.TEZ_RUNTIME_OPTIMIZE_LOCAL_FETCH, true);
{code}

Attaching a new patch that changes the thread to a daemon.
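The second patch makes the handler thread a daemon. The JVM exits once only daemon threads remain, so a daemon thread blocked in `take()` no longer holds the process open. It is still never stopped cleanly, though. The plain-Java sketch below shows the difference. It assumes nothing about Tez beyond a thread blocked on a `PriorityBlockingQueue`, and `DaemonTakerDemo` and `startTaker` are hypothetical names:

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical demo; not Tez code.
class DaemonTakerDemo {

    /** Starts a thread that blocks on take() until it is interrupted. */
    static Thread startTaker(PriorityBlockingQueue<Integer> queue, boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    queue.take(); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on interrupt
            }
        }, "taker");
        t.setDaemon(daemon); // must be set before start()
        t.start();
        return t;
    }

    public static void main(String[] args) {
        startTaker(new PriorityBlockingQueue<>(), true);
        System.out.println("Done");
        // main returns here. With daemon == true the JVM exits even though the
        // taker is still blocked; with daemon == false the process would hang.
    }
}
```

Running `main` prints `Done` and exits. Passing `false` instead leaves the JVM waiting on the blocked thread, which is the hang described in this issue.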
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279646#comment-14279646 ]

Hadoop QA commented on TEZ-1661:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12692646/TEZ-1661-2.patch
  against master revision 2544b05.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}. The patch appears to introduce 68 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/42//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/42//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/42//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/42//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/42//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/42//console

This message is automatically generated.
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279637#comment-14279637 ]

Siddharth Seth commented on TEZ-1661:
-

I can't reproduce this locally, but the patch looks good. +1.
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278574#comment-14278574 ]

Jeff Zhang commented on TEZ-1661:
-

asyncDelegateRequestThread in LocalTaskSchedulerService is not stopped when DAGAppMaster is shut down in local mode. (This actually also happens in non-local mode, but there we call System.exit when shutting down the Tez AM, so it does not hang.) The tez-examples don't hang in local mode because they always call System.exit when the job is done, as follows:
{code}
int res = ToolRunner.run(new Configuration(), new WordCount(), args);
System.exit(res);
{code}
But it doesn't make sense to require users to always do that. Attaching a patch to address this issue. [~sseth], [~jeagles] please help review.
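The diagnosis here is that nothing stops `asyncDelegateRequestThread` on shutdown. Making the thread a daemon lets the JVM exit, but a service can also stop such a thread explicitly: interrupt the blocked `take()`, treat the interrupt as a shutdown request, and join. Below is a minimal sketch of that pattern under stated assumptions. `RequestHandler` is a hypothetical stand-in, not Tez's `AsyncDelegateRequestHandler`, and the sketch does not claim to reproduce what either attached patch does:

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a scheduler's async request handler.
class RequestHandler implements Runnable {
    private final PriorityBlockingQueue<Integer> queue = new PriorityBlockingQueue<>();
    private final AtomicInteger processed = new AtomicInteger();
    private final Thread thread = new Thread(this, "RequestHandler");
    private volatile boolean stopped = false;

    void start() { thread.start(); }

    void submit(int request) { queue.add(request); }

    int processedCount() { return processed.get(); }

    boolean isRunning() { return thread.isAlive(); }

    @Override
    public void run() {
        while (!stopped) {
            try {
                queue.take();                 // blocks while the queue is empty
                processed.incrementAndGet();  // "process" the request
            } catch (InterruptedException e) {
                // Treat interruption as a shutdown request: leave the loop
                // instead of calling take() again.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Stops the handler; returns true if its thread exited within the timeout. */
    boolean stop(long timeoutMillis) {
        stopped = true;
        thread.interrupt();                   // wakes the thread if blocked in take()
        try {
            thread.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }
}
```

This only helps if the owning service's stop method is actually reached on shutdown. Per the `LocalClient.stop` analysis in this thread, that was not the case in local mode.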
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278503#comment-14278503 ]

Hadoop QA commented on TEZ-1661:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12692481/TEZ-1661-1.patch
  against master revision 61bb0f8.

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}. There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}. The patch appears to introduce 260 new Findbugs (version 2.0.3) warnings.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/31//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/31//artifact/patchprocess/newPatchFindbugsWarningstez-mapreduce.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/31//artifact/patchprocess/newPatchFindbugsWarningstez-dag.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/31//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-internals.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/31//artifact/patchprocess/newPatchFindbugsWarningstez-tests.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/31//artifact/patchprocess/newPatchFindbugsWarningstez-examples.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/31//console

This message is automatically generated.
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14274956#comment-14274956 ]

Jeff Zhang commented on TEZ-1661:
-

[~ozhurakousky] Can you still reproduce it in master?
[jira] [Commented] (TEZ-1661) LocalTaskScheduler hangs when shutdown
[ https://issues.apache.org/jira/browse/TEZ-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275366#comment-14275366 ]

Oleg Zhurakousky commented on TEZ-1661:
---

Yeah, the issue appears to be in _org.apache.tez.client.LocalClient_, which has the following method:
{code}
@Override
public void stop() {
  // LocalClients are shared between TezClient and DAGClients, which can cause stop / start / close
  // to be invoked multiple times. If modifying these methods - this should be factored in.
}
{code}
Basically, in *local* mode a call to _TezClient.stop_ results in a call to the above method, which does nothing. This means the _LocalTaskSchedulerService.stopService_ method is never called, so _asyncDelegateRequestThread_ stays alive indefinitely.
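The comment in `LocalClient.stop` explains why the method was left empty: the client is shared, so `stop` may be invoked several times. One common way to let such a method forward to the real shutdown safely is to make that shutdown idempotent. The sketch below assumes a hypothetical `IdempotentStopper` wrapper around whatever action would stop the scheduler service. It illustrates the pattern only and is not necessarily how the committed patch addressed the issue:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper: runs a shutdown action at most once, no matter how
// many owners (e.g. a client and the DAG clients sharing it) call stop().
class IdempotentStopper {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private final Runnable shutdownAction;

    IdempotentStopper(Runnable shutdownAction) {
        this.shutdownAction = shutdownAction;
    }

    /** Runs the shutdown action on the first call only; returns true for that call. */
    boolean stop() {
        if (stopped.compareAndSet(false, true)) {
            shutdownAction.run();
            return true;
        }
        return false; // already stopped by an earlier caller
    }

    boolean isStopped() {
        return stopped.get();
    }

    public static void main(String[] args) {
        IdempotentStopper stopper =
            new IdempotentStopper(() -> System.out.println("stopping scheduler service"));
        stopper.stop(); // runs the action
        stopper.stop(); // no-op
    }
}
```

`compareAndSet` guarantees a single winner even when several threads call `stop` concurrently. If an early stop from one owner must not affect the others, reference counting (stopping only when the last owner releases) is the usual alternative.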