[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152048#comment-13152048 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Build #79 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/79/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3355
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
>     ... 4 more
> Exception in thread "ContainerLauncher #53" org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
>     at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
>     at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
>     ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
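Both traces in the quoted issue description end in the same cause: an InterruptedException raised from LinkedBlockingQueue.put() via ReentrantLock.lockInterruptibly(), which the dispatcher's handle() then wraps in a YarnException. The sketch below only illustrates that JDK behaviour: put() fails immediately when the calling thread already has its interrupt flag set, even if the queue has room. It is not the ContainerLauncherImpl code and not the committed fix; the class name, the helper tryEnqueue and the event string are hypothetical, and the notification does not say where the interrupt came from.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration class; not part of Hadoop.
class InterruptedPutSketch {

    /**
     * Tries to enqueue an event. Returns false if put() was aborted because
     * the calling thread was interrupted, restoring the interrupt flag so
     * the caller can still observe it.
     */
    static boolean tryEnqueue(BlockingQueue<String> queue, String event) {
        try {
            // put() takes its lock with lockInterruptibly(), so a pending
            // interrupt makes it throw before the element is added.
            queue.put(event);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // put() cleared the flag; set it again
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Simulate a worker whose interrupt flag is still set from earlier.
        Thread.currentThread().interrupt();
        boolean first = tryEnqueue(queue, "CONTAINER_LAUNCH_FAILED");
        System.out.println("enqueued with pending interrupt: " + first);

        // Clear the flag, then retry: the same call now succeeds.
        Thread.interrupted();
        boolean second = tryEnqueue(queue, "CONTAINER_LAUNCH_FAILED");
        System.out.println("enqueued after clearing interrupt: " + second);
    }
}
```

If the event is dropped at this point, whatever is waiting for it never hears about the launch outcome, which would be consistent with the hang reported here, though that link is an inference and not something the traces prove.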
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152037#comment-13152037 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #900 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/900/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152032#comment-13152032 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Build #96 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/96/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152025#comment-13152025 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk #866 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/866/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151283#comment-13151283 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #1300 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1300/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151282#comment-13151282 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Mapreduce-0.23-Commit #186 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/186/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151276#comment-13151276 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #1350 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1350/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151277#comment-13151277 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Common-0.23-Commit #174 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/174/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151274#comment-13151274 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Common-trunk-Commit #1276 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1276/])
    MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files :
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151273#comment-13151273 ]

Hudson commented on MAPREDUCE-3355:
-----------------------------------

Integrated in Hadoop-Hdfs-0.23-Commit #173 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/173/])
    MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command timeouts correctly. (vinodkv)
    svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files :
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151269#comment-13151269 ]

Vinod Kumar Vavilapalli commented on MAPREDUCE-3355:
----------------------------------------------------

Thanks for the update, Sid. I am fine with those changes.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150881#comment-13150881 ]

Hadoop QA commented on MAPREDUCE-3355:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12503813/MR3355.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in .
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1309//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1309//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150861#comment-13150861 ]

Siddharth Seth commented on MAPREDUCE-3355:
-------------------------------------------

Thanks. Looks good, except that one more interrupt check is required after timer.cancel() while handling the CLEANUP event. Updating the patch with this minor change.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
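The interrupt check mentioned in the comment above can be illustrated with a minimal sketch. It is not the code from the patch: the class name, method names, and queue are hypothetical stand-ins, and the late timer interrupt is simulated by setting the thread's interrupt flag directly. It relies only on the documented behaviour that `LinkedBlockingQueue.put` throws `InterruptedException` when the calling thread's interrupt flag is already set, which matches the traces in the issue description.

```java
import java.util.Timer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class InterruptCheckSketch {
    // Hypothetical stand-in for the dispatcher's event queue.
    static final BlockingQueue<String> EVENT_QUEUE = new LinkedBlockingQueue<>();

    /** Enqueues without any check; returns false if a pending interrupt aborted the put. */
    static boolean sendUnchecked(String event) {
        try {
            EVENT_QUEUE.put(event);
            return true;
        } catch (InterruptedException e) {
            return false; // the event is lost, as in the reported stack traces
        }
    }

    /** Cancels the timer, clears any stale interrupt it left behind, then enqueues. */
    static boolean cancelThenSend(Timer timer, String event) {
        timer.cancel();
        // Thread.interrupted() returns the flag and clears it, so a timeout
        // interrupt that arrived after the command finished is discarded here.
        Thread.interrupted();
        return sendUnchecked(event);
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt(); // simulate a late timer interrupt
        System.out.println("unchecked send delivered: " + sendUnchecked("CLEANED"));

        Thread.currentThread().interrupt(); // simulate it again
        System.out.println("checked send delivered: "
                + cancelThenSend(new Timer(true), "CLEANED"));
    }
}
```

A check like this only covers interrupts that arrive before the check runs; an interrupt delivered afterwards by an already-running timer task is a separate race.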
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150371#comment-13150371 ]

Hadoop QA commented on MAPREDUCE-3355:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12503729/MAPREDUCE-3355-2015.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in .
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1307//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1307//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147471#comment-13147471 ]

Siddharth Seth commented on MAPREDUCE-3355:
-------------------------------------------

There's another extremely unlikely situation which could cause this. Canceling the timer doesn't affect the timer task if it has already started. An interrupt could come in any time after the cancel, which could interrupt the TA_CONTAINER_CLEANED event or the ContainerLaunchedEvent. This would require a combination of startContainer finishing around when the timer expires plus some very specific thread scheduling. The same could happen if start/stopContainer were to complete around the same time as the timer kicks in.

A possible fix would be to synchronize in the main task on the CommandTimer when we don't care about interrupts, and always synchronize the CommandTimer on itself.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
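The race described in the comment above, and the idea of synchronizing on the timer task, can be sketched roughly as follows. This is an illustration under assumptions, not the patch: only the name CommandTimer comes from the comment, while the `done` flag, `markDone()`, `runCommand()`, and the use of `Thread.sleep` as a stand-in for the remote start/stopContainer call are all hypothetical. The sketch uses a flag guarded by the task's own monitor, a variant of the proposed locking, so that once the main task has marked itself done the timer task can no longer interrupt it, even if the task had already started when `Timer.cancel()` was called.

```java
import java.util.Timer;
import java.util.TimerTask;

class CommandTimerSketch {
    /** Hypothetical timer task that interrupts the worker unless it is already done. */
    static class CommandTimer extends TimerTask {
        private final Thread target;
        private boolean done = false; // guarded by this task's monitor

        CommandTimer(Thread target) {
            this.target = target;
        }

        @Override
        public synchronized void run() {
            if (!done) {
                target.interrupt();
            }
        }

        /** Called by the worker once it no longer wants timeout interrupts. */
        synchronized void markDone() {
            done = true;
        }
    }

    /** Returns true if the simulated command was interrupted by the timeout. */
    static boolean runCommand(long timeoutMs, long commandMs) {
        Timer timer = new Timer(true);
        CommandTimer commandTimer = new CommandTimer(Thread.currentThread());
        timer.schedule(commandTimer, timeoutMs);

        boolean timedOut = false;
        try {
            Thread.sleep(commandMs); // stand-in for the remote container command
        } catch (InterruptedException e) {
            timedOut = true;
        }

        // After markDone() returns, run() has either finished or will see done == true.
        commandTimer.markDone();
        timer.cancel();
        // Discard an interrupt that landed between the command finishing and markDone().
        Thread.interrupted();
        return timedOut;
    }

    public static void main(String[] args) {
        System.out.println("fast command timed out: " + runCommand(5000, 10));
        System.out.println("slow command timed out: " + runCommand(50, 5000));
    }
}
```

With this ordering, any event sent after `runCommand` returns is put on the queue with the interrupt flag clear, which is the property the comment is after.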
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147130#comment-13147130 ]

Hadoop QA commented on MAPREDUCE-3355:
--------------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12503093/MAPREDUCE-3355-2009.1.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed unit tests in .
    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1283//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1283//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: applicationmaster, mrv2
> Affects Versions: 0.23.0
> Reporter: Vinod Kumar Vavilapalli
> Assignee: Vinod Kumar Vavilapalli
> Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147051#comment-13147051 ]

Hadoop QA commented on MAPREDUCE-3355:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12503035/MAPREDUCE-3355-2009.txt
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    -1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1282//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3355
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.
[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147001#comment-13147001 ]

Karam Singh commented on MAPREDUCE-3355:
----------------------------------------

After applying the patch over MAPREDUCE-, I ran Sort twice and did not observe this issue anymore.

> AM scheduling hangs frequently with sort job on 350 nodes
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3355
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Vinod Kumar Vavilapalli
>            Assignee: Vinod Kumar Vavilapalli
>            Priority: Blocker
>             Fix For: 0.23.1
>
>         Attachments: MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 node cluster.