[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152048#comment-13152048
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Hdfs-0.23-Build #79 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/79/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152037#comment-13152037
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Mapreduce-trunk #900 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/900/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152032#comment-13152032
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Mapreduce-0.23-Build #96 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/96/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-17 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152025#comment-13152025
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Hdfs-trunk #866 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/866/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151283#comment-13151283
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1300 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1300/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151282#comment-13151282
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #186 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/186/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151276#comment-13151276
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1350 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1350/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151277#comment-13151277
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Common-0.23-Commit #174 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/174/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151274#comment-13151274
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Common-trunk-Commit #1276 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1276/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202744
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151273#comment-13151273
 ] 

Hudson commented on MAPREDUCE-3355:
---

Integrated in Hadoop-Hdfs-0.23-Commit #173 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/173/])
MAPREDUCE-3355. Fixed MR AM's ContainerLauncher to handle node-command 
timeouts correctly. (vinodkv)
svn merge -c r1202744 --ignore-ancestry ../../trunk/

vinodkv : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1202745
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/TestContainerLauncher.java


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-16 Thread Vinod Kumar Vavilapalli (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151269#comment-13151269
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-3355:


Thanks for the update, Sid. I am fine with those changes.

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-15 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150881#comment-13150881
 ] 

Hadoop QA commented on MAPREDUCE-3355:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12503813/MR3355.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1309//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1309//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-15 Thread Siddharth Seth (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150861#comment-13150861
 ] 

Siddharth Seth commented on MAPREDUCE-3355:
---

Thanks.. Looks good, except one more interrupt check required after 
timer.cancel() while handling the CLEANUP event. Updating the patch with this 
minor change.

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt, MR3355.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-15 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150371#comment-13150371
 ] 

Hadoop QA commented on MAPREDUCE-3355:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503729/MAPREDUCE-3355-2015.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1307//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1307//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt, MAPREDUCE-3355-2015.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-09 Thread Siddharth Seth (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147471#comment-13147471
 ] 

Siddharth Seth commented on MAPREDUCE-3355:
---

There's another extremely unlikely situation which could cause this. 
Canceling the timer doesn't affect the timer task if it's already started. An 
interrupt could come in anytime after the cancel - which could interrupt the 
TA_CONTAINER_CLEANED event or the ContainerLaunchedEvent. This would be a 
combination of startContainer finishing around when the timer expires + some 
very specific thread scheduling. Also if the start/stopContainer were to 
complete around the same time as when the timer kicks in.
Possible fix would be to synchronize in the main task on the CommandTimer when 
we don't care about interrupts, and always synchronize the CommandTimer on 
itself.


> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-09 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147130#comment-13147130
 ] 

Hadoop QA commented on MAPREDUCE-3355:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503093/MAPREDUCE-3355-2009.1.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1283//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1283//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.1.txt, 
> MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-09 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147051#comment-13147051
 ] 

Hadoop QA commented on MAPREDUCE-3355:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12503035/MAPREDUCE-3355-2009.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1282//console

This message is automatically generated.

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-09 Thread Karam Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147001#comment-13147001
 ] 

Karam Singh commented on MAPREDUCE-3355:


After patch over MAPREDUCE-, Ran Sort twice and did not observe this issue 
anymore

> AM scheduling hangs frequently with sort job on 350 nodes
> -
>
> Key: MAPREDUCE-3355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, mrv2
>Affects Versions: 0.23.0
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Blocker
> Fix For: 0.23.1
>
> Attachments: MAPREDUCE-3355-2009.txt
>
>
> Another collaboration with [~karams]. Sort job hangs not so rarely on a 350 
> node cluster. Found this in AM logs:
> {code}
> Exception in thread "ContainerLauncher #60" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 4 more
> Exception in thread "ContainerLauncher #53" 
> org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
> at 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.InterruptedException
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
> at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
> at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
> ... 5 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira