[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147646#comment-13147646 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Hdfs-0.23-Build #72 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/72/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) svn merge -c r1199751 --ignore-ancestry ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199757 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147651#comment-13147651 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Hdfs-trunk #859 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/859/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199751 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147673#comment-13147673 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Mapreduce-trunk #893 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/893/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199751 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146836#comment-13146836 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-: More digging up through the RPC layer and I figured I am running into HADOOP-7317 or the related HDFS-1965. We can use the same RPC client to connect to different servers, reuse connections to the same server but cannot terminate connections individually. Two options we have: - Modify ProtoOverHadoopRPCEngine to avoid caching of clients altogether depending on a configuration or - set the idle time for connections to zero. Either is manageable, effort-wise, I am doing the later as it is simpler. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146890#comment-13146890 ] Siddharth Seth commented on MAPREDUCE-: --- Went through the latest patch. Looks good mostly. - The close call shouldn't really be required with the idle time set to 0. - Should RPCClientFactoryPBImpl call RPC.stopProxy ? instead of putting it in all the service client impls? It's a PB specific factory, so putting it here should be ok. Otherwise - the Exception in stopClient() should not be ignored. - The client cache (removed by the patch) in ContainerLauncherImpl would still be useful in non-secure mode. This works for both though - so isn't high priority. Maybe a separate jira. Haven't tried the latest patch. Had tried the previous one on a single node. Jobs were running fine. The close call was getting to the ClientCache, but doing nothing due to refcount checks. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146896#comment-13146896 ] Siddharth Seth commented on MAPREDUCE-: --- Forgot to mention - nice clean workaround to the rpc stop not working :) Thought it'd be way more involved. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146998#comment-13146998 ] Karam Singh commented on MAPREDUCE-: After applying lastest patch, Ran Sort twice and did not observe this issue anymore MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147008#comment-13147008 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-: bq. The close call shouldn't really be required with the idle time set to 0. My idea was to actually remove the maxIdleTime setting once the root issue HADOOP-7317 is fixed. I'll let it be. bq. Should RPCClientFactoryPBImpl call RPC.stopProxy ? instead of putting it in all the service client impls? It's a PB specific factory, so putting it here should be ok. No, that isn't possible. We need access to the proxy object in each impl. Bane of multiple layering in this part of the code. bq.Otherwise - the Exception in stopClient() should not be ignored. Sure, I'll throw exception so that it is clear if somebody calles stopClient() for a protocol that doesn't implement it. bq. The client cache (removed by the patch) in ContainerLauncherImpl would still be useful in non-secure mode. This works for both though - so isn't high priority. Maybe a separate jira. Sure, but helps to have the same implementation. Separate JIRA if someone needs it. bq. Forgot to mention - nice clean workaround to the rpc stop not working Thought it'd be way more involved. Yeah, been running with this workaround since nearly a week but didn't put that in the patch in the hope of fixing the root cause. Turns out that is the only short term solution, alas. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147037#comment-13147037 ] Hadoop QA commented on MAPREDUCE-: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12503073/MAPREDUCE--2009.1.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1281//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1281//console This message is automatically generated. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147046#comment-13147046 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Hdfs-trunk-Commit #1335 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1335/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199751 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147047#comment-13147047 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Common-trunk-Commit #1261 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1261/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199751 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147052#comment-13147052 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Common-0.23-Commit #161 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/161/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) svn merge -c r1199751 --ignore-ancestry ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199757 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147050#comment-13147050 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Hdfs-0.23-Commit #160 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/160/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) svn merge -c r1199751 --ignore-ancestry ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199757 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147059#comment-13147059 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Mapreduce-trunk-Commit #1283 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1283/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199751 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147061#comment-13147061 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Mapreduce-0.23-Commit #172 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/172/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) svn merge -c r1199751 --ignore-ancestry ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199757 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13147322#comment-13147322 ] Hudson commented on MAPREDUCE-: --- Integrated in Hadoop-Mapreduce-0.23-Build #87 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/87/]) MAPREDUCE-. Fixed bugs in ContainerLauncher of MR AppMaster due to which per-container connections to NodeManager were lingering long enough to hit the ulimits on number of processes. (vinodkv) svn merge -c r1199751 --ignore-ancestry ../../trunk/ vinodkv : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1199757 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncher.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/impl/pb/client/ContainerManagerPBClientImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/RpcClientFactory.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/factories/impl/pb/RpcClientFactoryPBImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnProtoRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/HadoopYarnRPC.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/ProtoOverHadoopRpcEngine.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/ipc/YarnRPC.java MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt, MAPREDUCE--2009.1.txt, MAPREDUCE--2009.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146214#comment-13146214 ] Hadoop QA commented on MAPREDUCE-: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12502903/MAPREDUCE--2008.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1270//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1270//console This message is automatically generated. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Fix For: 0.23.1 Attachments: MAPREDUCE--2002.txt, MAPREDUCE--2008.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators:
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142170#comment-13142170 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-: Actually the code does look right, it creates only one thread per node. This is deeper than my first suspicion, still debugging. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-3333) MR AM for sort-job going out of memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142234#comment-13142234 ] Hadoop QA commented on MAPREDUCE-: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12501972/MAPREDUCE--2002.txt against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1240//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1240//console This message is automatically generated. MR AM for sort-job going out of memory -- Key: MAPREDUCE- URL: https://issues.apache.org/jira/browse/MAPREDUCE- Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2 Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Vinod Kumar Vavilapalli Priority: Blocker Attachments: MAPREDUCE--2002.txt [~Karams] just found this. The usual sort job on a 350 node cluster hung due to OutOfMemory and eventually failed after an hour instead of the usual odd 20 minutes. {code} 2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002 _01_001434 : java.lang.reflect.UndeclaredThrowableException at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88) at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139) at $Proxy20.startContainer(Unknown Source) at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81) ... 4 more Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: gsbl91281.blue.ygrid.yahoo.com/98.137.101.189; destination host is: gsbl91525.blue.ygrid.yahoo.com:45450; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655) at org.apache.hadoop.ipc.Client.call(Client.java:1089) at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136) ... 6 more Caused by: java.io.IOException: Couldn't set up IO streams at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621) at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205) at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195) at org.apache.hadoop.ipc.Client.call(Client.java:1065) ... 7 more Caused by: java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614) ... 10 more {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: