[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ahmed Radwan updated MAPREDUCE-3343:
    Attachment: MAPREDUCE-3343_rev2.patch

Here is zhaoyunjiong's patch incorporating Eli's additional comments.

TaskTracker Out of Memory because of distributed cache
    Key: MAPREDUCE-3343
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mrv1
    Affects Versions: 0.20.205.0
    Reporter: Ahmed Radwan
    Assignee: zhaoyunjiong
    Labels: mapreduce, patch
    Attachments: MAPREDUCE-3343_rev2.patch, mapreduce-3343-release-0.20.205.0.patch

This out-of-memory error happens when you run a large number of jobs (using the distributed cache) on a TaskTracker. The basic issue seems to be with distributedCacheManager (an instance of TrackerDistributedCacheManager in TaskTracker.java). It is created during TaskTracker.initialize(), and it keeps a reference to a TaskDistributedCacheManager for every submitted job via the jobArchives map, as well as references to CacheStatus via the cachedArchives map. I am not seeing these cleaned up between jobs, so this can cause out-of-memory problems after a really large number of jobs are submitted. We have seen this issue in a number of cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
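The retention pattern described in the report can be illustrated with a minimal sketch. This is not the Hadoop code and not the attached patch: CacheManagerSketch, JobCacheEntry, register() and release() are hypothetical names chosen for illustration, and only the field name jobArchives is borrowed from the report. It assumes a long-lived manager whose per-job map is only emptied if something explicitly removes the entry when a job finishes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a tracker-wide cache manager; not the Hadoop class. */
public class CacheManagerSketch {

    /** Hypothetical per-job bookkeeping object held by the manager. */
    static final class JobCacheEntry {
        final byte[] payload = new byte[1024]; // stands in for per-job state
    }

    // Lives as long as the manager does, like the jobArchives map named in the report.
    private final Map<String, JobCacheEntry> jobArchives = new HashMap<>();

    /** Called when a job is set up; creates the entry if it does not exist yet. */
    public synchronized JobCacheEntry register(String jobId) {
        return jobArchives.computeIfAbsent(jobId, id -> new JobCacheEntry());
    }

    /** Called when a job finishes; without this call the entry stays reachable. */
    public synchronized void release(String jobId) {
        jobArchives.remove(jobId);
    }

    /** Number of jobs the manager still holds a reference for. */
    public synchronized int trackedJobs() {
        return jobArchives.size();
    }

    public static void main(String[] args) {
        CacheManagerSketch leaky = new CacheManagerSketch();
        CacheManagerSketch cleaned = new CacheManagerSketch();
        for (int i = 0; i < 10_000; i++) {
            String jobId = "job_" + i;
            leaky.register(jobId);   // never released: the map grows with every job
            cleaned.register(jobId);
            cleaned.release(jobId);  // released at job completion
        }
        System.out.println("leaky=" + leaky.trackedJobs()
                + " cleaned=" + cleaned.trackedJobs());
    }
}
```

In the sketch, the manager that never calls release() ends up holding one entry per job ever submitted, which is the growth the report attributes to the jobArchives and cachedArchives maps.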
[jira] [Commented] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149009#comment-13149009 ]

Hadoop QA commented on MAPREDUCE-3343:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12503477/MAPREDUCE-3343_rev2.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1297//console

This message is automatically generated.
[jira] [Commented] (MAPREDUCE-3393) TestMRJobs, TestMROldApiJobs, and TestUberAM failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149072#comment-13149072 ]

Thomas Graves commented on MAPREDUCE-3393:

Yes, JAVA_HOME is set. Did you run them individually or all the tests? Sorry, I should have said this originally: they only fail when I run them all together. I could not get them to fail when run individually. I'll attach logs.

TestMRJobs, TestMROldApiJobs, and TestUberAM failures
    Key: MAPREDUCE-3393
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-3393
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mrv2
    Affects Versions: 0.23.0
    Reporter: Thomas Graves
    Assignee: Hitesh Shah
    Attachments: MR-3393.1.patch, MR-3393.2.patch

Check out branch 0.23 and run mvn test from the hadoop-mapreduce-project directory.

--- T E S T S ---
Running org.apache.hadoop.mapred.TestClientServiceDelegate
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.717 sec
Running org.apache.hadoop.mapred.TestClientRedirect
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.436 sec
Running org.apache.hadoop.mapreduce.TestYarnClientProtocolProvider
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.975 sec
Running org.apache.hadoop.mapreduce.v2.TestMRJobs
Tests run: 4, Failures: 3, Errors: 1, Skipped: 0, Time elapsed: 67.999 sec FAILURE!
Running org.apache.hadoop.mapreduce.v2.TestYARNRunner
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.976 sec
Running org.apache.hadoop.mapreduce.v2.TestMROldApiJobs
Tests run: 2, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 31.879 sec FAILURE!
Running org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService
Running org.apache.hadoop.mapreduce.v2.TestUberAM
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 101.096 sec FAILURE!

Results :

Failed tests:
    testSleepJob(org.apache.hadoop.mapreduce.v2.TestMRJobs)
    testRandomWriter(org.apache.hadoop.mapreduce.v2.TestMRJobs)
    testDistributedCache(org.apache.hadoop.mapreduce.v2.TestMRJobs)
    testJobSucceed(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs): Job expected to succeed failed
    testJobFail(org.apache.hadoop.mapreduce.v2.TestMROldApiJobs)

Tests in error:
    testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs): 0
    org.apache.hadoop.mapreduce.v2.TestUberAM: Failed to Start org.apache.hadoop.mapreduce.v2.TestMRJobs

Tests run: 19, Failures: 5, Errors: 2, Skipped: 0
[jira] [Commented] (MAPREDUCE-3393) TestMRJobs, TestMROldApiJobs, and TestUberAM failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149075#comment-13149075 ]

Thomas Graves commented on MAPREDUCE-3393:

TestMROldApiJobs and TestUberAM both fail with the exception below. So perhaps something isn't being shut down cleanly in a test before it or in the failure of TestMRJobs. If you can't reproduce it, let me know and I'll look at the failures.

2011-11-11 17:20:44,452 ERROR [Thread-4] service.CompositeService (CompositeService.java:start(72)) - Error starting services ResourceManager
org.apache.hadoop.yarn.YarnException: java.net.BindException: Problem binding to [0.0.0.0:8025] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:125)
    at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:63)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.start(ResourceTrackerService.java:125)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.start(ResourceManager.java:439)
    at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper$2.run(MiniYARNCluster.java:126)
Caused by: java.net.BindException: Problem binding to [0.0.0.0:8025] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:606)
    at org.apache.hadoop.ipc.Server.bind(Server.java:230)
    at org.apache.hadoop.ipc.Server$Listener.init(Server.java:310)
    at org.apache.hadoop.ipc.Server.init(Server.java:1591)
    at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:576)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.init(ProtoOverHadoopRpcEngine.java:314)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine.getServer(ProtoOverHadoopRpcEngine.java:390)
    at org.apache.hadoop.ipc.RPC.getServer(RPC.java:550)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:155)
    at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:118)
    ... 5 more
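The "Address already in use" failure in the trace above is what a JVM reports when a listener tries to bind a port that another socket still holds, which would fit the guess that an earlier test did not shut down cleanly. The sketch below uses only java.net and is not Hadoop code; PortBindSketch and its method names are hypothetical. It shows that a second bind on a held port is rejected, and that asking the OS for an ephemeral port (port 0) avoids the collision. Whether MiniYARNCluster can be configured that way is not stated in this thread.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical illustration of a port collision between two listeners. */
public class PortBindSketch {

    /** Returns true if binding a second listener to a port that is still held is rejected. */
    static boolean secondBindFails() throws IOException {
        try (ServerSocket first = new ServerSocket(0)) { // port 0: the OS picks a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket()) {
                second.setReuseAddress(false);
                second.bind(new InetSocketAddress(port)); // same port while 'first' is open
                return false;
            } catch (BindException e) {
                return true; // e.g. "Address already in use"
            }
        }
    }

    /** Returns true if two listeners that both ask for port 0 get different ports. */
    static boolean ephemeralPortsDiffer() throws IOException {
        try (ServerSocket a = new ServerSocket(0);
             ServerSocket b = new ServerSocket(0)) {
            return a.getLocalPort() != b.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind rejected: " + secondBindFails());
        System.out.println("ephemeral ports differ: " + ephemeralPortsDiffer());
    }
}
```

The same reasoning explains why the tests could pass individually: a fixed port such as 8025 is only contended when a previous listener in the same run has not released it.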
[jira] [Updated] (MAPREDUCE-3393) TestMRJobs, TestMROldApiJobs, and TestUberAM failures
[ https://issues.apache.org/jira/browse/MAPREDUCE-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves updated MAPREDUCE-3393:
    Attachment: org.apache.hadoop.mapreduce.v2.TestMRJobs-output.txt
[jira] [Updated] (MAPREDUCE-3395) Add mapred.disk.healthChecker.interval to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-3395:
    Attachment: mapreduce-3395-2.patch

Nit-corrected patch. Committing.

Add mapred.disk.healthChecker.interval to mapred-default.xml
    Key: MAPREDUCE-3395
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-3395
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: documentation
    Affects Versions: 0.20.205.0
    Reporter: Eli Collins
    Assignee: Eli Collins
    Priority: Trivial
    Attachments: mapreduce-3395-1.patch, mapreduce-3395-2.patch

Let's add mapred.disk.healthChecker.interval to mapred-default.xml.
[jira] [Updated] (MAPREDUCE-3395) Add mapred.disk.healthChecker.interval to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-3395:
    Fix Version/s: 0.20.206.0

Committing to 0.20.206. Thanks Eli!
[jira] [Resolved] (MAPREDUCE-3395) Add mapred.disk.healthChecker.interval to mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved MAPREDUCE-3395.
    Resolution: Fixed
[jira] [Commented] (MAPREDUCE-2863) Support web-services for RM & NM
[ https://issues.apache.org/jira/browse/MAPREDUCE-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149178#comment-13149178 ]

Thomas Graves commented on MAPREDUCE-2863:

Hey Hitesh, thanks for the feedback. We could easily change the JSON to match or be closer. Right now it's configured for the POJO output format. We have a few options:
http://jersey.java.net/nonav/apidocs/latest/jersey/com/sun/jersey/api/json/JSONConfiguration.Notation.html

Any input on which one people prefer? The default is mapped when I turn off POJO, and the output is like:

http://virt09-pv1.tgraves.pool.corp.sp2.yahoo.com:8088/ws/v1/cluster/apps

{ app : {
    finalStatus : UNDEFINED,
    finishedTime : 0,
    progress : 0.0,
    name : word count,
    startedTime : 1321112670525,
    amContainerLogs : http://host:/node/containerlogs/container_1321112633248_0001_01_01,
    elapsedTime : 8681,
    note : ,
    trackingUI : UNASSIGNED,
    state : ACCEPTED,
    appId : application_1321112633248_0001,
    trackingUrl : UNASSIGNED,
    user : tgraves,
    queue : default,
    clusterId : 1321112633248
} }

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<apps>
  <app>
    <appId>application_1321112633248_0001</appId>
    <user>tgraves</user>
    <name>word count</name>
    <queue>default</queue>
    <state>ACCEPTED</state>
    <progress>0.0</progress>
    <trackingUI>UNASSIGNED</trackingUI>
    <trackingUrl>UNASSIGNED</trackingUrl>
    <note/>
    <finalStatus>UNDEFINED</finalStatus>
    <startedTime>1321112670525</startedTime>
    <finishedTime>0</finishedTime>
    <elapsedTime>8717</elapsedTime>
    <amContainerLogs>http://host:/node/containerlogs/container_1321112633248_0001_01_01</amContainerLogs>
    <clusterId>1321112633248</clusterId>
  </app>
</apps>

Support web-services for RM & NM
    Key: MAPREDUCE-2863
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2863
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: mrv2, nodemanager, resourcemanager
    Reporter: Arun C Murthy
    Assignee: Thomas Graves
    Attachments: MAPREDUCE-2863.patch, nmoutput.txt, rmoutput.txt

It will be very useful for RM and NM to support web-services to export json/xml.
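For anyone who wants to compare the two output formats, a client-side request against the /ws/v1/cluster/apps path quoted in the comment above might be built as in the sketch below. It is only a sketch under assumptions: it assumes the service picks JSON or XML from the HTTP Accept header, which this thread does not confirm, and the host rm.example.com, the class ClusterAppsRequestSketch and the method appsRequest are hypothetical. It uses the java.net.http API (Java 11 or later) and only constructs the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side sketch; builds, but does not send, a request for the apps list. */
public class ClusterAppsRequestSketch {

    /**
     * Builds a GET request for the cluster apps resource.
     *
     * @param rmBaseUrl base URL of the ResourceManager web server (hypothetical host in main)
     * @param mediaType value for the Accept header, e.g. application/json or application/xml;
     *                  that the server honours it is an assumption of this sketch
     */
    static HttpRequest appsRequest(String rmBaseUrl, String mediaType) {
        return HttpRequest.newBuilder()
                .uri(URI.create(rmBaseUrl + "/ws/v1/cluster/apps"))
                .header("Accept", mediaType)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest json = appsRequest("http://rm.example.com:8088", "application/json");
        HttpRequest xml = appsRequest("http://rm.example.com:8088", "application/xml");
        System.out.println(json.method() + " " + json.uri() + " Accept=" + json.headers().allValues("Accept"));
        System.out.println(xml.method() + " " + xml.uri() + " Accept=" + xml.headers().allValues("Accept"));
    }
}
```

Sending either request with java.net.http.HttpClient against a running ResourceManager would show which notation the server actually returns.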