[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720600#comment-13720600 ] Tom White commented on MAPREDUCE-5367: -- I was looking at trunk. Doesn't this need fixing for trunk too? Local jobs all use same local working directory --- Key: MAPREDUCE-5367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5367-b1.patch This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5279) mapreduce scheduling deadlock
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5279: -- Assignee: PengZhang mapreduce scheduling deadlock - Key: MAPREDUCE-5279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, scheduler Affects Versions: 2.0.3-alpha Reporter: PengZhang Assignee: PengZhang Fix For: trunk Attachments: MAPREDUCE-5279.patch, MAPREDUCE-5279-v2.patch YARN-2 introduced CPU-dimension scheduling, but the MR RMContainerAllocator does not take virtual cores into account while scheduling reduce tasks. Too many reduce tasks may therefore be scheduled because memory alone is sufficient. On a small cluster this ends in deadlock: all running containers are reduce tasks, but the map phase is not finished.
[jira] [Commented] (MAPREDUCE-5279) mapreduce scheduling deadlock
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720632#comment-13720632 ] Tsuyoshi OZAWA commented on MAPREDUCE-5279: --- [~pengzhang], thank you for contributing! Can you rebase on current trunk, please?
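The fix direction described in this issue is to make the reduce ramp-up check consider virtual cores as well as memory. A minimal self-contained sketch of the idea (illustrative names, not RMContainerAllocator's actual code):

```java
// Sketch of a resource-fit check that considers both dimensions.
// The real RMContainerAllocator logic is more involved; this only
// illustrates why checking memory alone can over-schedule reduces:
// reduce containers can exhaust vcores that remaining maps need.
public class ReduceRampUp {
    public static boolean canScheduleReduce(long freeMemMB, int freeVcores,
                                            long reduceMemMB, int reduceVcores) {
        // Checking only (freeMemMB >= reduceMemMB) is the reported bug;
        // the vcores term is what prevents the small-cluster deadlock.
        return freeMemMB >= reduceMemMB && freeVcores >= reduceVcores;
    }
}
```

With memory free but zero vcores left, the check above refuses to schedule another reduce, leaving room for maps to finish.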
[jira] [Created] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
Junping Du created MAPREDUCE-5421: - Summary: TestNonExistentJob is failed due to recent changes in YARN Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du After YARN-873, trying to get an application report for an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Attachment: MAPREDUCE-5421.patch Upload a quick patch to fix it. TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Target Version/s: 2.1.0-beta Status: Patch Available (was: Open) TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch
[jira] [Created] (MAPREDUCE-5422) [Umbrella] Fix invalid state transitions in MRAppMaster
Devaraj K created MAPREDUCE-5422: Summary: [Umbrella] Fix invalid state transitions in MRAppMaster Key: MAPREDUCE-5422 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5422 Project: Hadoop Map/Reduce Issue Type: Task Components: mr-am Affects Versions: 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K There are multiple invalid state transitions for the state machines present in MRAppMaster. All of these can be handled as part of this umbrella JIRA.
[jira] [Updated] (MAPREDUCE-5400) MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-5400: - Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-5422 MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl - Key: MAPREDUCE-5400 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5400 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Devaraj K Priority: Minor Attachments: MAPREDUCE-5400.patch Step 1: Install a cluster with HDFS and MR. Step 2: Execute a job. Step 3: Issue a kill for a task attempt whose task has already completed. Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin ./mapred job -kill-task attempt_1373875322959_0032_m_00_0 No GC_PROFILE is given. Defaults to medium. 13/07/15 14:46:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol 13/07/15 14:46:33 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Killed task attempt_1373875322959_0032_m_00_0 Observation: === 1. The task state transitioned from SUCCEEDED to SCHEDULED. 2. When the client issues a kill for a succeeded attempt, the client is notified that the succeeded attempt was killed. 3. A second task attempt is launched, which succeeds and is then killed later on client request. 4. Even after the job state transitioned from SUCCEEDED to ERROR, the UI still shows the state as succeeded. Issue: = 1. The client has been notified that the attempt was killed, but the attempt actually succeeded, and that is what the JHS UI displays. 2. The App Master throws InvalidStateTransitonException. 3. 
At client side and JHS job has exited with state Finished/succeeded ,At RM side the state is Finished/Failed. AM Logs: 2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_0 2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_1 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662)
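The common fix for this family of sub-tasks is to register an explicit (often no-op) transition for events that can legitimately arrive in a terminal state, instead of letting the state machine throw. A toy state machine sketching that idea (this is not Hadoop's StateMachineFactory; names are illustrative):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of tolerating late events in a terminal state.
// Hadoop's StateMachineFactory throws InvalidStateTransitonException
// for any (state, event) pair with no registered transition; the fix
// is to register the pair, typically as a no-op self-transition.
public class JobStates {
    public enum State { RUNNING, SUCCEEDED }
    public enum Event { JOB_TASK_COMPLETED }

    private final Map<State, Set<Event>> ignorable = new HashMap<>();

    public void addIgnorable(State state, Event event) {
        ignorable.computeIfAbsent(state, k -> new HashSet<>()).add(event);
    }

    // Returns the resulting state; throws for an unregistered pair,
    // mirroring the InvalidStateTransitonException in the logs above.
    public State handle(State current, Event event) {
        if (ignorable.getOrDefault(current, Collections.emptySet()).contains(event)) {
            return current;  // no-op self-transition: the late event is tolerated
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
}
```

Registering JOB_TASK_COMPLETED as ignorable at SUCCEEDED keeps a late task-completion event from crashing the dispatcher, which is the shape of fix these sub-tasks call for.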
[jira] [Updated] (MAPREDUCE-5409) MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-5409: - Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-5422 MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl - Key: MAPREDUCE-5409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5409 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K {code:xml} 2013-07-23 12:28:05,217 INFO [IPC Server handler 29 on 50796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1374560536158_0003_m_40_0 is : 0.0 2013-07-23 12:28:05,221 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1374560536158_0003_m_07_0 ... raising fetch failure to map 2013-07-23 12:28:05,222 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1374560536158_0003_m_07_0 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1032) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:143) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1123) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1115) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) 2013-07-23 12:28:05,249 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1374560536158_0003Job Transitioned from RUNNING to ERROR 2013-07-23 12:28:05,338 INFO [IPC Server handler 16 on 50796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1374560536158_0003_m_40_0 {code}
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720715#comment-13720715 ] Hadoop QA commented on MAPREDUCE-5421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594364/MAPREDUCE-5421.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//console This message is automatically generated. 
TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch
[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720740#comment-13720740 ] Radim Kolar commented on MAPREDUCE-5153: It's very simple to implement. If you want to push things forward, then do it. Support for running combiners without reducers -- Key: MAPREDUCE-5153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Radim Kolar Scenario: workflow mapper - sort - combiner - HDFS. No API change is needed: if the user sets a combiner class and reducers = 0, then run the combiner and send its output to HDFS. Popular libraries such as Scalding and Cascading offer this functionality, but they do it by caching the entire mapper output in memory.
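The proposal above can be sketched without any API change: when a combiner is set and reducers = 0, stream the already-sorted map output through the combiner before it is written. A toy Java sketch under those assumptions (simplified types; a real implementation would sit in the map-side output collector):

```java
// Toy sketch: run a summing combiner over sorted map output before it
// is written out, even when numReduceTasks == 0. Because the input is
// sorted, equal keys are adjacent, so a single streaming pass suffices;
// no caching of the entire mapper output in memory is needed.
public class MapSideCombine {
    // Returns "key=sum" lines; input keys are assumed sorted.
    public static String combineSorted(String[] keys, int[] values,
                                       boolean combinerSet, int numReduceTasks) {
        StringBuilder out = new StringBuilder();
        if (!combinerSet || numReduceTasks != 0) {
            // Existing behaviour: write records through unchanged.
            for (int i = 0; i < keys.length; i++) {
                out.append(keys[i]).append('=').append(values[i]).append('\n');
            }
            return out.toString();
        }
        // Combiner path: collapse adjacent runs of equal keys.
        int i = 0;
        while (i < keys.length) {
            int sum = 0;
            int j = i;
            while (j < keys.length && keys[j].equals(keys[i])) {
                sum += values[j];
                j++;
            }
            out.append(keys[i]).append('=').append(sum).append('\n');
            i = j;
        }
        return out.toString();
    }
}
```

The streaming pass is the advantage over the Scalding/Cascading approach mentioned above, which buffers all mapper output in memory.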
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Attachment: MAPREDUCE-5421-v2.patch The ApplicationNotFoundException on the server side should be translated to an IOException on the client side. Updated to a v2 patch to fix it. The remaining 2 failures are unrelated, as they also appear in other Jenkins jobs (like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845/testReport/) TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
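The translation Junping describes can be sketched as a client-side wrapper that restores the pre-YARN-873 null-on-unknown behaviour for callers that expect it. The lookup below is mocked; only the exception-handling shape is the point:

```java
import java.io.IOException;

// Sketch: after YARN-873, looking up an unknown appId yields an
// exception rather than a null report, so callers (and tests like
// TestNonExistentJob) must handle it. The exception class here mimics
// YARN's ApplicationNotFoundException; the lookup itself is mocked.
public class ReportLookup {
    static class ApplicationNotFoundException extends IOException {
        ApplicationNotFoundException(String m) { super(m); }
    }

    // Stand-in for a client-side getApplicationReport(appId) call that
    // always simulates an unknown application.
    static String getReport(String appId) throws ApplicationNotFoundException {
        throw new ApplicationNotFoundException("Application " + appId + " not found");
    }

    // Compatibility wrapper: old callers expected null for an unknown app.
    public static String getReportOrNull(String appId) {
        try {
            return getReport(appId);
        } catch (ApplicationNotFoundException e) {
            return null;  // preserve the pre-YARN-873 contract for callers
        }
    }
}
```

Whether to catch in the client library or in each caller is the design question the patch settles; the wrapper above just shows the catch site.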
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720832#comment-13720832 ] Hadoop QA commented on MAPREDUCE-5421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594396/MAPREDUCE-5421-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//console This message is automatically generated. 
TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5251: -- Attachment: MAPREDUCE-5251-7-b23.txt Thanks a lot Jason. I've attached the patch for 23. Reducer should not implicate map attempt if it has insufficient space to fetch map output - Key: MAPREDUCE-5251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Jason Lowe Assignee: Ashwin Shankar Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt A job can fail if a reducer happens to run on a node with insufficient space to hold a map attempt's output. The reducer keeps reporting the map attempt as bad, and if the map attempt ends up being re-launched too many times before the reducer decides that maybe it is the real problem, the job can fail. In that scenario it would be better to re-launch the reduce attempt, which will hopefully run on another node that has sufficient space to complete the shuffle. Reporting the map attempt as bad and relaunching the map task doesn't change the fact that the reducer can't hold the output.
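The behaviour change argued for here boils down to a blame decision at fetch-failure time. A hedged sketch of that decision (hypothetical names, not the attached patch's code):

```java
// Sketch of the decision MAPREDUCE-5251 argues for: when the reduce
// side cannot fit a map output, fail/relaunch the reduce attempt
// instead of reporting the map attempt as bad.
public class FetchFailurePolicy {
    public enum Blame { MAP_ATTEMPT, REDUCE_ATTEMPT }

    public static Blame blameForFetchFailure(boolean reducerOutOfSpace) {
        if (reducerOutOfSpace) {
            // Relaunching the map cannot help: its output still won't fit
            // on this node, so the reducer is the one that should move.
            return Blame.REDUCE_ATTEMPT;
        }
        // A genuine fetch error still implicates the map output host.
        return Blame.MAP_ATTEMPT;
    }
}
```

The key property is that an out-of-space condition short-circuits the usual "report the map as bad" path, so the map is never needlessly relaunched.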
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720897#comment-13720897 ] Hadoop QA commented on MAPREDUCE-5251: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594411/MAPREDUCE-5251-7-b23.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3907//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720906#comment-13720906 ] Robert Parker commented on MAPREDUCE-5419: -- The test failures have been identified as defects by other tickets: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile YARN-885, YARN-960 org.apache.hadoop.mapreduce.security.TestMRCredentials YARN-960 org.apache.hadoop.mapreduce.v2.TestNonExistentJob MAPREDUCE-5421 TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Attachments: MAPREDUCE-5419.patch The write directory slive is not getting created on the FS.
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720939#comment-13720939 ] Sandy Ryza commented on MAPREDUCE-5367: --- I don't think the problem exists in trunk. getLocalTaskDir includes the job ID in the path, so there shouldn't be collisions. The other place that localRunner/ is used is for writing the job conf, which includes the job ID in its name. So that also should not be a problem. Though thinking about it now, it might make sense to change it as well for consistency?
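Sandy's explanation suggests why including the job ID in the path resolves the collision: the directory becomes unique per job. A hypothetical sketch of that naming scheme (illustrative helper, not LocalJobRunner's actual code):

```java
// Sketch (hypothetical helper) of the branch-1 fix idea: include the
// job ID in each local job's working directory so concurrent local
// jobs, even in different JVMs, never share (and never delete) each
// other's files during work directory setup.
public class LocalJobDirs {
    public static String jobWorkDir(String baseDir, String jobId) {
        // The job ID makes the path unique per job; two jobs can then
        // set up and tear down their directories independently.
        return baseDir + "/localRunner/" + jobId;
    }
}
```

Since job IDs are unique, two concurrent local jobs always resolve to distinct directories, which is the property the patch is after.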
[jira] [Created] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
Chu Tong created MAPREDUCE-5423: --- Summary: Rare deadlock situation when reducers try to fetch map output Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Chu Tong During our cluster deployment, we found there is a deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet illustrating this problem is below: 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0 2013-07-18 
04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->27 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0 2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory ->55841309 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of 
size: 118022137, inMemoryMapOutputs.size() -gt; 3, commitMemory -gt; 55841309, usedMemory -gt;173863446 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
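The reported hang is a guarded-wait pattern: each fetcher must reserve shuffle memory before copying a map output "to MEMORY", and blocks while the reservation would push usedMemory past memoryLimit. The following is a minimal sketch under assumed names (ShuffleMemorySketch, reserve, release, used are illustrative, not the actual org.apache.hadoop.mapreduce.task.reduce.MergeManager API): if every fetcher is parked in wait() and no in-memory merge ever runs release(), the notifyAll() that would wake them is never reached.

```java
// Sketch of the guarded-wait pattern behind the reported deadlock.
// Names are illustrative, NOT the actual MergeManager API.
class ShuffleMemorySketch {
    private final long memoryLimit;
    private long usedMemory = 0;

    ShuffleMemorySketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    // Each fetcher reserves memory before shuffling a map output to MEMORY.
    synchronized void reserve(long size) {
        while (usedMemory + size > memoryLimit) {
            try {
                wait(); // all fetchers can end up parked here simultaneously
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        usedMemory += size;
    }

    // Only a merge (or a failed fetch) releases memory and wakes waiters;
    // if no merge is ever triggered, notifyAll() is never reached.
    synchronized void release(long size) {
        usedMemory -= size;
        notifyAll();
    }

    synchronized long used() {
        return usedMemory;
    }

    public static void main(String[] args) {
        ShuffleMemorySketch mem = new ShuffleMemorySketch(1503238528L); // memoryLimit from the log
        mem.reserve(27);        // map-output of attempt ..._m_17_0
        mem.reserve(55841282);  // map-output of attempt ..._m_16_0
        mem.reserve(118022137); // map-output of attempt ..._m_15_0
        System.out.println(mem.used()); // prints 173863446, matching usedMemory in the log
    }
}
```

With five fetchers holding reservations near the limit, the next large map output makes every reserve() block, and progress stalls unless the merge-threshold logic fires and releases memory.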
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-5423: Description: During our cluster deployment, we found there is a deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet illustrating this problem is above (all fetchers went into a wait state after they could not acquire more RAM beyond the memoryLimit and no fetcher was releasing memory).
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-5423: Description: During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet illustrating this problem is above (all fetchers went into a wait state after they could not acquire more RAM beyond the memoryLimit and no fetcher was releasing memory).
[jira] [Updated] (MAPREDUCE-5386) Ability to refresh history server job retention and job cleaner settings
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5386: -- Resolution: Fixed Fix Version/s: 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Ashwin! I committed this to trunk and branch-2. Ability to refresh history server job retention and job cleaner settings Key: MAPREDUCE-5386 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Fix For: 3.0.0, 2.3.0 Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION-5.txt We want to be able to refresh the following job retention parameters without having to bounce the history server: 1. Job retention time - mapreduce.jobhistory.max-age-ms 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms 3. Enable/disable cleaner - mapreduce.jobhistory.cleaner.enable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
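For context, the three properties named above are set in mapred-site.xml. A sketch of such a configuration follows; the values shown are illustrative choices (one week of retention, a daily cleaner run), not necessarily the shipped defaults:

```xml
<!-- Illustrative values, not necessarily the shipped defaults -->
<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>604800000</value> <!-- retain job history for 7 days -->
</property>
<property>
  <name>mapreduce.jobhistory.cleaner.interval-ms</name>
  <value>86400000</value> <!-- run the cleaner once a day -->
</property>
<property>
  <name>mapreduce.jobhistory.cleaner.enable</name>
  <value>true</value>
</property>
```

The point of this JIRA is that changes to these values now take effect via a refresh rather than requiring a history server restart.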
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720977#comment-13720977 ] Mithun Radhakrishnan commented on MAPREDUCE-5402: - Gentlemen, I'm afraid I'll have to review this next week. (I'm swamped.) The main reason we tried to limit the maximum number of chunks on the DFS is because these are extremely small files (holding only target-file names/locations). Plus, they're likely to be short-lived. Increasing the number of these will increase NameNode pressure (short-lived file-objects). 400 was a good target for us at Yahoo, per DistCp job. I agree that keeping this configurable would be best. But then the responsibility of being polite to the name-node will transfer to the user. DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE -- Key: MAPREDUCE-5402 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp, mrv2 Reporter: David Rosenstrauch Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, MAPREDUCE-5402.3.patch In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author describes the implementation of DynamicInputFormat, with one of the main motivations cited being to reduce the chance of long-tails where a few leftover mappers run much longer than the rest. However, I today ran into a situation where I experienced exactly such a long tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate the problem by overriding the number of mappers and the split ratio used by the DynamicInputFormat, I was prevented from doing so by the hard-coded limit set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) This constant is actually set quite low for production use. (See a description of my use case below.) 
And although MAPREDUCE-2765 states that this is an overridable maximum, when reading through the code there does not actually appear to be any mechanism available to override it. This should be changed. It should be possible to expand the maximum # of chunks beyond this arbitrary limit. For example, here is the situation I ran into today: I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode the number of mappers for the job from the default of 20 to 128, so as to more properly parallelize the copy across the cluster. The number of chunk files created was calculated as 241, and mapred.num.entries.per.chunk was calculated as 12. As the job ran on, it reached a point where there were only 4 remaining map tasks, which had each been running for over 2 hours. The reason for this was that each of the 12 files that those mappers were copying were quite large (several hundred megabytes in size) and took ~20 minutes each. However, during this time, all the other 124 mappers sat idle. In theory I should be able to alleviate this problem with DynamicInputFormat. If I were able to, say, quadruple the number of chunk files created, that would have made each chunk contain only 3 files, and these large files would have gotten distributed better around the cluster and copied in parallel. However, when I tried to do that - by overriding mapred.listing.split.ratio to, say, 10 - DynamicInputFormat responded with an exception (Too many chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease split-ratio to proceed.) - presumably because I exceeded the MAX_CHUNKS_TOLERABLE value of 400. Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I can't personally see any. If this limit has no particular logic behind it, then it should be overridable - or even better: removed altogether. After all, I'm not sure I see any need for it. 
Even if numMaps * splitRatio resulted in an extraordinarily large number, if the code were modified so that the number of chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then there would be no need for MAX_CHUNKS_TOLERABLE. In this worst-case scenario where the product of numMaps and splitRatio is large, capping the number of chunks at the number of files (numberOfChunks = numberOfFiles) would result in 1 file per chunk - the maximum parallelization possible. That may not be the best-tuned solution for some users, but I would think that it should be left up to the user to deal with the potential consequence of not having tuned their job properly. Certainly that would be better than having an arbitrary hard-coded limit that *prevents*
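The capping proposal in the last paragraph is simple arithmetic. A sketch follows; the class and method names are illustrative, not the actual DynamicInputFormat code:

```java
// Sketch of the proposed cap: never create more chunks than there are files,
// which bounds the worst case at one file per chunk (maximum parallelization).
// Names are illustrative; this is NOT the actual DynamicInputFormat code.
class ChunkMath {
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    public static void main(String[] args) {
        // The scenario above: 128 maps, split ratio 10, ~2800 files.
        // 128 * 10 = 1280 chunks, still under the file count, so roughly
        // 2800 / 1280 ≈ 2 files per chunk instead of 12.
        System.out.println(numberOfChunks(128, 10, 2800)); // prints 1280
    }
}
```

With this cap in place, even an extreme splitRatio merely degenerates to one file per chunk, so the MAX_CHUNKS_TOLERABLE guard would have nothing left to protect against (aside from the NameNode-pressure concern raised in the comment above).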
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720976#comment-13720976 ] Jason Lowe commented on MAPREDUCE-5423: --- On which version of Hadoop did this occur?
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720979#comment-13720979 ] Chu Tong commented on MAPREDUCE-5423: - This is on 2.0.2-alpha
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5423: -- Component/s: mrv2 Affects Version/s: 2.0.2-alpha
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720982#comment-13720982 ] Jason Lowe commented on MAPREDUCE-5423: --- This may be a duplicate of MAPREDUCE-4842. Rare deadlock situation when reducers try to fetch map output - Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Chu Tong During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers, and the log snippet illustrating this problem is below (all fetchers went into a wait state after they can't acquire more RAM beyond the memoryLimit and no fetcher is releasing memory): 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->27 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0 2013-07-18 04:32:33,323 INFO
[fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory ->55841309 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory -> 55841309, usedMemory ->173863446 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
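The stall described in this report comes from the shuffle's in-memory accounting: each fetcher reserves memory for a map output and waits whenever usage is already over the limit. A minimal, self-contained sketch of that accounting (class and method names are illustrative, not Hadoop's MergeManager API; the constants in main are taken from the MergerManager line in the log):

```java
// Illustrative sketch of MergeManager-style memory accounting shown in
// the log above. Class and method names are hypothetical, not Hadoop's API.
class ShuffleMemoryTracker {
    private final long memoryLimit;             // memoryLimit in the log
    private final long maxSingleShuffleLimit;   // maxSingleShuffleLimit
    private long usedMemory = 0;                // usedMemory

    ShuffleMemoryTracker(long memoryLimit, long maxSingleShuffleLimit) {
        this.memoryLimit = memoryLimit;
        this.maxSingleShuffleLimit = maxSingleShuffleLimit;
    }

    // Outputs larger than the single-shuffle limit are shuffled to disk.
    boolean fitsInMemory(long size) {
        return size <= maxSingleShuffleLimit;
    }

    // The limit check happens before the addition, so usage can overshoot
    // the limit; once it has, every further caller is told to wait. If the
    // memory can only be freed by a merge that never starts, all fetchers
    // wait here forever -- the stall in the report.
    synchronized boolean tryReserve(long size) {
        if (usedMemory > memoryLimit) {
            return false;                       // caller must wait and retry
        }
        usedMemory += size;
        return true;
    }

    synchronized void release(long size) { usedMemory -= size; }
    synchronized long used() { return usedMemory; }

    public static void main(String[] args) {
        ShuffleMemoryTracker t = new ShuffleMemoryTracker(1503238528L, 375809632L);
        t.tryReserve(55841282L);    // m_16 output size, as in the log
        t.tryReserve(118022137L);   // m_15 output size
        System.out.println("usedMemory = " + t.used());
    }
}
```

In the deadlock this issue was eventually resolved against (MAPREDUCE-4842), every fetcher ends up in that waiting state because nothing ever releases the reserved memory.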
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Summary: Improve getSplits performance by using listLocatedStatus (was: Improve getSplits performance by using listFiles, the new FileSystem API) Hadoop Flags: Reviewed Thanks for the reviews, Kihwal. Committing this. Improve getSplits performance by using listLocatedStatus Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
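The RPC saving that motivates this change can be seen with a toy model (the FakeNameNode class below is hypothetical, not Hadoop code): listing N files and then asking for each file's block locations costs N+1 round trips, while a listLocatedStatus-style listing returns the locations inline in a single call.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of why listLocatedStatus reduces NameNode RPCs. Not Hadoop code;
// the real calls are FileSystem.listStatus/getFileBlockLocations/listLocatedStatus.
class ListLocatedStatusSketch {
    static class FakeNameNode {
        int rpcCount = 0;
        final List<String> files;
        FakeNameNode(List<String> files) { this.files = files; }

        List<String> listStatus() { rpcCount++; return files; }
        String getBlockLocations(String f) { rpcCount++; return "locs:" + f; }
        // One RPC returns file names and block locations together.
        Map<String, String> listLocatedStatus() {
            rpcCount++;
            Map<String, String> m = new LinkedHashMap<>();
            for (String f : files) m.put(f, "locs:" + f);
            return m;
        }
    }

    static int rpcsOldStyle(FakeNameNode nn) {
        nn.rpcCount = 0;
        for (String f : nn.listStatus()) nn.getBlockLocations(f);  // 1 + N RPCs
        return nn.rpcCount;
    }

    static int rpcsNewStyle(FakeNameNode nn) {
        nn.rpcCount = 0;
        nn.listLocatedStatus();                                     // 1 RPC
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode(Arrays.asList("a", "b", "c"));
        System.out.println("old: " + rpcsOldStyle(nn) + " RPCs, new: "
            + rpcsNewStyle(nn) + " RPC");
    }
}
```

For a job over thousands of input files, collapsing the per-file location lookups into the listing call is the whole getSplits speedup.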
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721014#comment-13721014 ] Ravi Prakash commented on MAPREDUCE-5419: - Patch looks good to me. +1. Thanks Rob! TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Attachments: MAPREDUCE-5419.patch The write directory slive is not getting created on the FS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5251: -- Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 to the branch-0.23 patch and committed to branch-0.23. Reducer should not implicate map attempt if it has insufficient space to fetch map output - Key: MAPREDUCE-5251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Jason Lowe Assignee: Ashwin Shankar Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt A job can fail if a reducer happens to run on a node with insufficient space to hold a map attempt's output. The reducer keeps reporting the map attempt as bad, and if the map attempt ends up being re-launched too many times before the reducer decides that it may itself be the real problem, the job can fail. In that scenario it would be better to re-launch the reduce attempt, which will hopefully run on another node that has sufficient space to complete the shuffle. Reporting the map attempt as bad and relaunching the map task doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
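The attribution rule this issue argues for can be sketched as a small decision function. Everything here is hypothetical and illustrative, not the committed patch (which is in the attached MAPREDUCE-5251-*.txt files): the point is only that a fetch failure caused by the reducer's own lack of space should count against the reduce attempt, not the map attempt.

```java
// Hypothetical sketch of shuffle-fetch failure attribution. Names and exact
// conditions are illustrative, not the actual MAPREDUCE-5251 patch.
class FetchFailureAttribution {
    enum Blame { MAP_ATTEMPT, REDUCE_ATTEMPT }

    // Decide which side a failed fetch of a map output should count against.
    static Blame blameFor(long mapOutputBytes, long reducerFreeSpaceBytes,
                          boolean mapOutputCorruptOrUnreachable) {
        if (mapOutputCorruptOrUnreachable) {
            return Blame.MAP_ATTEMPT;       // genuinely bad map output
        }
        if (mapOutputBytes > reducerFreeSpaceBytes) {
            // Relaunching the map cannot help: the reducer's node simply
            // cannot hold the output. Failing the reduce attempt lets it
            // be rescheduled on a node with enough space.
            return Blame.REDUCE_ATTEMPT;
        }
        return Blame.MAP_ATTEMPT;           // default: unexplained failure
    }

    public static void main(String[] args) {
        // A 118 MB map output against 50 MB of free reducer-local space.
        System.out.println(blameFor(118022137L, 50L * 1024 * 1024, false));
    }
}
```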
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Status: Resolved (was: Patch Available) Thanks Hairong, and thanks to everyone that contributed to reviews of various versions of the patch. I committed this to trunk, branch-2, and branch-0.23. Improve getSplits performance by using listLocatedStatus Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721044#comment-13721044 ] Xuan Gong commented on MAPREDUCE-5421: -- +1 Looks good TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
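The semantic change behind this failure: before YARN-873, a client asking for an unknown application got null; afterwards it gets an exception. A sketch of the client-side translation that preserves the old contract follows; every type in it is a stand-in defined in the snippet itself, not the YARN API, and only the exception message mirrors the real one from the test output.

```java
// Stand-in types illustrating the YARN-873 semantic change described above.
// Nothing here is the real YARN API.
class UnknownAppHandling {
    static class ApplicationNotFoundException extends RuntimeException {
        ApplicationNotFoundException(String msg) { super(msg); }
    }

    interface Rm {
        String getApplicationReport(String appId);
    }

    // New-style RM behaviour for an unknown application: throw, not null.
    static final Rm THROWING_RM = appId -> {
        throw new ApplicationNotFoundException(
            "Application with id '" + appId + "' doesn't exist in RM.");
    };

    // Client-side translation preserving the old "null means unknown"
    // contract that callers like the failing test effectively rely on.
    static String reportOrNull(Rm rm, String appId) {
        try {
            return rm.getApplicationReport(appId);
        } catch (ApplicationNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The appID from the test failure; unknown, so this prints null.
        System.out.println(reportOrNull(THROWING_RM, "application_0_"));
    }
}
```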
[jira] [Created] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
Vinod Kumar Vavilapalli created MAPREDUCE-5424: -- Summary: TestNonExistentJob failing after YARN-873 Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
[ https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721072#comment-13721072 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5424: It fails with the following: {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 53.573 sec FAILURE! testGetInvalidJob(org.apache.hadoop.mapreduce.v2.TestNonExistentJob) Time elapsed: 53420 sec ERROR! java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_0_' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:241) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:202) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2047) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2043) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2041) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:387) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522) at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:182) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:573) 
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493) at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:573) at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:591) at org.apache.hadoop.mapreduce.v2.TestNonExistentJob.testGetInvalidJob(TestNonExistentJob.java:99) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) {code} TestNonExistentJob failing after YARN-873 - Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
[ https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved MAPREDUCE-5424. -- Resolution: Duplicate This is a duplicate of MAPREDUCE-5421. TestNonExistentJob failing after YARN-873 - Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721097#comment-13721097 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5421: +1. Checking this in.. TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5421: --- Resolution: Fixed Fix Version/s: 2.1.0-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk, branch-2 and branch-2.1. Thanks Junping! TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Fix For: 2.1.0-beta Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5421: --- Component/s: test Priority: Blocker (was: Major) TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721112#comment-13721112 ] Jason Lowe commented on MAPREDUCE-5419: --- +1, looks good to me as well. I'll commit this shortly. Note that initially I could not reproduce this problem, but it is very reproducible by cleaning and only running the TestSlive#testDataWriting test. It's easier to reproduce with JDK7 when running all of the TestSlive tests since that does not run the unit tests in a deterministic order. TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Attachments: MAPREDUCE-5419.patch The write directory slive is not getting created on the FS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Attachment: LOADED_JOB_CACHE_MR5411-2.txt Refresh size of loaded job cache on history server -- Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Attachments: LOADED_JOB_CACHE_MR5411-1.txt, LOADED_JOB_CACHE_MR5411-2.txt We want to be able to refresh the size of the loaded job cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server through the history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
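A loaded-job cache whose maximum size can be changed at runtime, as this feature requires, can be sketched with an access-order LinkedHashMap. The ResizableJobCache class below is hypothetical, not the history server's implementation; it only shows the resize-while-in-use behaviour an admin refresh needs.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical ResizableJobCache: an access-order LRU map whose capacity
// can be changed while in use. Not the JobHistory implementation.
class ResizableJobCache<K, V> {
    private int maxSize;
    private final LinkedHashMap<K, V> map =
        new LinkedHashMap<K, V>(16, 0.75f, true) {   // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;              // evict LRU on insert
            }
        };

    ResizableJobCache(int maxSize) { this.maxSize = maxSize; }

    synchronized void put(K key, V value) { map.put(key, value); }
    synchronized V get(K key) { return map.get(key); }
    synchronized int size() { return map.size(); }

    // What an admin "refresh" would call: growing is free; shrinking
    // evicts least-recently-used entries immediately.
    synchronized void setMaxSize(int newMax) {
        maxSize = newMax;
        Iterator<K> it = map.keySet().iterator();
        while (map.size() > maxSize && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        ResizableJobCache<String, Integer> c = new ResizableJobCache<>(2);
        c.put("job_1", 1);
        c.put("job_2", 2);
        c.put("job_3", 3);           // evicts job_1, the LRU entry
        c.setMaxSize(1);             // shrink: evicts job_2
        System.out.println(c.size() + " entry, job_3 = " + c.get("job_3"));
    }
}
```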
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721207#comment-13721207 ] Chu Tong commented on MAPREDUCE-5423: - I think you are right. I took a look at MAPREDUCE-4842 and I believe this is the issue I experienced. Can you please close this as a duplicate? Thanks Rare deadlock situation when reducers try to fetch map output - Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Chu Tong During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output.
[jira] [Created] (MAPREDUCE-5425) Junit in TestJobHistoryServer failing in jdk 7
Ashwin Shankar created MAPREDUCE-5425: - Summary: Junit in TestJobHistoryServer failing in jdk 7 Key: MAPREDUCE-5425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5425 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.0.4-alpha Reporter: Ashwin Shankar We get the following exception when we run the unit tests of TestJobHistoryServer with jdk 7: Caused by: java.net.BindException: Problem binding to [0.0.0.0:10033] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719) at org.apache.hadoop.ipc.Server.bind(Server.java:423) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:535) at org.apache.hadoop.ipc.Server.init(Server.java:2202) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:901) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.init(ProtobufRpcEngine.java:505) at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480) at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:746) at org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer.serviceInit(HSAdminServer.java:100) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) This is happening because testMainMethod starts the history server and doesn't stop it. This worked in jdk 6 because tests executed sequentially and this test was the last one, so it didn't affect other tests, but in jdk 7 it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
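The failure mode is easy to demonstrate with plain java.net: a listening socket that is never closed keeps its port bound, so a later bind to the same port throws BindException. A minimal sketch (TinyServer is a stand-in for the history server, not Hadoop code):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Minimal illustration of why a test that starts a server and never stops
// it poisons later tests: the port stays bound until stop() closes it.
class StopServerSketch {
    static class TinyServer {
        private ServerSocket socket;
        void start(int port) throws IOException { socket = new ServerSocket(port); }
        int port() { return socket.getLocalPort(); }
        void stop() throws IOException { socket.close(); }
    }

    public static void main(String[] args) throws IOException {
        TinyServer first = new TinyServer();
        first.start(0);              // ephemeral port, standing in for 10033
        int port = first.port();
        // Without this stop() -- the missing teardown in testMainMethod --
        // the next bind would throw: Address already in use.
        first.stop();
        TinyServer next = new TinyServer();
        next.start(port);            // succeeds only because first was stopped
        next.stop();
        System.out.println("rebound port " + port + " cleanly");
    }
}
```

The fix pattern is the same in a JUnit test: stop the server in a finally block or an @After method so later tests (or, under JDK 7's nondeterministic test ordering, earlier ones) can bind the port.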
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Status: Patch Available (was: Open) Thanks, patch refreshed. Refresh size of loaded job cache on history server -- Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Attachments: LOADED_JOB_CACHE_MR5411-1.txt, LOADED_JOB_CACHE_MR5411-2.txt We want to be able to refresh the size of the loaded job cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server through the history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721256#comment-13721256 ] Hadoop QA commented on MAPREDUCE-5411: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594445/LOADED_JOB_CACHE_MR5411-2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//console This message is automatically generated. 
Refresh size of loaded job cache on history server -- Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Attachments: LOADED_JOB_CACHE_MR5411-1.txt, LOADED_JOB_CACHE_MR5411-2.txt We want to be able to refresh the size of the loaded job cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server through the history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved MAPREDUCE-5423. --- Resolution: Duplicate Rare deadlock situation when reducers try to fetch map output - Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Chu Tong During our cluster deployment, we found a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet below illustrates the problem (all fetchers went into a wait state after they could not acquire more RAM beyond the memoryLimit, and no fetcher was releasing memory):
2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events
2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply
2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY
2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0
2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory -> 27
2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply
2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY
2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0
2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory -> 55841309
2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0
2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory -> 55841309, usedMemory -> 173863446
2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
2013-07-18 04:32:42,188 INFO [EventFetcher
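The stall pattern in the log above can be sketched as follows. This is a minimal simplification using the parameter names from the MergerManager line, not the actual Hadoop MergeManager code: each fetcher must reserve memory before shuffling a map output into RAM, and once usedMemory exceeds memoryLimit every fetcher blocks; if no fetcher holds memory it can release, the shuffle deadlocks.

```java
// Hypothetical simplification of the shuffle memory accounting
// (field names taken from the log above; not the real MergeManager).
public class ShuffleMemorySketch {
    final long memoryLimit;
    final long maxSingleShuffleLimit;
    long usedMemory = 0;

    ShuffleMemorySketch(long memoryLimit, long maxSingleShuffleLimit) {
        this.memoryLimit = memoryLimit;
        this.maxSingleShuffleLimit = maxSingleShuffleLimit;
    }

    // Outputs larger than maxSingleShuffleLimit go straight to disk.
    boolean canShuffleToMemory(long requestedSize) {
        return requestedSize < maxSingleShuffleLimit;
    }

    // Returns true if the fetcher may copy into memory now; false means
    // the fetcher must wait. If every fetcher waits while none releases
    // memory, the job hangs -- the deadlock described in this issue.
    synchronized boolean tryReserve(long requestedSize) {
        if (usedMemory > memoryLimit) {
            return false; // fetcher stalls here
        }
        usedMemory += requestedSize;
        return true;
    }

    synchronized void release(long size) {
        usedMemory -= size;
    }

    public static void main(String[] args) {
        // Values taken from the log above.
        ShuffleMemorySketch m = new ShuffleMemorySketch(1503238528L, 375809632L);
        System.out.println(m.tryReserve(27L));         // succeeds
        System.out.println(m.tryReserve(55841282L));   // succeeds
        System.out.println(m.tryReserve(118022137L));  // succeeds
        m.usedMemory = 1600000000L;                    // past the limit
        System.out.println(m.tryReserve(1L));          // every fetcher now stalls
    }
}
```

Note that the reservation check happens before the increment, so usedMemory can legitimately overshoot memoryLimit; the deadlock arises only when all fetchers are stalled and no in-memory merge fires to release the committed memory.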
[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5419: -- Resolution: Fixed Fix Version/s: 0.23.10 2.1.0-beta 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Rob! I committed this to trunk, branch-2, branch-2.1-beta, and branch-0.23. TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Fix For: 3.0.0, 2.1.0-beta, 0.23.10 Attachments: MAPREDUCE-5419.patch The write directory "slive" is not being created on the filesystem.
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob fails due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721341#comment-13721341 ] Junping Du commented on MAPREDUCE-5421: --- Thanks Vinod and Xuan for the review! TestNonExistentJob fails due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report for an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs such as https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
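The behavior change a test like this must account for can be illustrated with a self-contained sketch. The class and method names below are stand-ins for illustration; in real Hadoop the call is YarnClient#getApplicationReport and, after YARN-873, the exception type is org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException:

```java
// Self-contained sketch of the post-YARN-873 contract change:
// an unknown appID now yields an exception rather than a null report.
public class NotFoundSketch {
    // Stand-in for ApplicationNotFoundException.
    static class ApplicationNotFoundException extends RuntimeException {}

    // Stand-in for the RM client: post-YARN-873 behavior for unknown apps.
    static String getApplicationReport(String appId) {
        throw new ApplicationNotFoundException();
    }

    // A caller written against the old contract (null means "not found")
    // must now catch the exception to preserve its expectations.
    static String getReportOrNull(String appId) {
        try {
            return getApplicationReport(appId);
        } catch (ApplicationNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(getReportOrNull("application_0000_0001") == null);
    }
}
```

Any test asserting a null return for a nonexistent job, as TestNonExistentJob did, has to be updated along these lines to expect the exception instead.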
[jira] [Resolved] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4366. --- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Thanks Sandy. Committed to branch-1. mapred metrics shows negative count of waiting maps and reduces --- Key: MAPREDUCE-4366 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.0.2 Reporter: Thomas Graves Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-4366-branch-1-1.patch, MAPREDUCE-4366-branch-1.patch Negative waiting_maps and waiting_reduces counts are observed in the mapred metrics. MAPREDUCE-1238 partially fixed this, but it appears there are still issues, as we are still seeing it, though not as severe.