[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720600#comment-13720600 ] Tom White commented on MAPREDUCE-5367: -- I was looking at trunk. Doesn't this need fixing for trunk too? Local jobs all use same local working directory --- Key: MAPREDUCE-5367 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 1.2.0 Reporter: Sandy Ryza Assignee: Sandy Ryza Attachments: MAPREDUCE-5367-b1.patch This means that local jobs, even in different JVMs, can't run concurrently because they might delete each other's files during work directory setup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5279) mapreduce scheduling deadlock
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5279: -- Assignee: PengZhang mapreduce scheduling deadlock - Key: MAPREDUCE-5279 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, scheduler Affects Versions: 2.0.3-alpha Reporter: PengZhang Assignee: PengZhang Fix For: trunk Attachments: MAPREDUCE-5279.patch, MAPREDUCE-5279-v2.patch YARN-2 introduced CPU-dimension scheduling, but the MR RMContainerAllocator does not take virtual cores into account while scheduling reduce tasks. Too many reduce tasks may therefore be scheduled because memory alone is sufficient. On a small cluster this ends in deadlock: all running containers are reduce tasks, but the map phase is not finished.
[jira] [Commented] (MAPREDUCE-5279) mapreduce scheduling deadlock
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720632#comment-13720632 ] Tsuyoshi OZAWA commented on MAPREDUCE-5279: --- [~pengzhang], thank you for contributing! Can you rebase on current trunk, please?
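The fix direction described in this issue is to make the reduce ramp-up check consider virtual cores as well as memory. A minimal self-contained sketch of the idea (illustrative names, not RMContainerAllocator's actual code):

```java
// Sketch of a resource-fit check that considers both dimensions.
// The real RMContainerAllocator logic is more involved; this only
// illustrates why checking memory alone can over-schedule reduces:
// reduce containers can exhaust vcores that remaining maps need.
public class ReduceRampUp {
    public static boolean canScheduleReduce(long freeMemMB, int freeVcores,
                                            long reduceMemMB, int reduceVcores) {
        // Checking only (freeMemMB >= reduceMemMB) is the reported bug;
        // the vcores term is what prevents the small-cluster deadlock.
        return freeMemMB >= reduceMemMB && freeVcores >= reduceVcores;
    }
}
```

With memory free but zero vcores left, the check above refuses to schedule another reduce, leaving room for maps to finish.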
[jira] [Created] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
Junping Du created MAPREDUCE-5421: - Summary: TestNonExistentJob is failed due to recent changes in YARN Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du After YARN-873, trying to get an application report for an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Attachment: MAPREDUCE-5421.patch Upload a quick patch to fix it. TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Target Version/s: 2.1.0-beta Status: Patch Available (was: Open) TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch
[jira] [Created] (MAPREDUCE-5422) [Umbrella] Fix invalid state transitions in MRAppMaster
Devaraj K created MAPREDUCE-5422: Summary: [Umbrella] Fix invalid state transitions in MRAppMaster Key: MAPREDUCE-5422 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5422 Project: Hadoop Map/Reduce Issue Type: Task Components: mr-am Affects Versions: 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K There are multiple invalid state transitions for the state machines present in MRAppMaster. All of these can be handled as part of this umbrella JIRA.
[jira] [Updated] (MAPREDUCE-5400) MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-5400: - Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-5422 MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl - Key: MAPREDUCE-5400 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5400 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: applicationmaster Affects Versions: 2.0.5-alpha Reporter: J.Andreina Assignee: Devaraj K Priority: Minor Attachments: MAPREDUCE-5400.patch Step 1: Install a cluster with HDFS and MR. Step 2: Execute a job. Step 3: Issue a kill for a task attempt whose task has already completed. Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin ./mapred job -kill-task attempt_1373875322959_0032_m_00_0 No GC_PROFILE is given. Defaults to medium. 13/07/15 14:46:32 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. 13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol 13/07/15 14:46:33 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. Killed task attempt_1373875322959_0032_m_00_0 Observation: === 1. The task state transitioned from SUCCEEDED to SCHEDULED. 2. When the client issues a kill for a succeeded attempt, the client is notified that the succeeded attempt was killed. 3. A second task attempt is launched, which succeeds and is then killed later on client request. 4. Even after the job state transitioned from SUCCEEDED to ERROR, the UI still shows the state as succeeded. Issue: = 1. The client has been notified that the attempt was killed, but the attempt actually succeeded, and that is what the JHS UI displays. 2. The App Master throws InvalidStateTransitonException. 3. 
At client side and JHS job has exited with state Finished/succeeded ,At RM side the state is Finished/Failed. AM Logs: 2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_0 2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1373875322959_0032_m_00_1 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED 2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866) at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128) at 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662)
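The common fix for this family of sub-tasks is to register an explicit (often no-op) transition for events that can legitimately arrive in a terminal state, instead of letting the state machine throw. A toy state machine sketching that idea (this is not Hadoop's StateMachineFactory; names are illustrative):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch of tolerating late events in a terminal state.
// Hadoop's StateMachineFactory throws InvalidStateTransitonException
// for any (state, event) pair with no registered transition; the fix
// is to register the pair, typically as a no-op self-transition.
public class JobStates {
    public enum State { RUNNING, SUCCEEDED }
    public enum Event { JOB_TASK_COMPLETED }

    private final Map<State, Set<Event>> ignorable = new HashMap<>();

    public void addIgnorable(State state, Event event) {
        ignorable.computeIfAbsent(state, k -> new HashSet<>()).add(event);
    }

    // Returns the resulting state; throws for an unregistered pair,
    // mirroring the InvalidStateTransitonException in the logs above.
    public State handle(State current, Event event) {
        if (ignorable.getOrDefault(current, Collections.emptySet()).contains(event)) {
            return current;  // no-op self-transition: the late event is tolerated
        }
        throw new IllegalStateException("Invalid event: " + event + " at " + current);
    }
}
```

Registering JOB_TASK_COMPLETED as ignorable at SUCCEEDED keeps a late task-completion event from crashing the dispatcher, which is the shape of fix these sub-tasks call for.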
[jira] [Updated] (MAPREDUCE-5409) MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-5409: - Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-5422 MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl - Key: MAPREDUCE-5409 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5409 Project: Hadoop Map/Reduce Issue Type: Sub-task Affects Versions: 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K {code:xml} 2013-07-23 12:28:05,217 INFO [IPC Server handler 29 on 50796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1374560536158_0003_m_40_0 is : 0.0 2013-07-23 12:28:05,221 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures for output of task attempt: attempt_1374560536158_0003_m_07_0 ... raising fetch failure to map 2013-07-23 12:28:05,222 ERROR [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle this event at current state for attempt_1374560536158_0003_m_07_0 org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1032) at org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:143) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1123) at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1115) at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) at java.lang.Thread.run(Thread.java:662) 2013-07-23 12:28:05,249 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1374560536158_0003Job Transitioned from RUNNING to ERROR 2013-07-23 12:28:05,338 INFO [IPC Server handler 16 on 50796] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1374560536158_0003_m_40_0 {code}
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720715#comment-13720715 ] Hadoop QA commented on MAPREDUCE-5421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594364/MAPREDUCE-5421.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//console This message is automatically generated. 
TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch
[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720740#comment-13720740 ] Radim Kolar commented on MAPREDUCE-5153: It's very simple to implement. If you want to push things forward, then do it. Support for running combiners without reducers -- Key: MAPREDUCE-5153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 Project: Hadoop Map/Reduce Issue Type: New Feature Reporter: Radim Kolar Scenario: workflow mapper - sort - combiner - HDFS. No API change is needed: if the user sets a combiner class and reducers = 0, then run the combiner and send its output to HDFS. Popular libraries such as Scalding and Cascading offer this functionality, but they do it by caching the entire mapper output in memory.
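The proposal above can be sketched without any API change: when a combiner is set and reducers = 0, stream the already-sorted map output through the combiner before it is written. A toy Java sketch under those assumptions (simplified types; a real implementation would sit in the map-side output collector):

```java
// Toy sketch: run a summing combiner over sorted map output before it
// is written out, even when numReduceTasks == 0. Because the input is
// sorted, equal keys are adjacent, so a single streaming pass suffices;
// no caching of the entire mapper output in memory is needed.
public class MapSideCombine {
    // Returns "key=sum" lines; input keys are assumed sorted.
    public static String combineSorted(String[] keys, int[] values,
                                       boolean combinerSet, int numReduceTasks) {
        StringBuilder out = new StringBuilder();
        if (!combinerSet || numReduceTasks != 0) {
            // Existing behaviour: write records through unchanged.
            for (int i = 0; i < keys.length; i++) {
                out.append(keys[i]).append('=').append(values[i]).append('\n');
            }
            return out.toString();
        }
        // Combiner path: collapse adjacent runs of equal keys.
        int i = 0;
        while (i < keys.length) {
            int sum = 0;
            int j = i;
            while (j < keys.length && keys[j].equals(keys[i])) {
                sum += values[j];
                j++;
            }
            out.append(keys[i]).append('=').append(sum).append('\n');
            i = j;
        }
        return out.toString();
    }
}
```

The streaming pass is the advantage over the Scalding/Cascading approach mentioned above, which buffers all mapper output in memory.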
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Attachment: MAPREDUCE-5421-v2.patch The ApplicationNotFoundException on the server side should be translated to an IOException on the client side. Updated to a v2 patch to fix it. The remaining 2 failures are unrelated, as they also appear in other Jenkins jobs (like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845/testReport/) TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
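The translation Junping describes can be sketched as a client-side wrapper that restores the pre-YARN-873 null-on-unknown behaviour for callers that expect it. The lookup below is mocked; only the exception-handling shape is the point:

```java
import java.io.IOException;

// Sketch: after YARN-873, looking up an unknown appId yields an
// exception rather than a null report, so callers (and tests like
// TestNonExistentJob) must handle it. The exception class here mimics
// YARN's ApplicationNotFoundException; the lookup itself is mocked.
public class ReportLookup {
    static class ApplicationNotFoundException extends IOException {
        ApplicationNotFoundException(String m) { super(m); }
    }

    // Stand-in for a client-side getApplicationReport(appId) call that
    // always simulates an unknown application.
    static String getReport(String appId) throws ApplicationNotFoundException {
        throw new ApplicationNotFoundException("Application " + appId + " not found");
    }

    // Compatibility wrapper: old callers expected null for an unknown app.
    public static String getReportOrNull(String appId) {
        try {
            return getReport(appId);
        } catch (ApplicationNotFoundException e) {
            return null;  // preserve the pre-YARN-873 contract for callers
        }
    }
}
```

Whether to catch in the client library or in each caller is the design question the patch settles; the wrapper above just shows the catch site.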
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720832#comment-13720832 ] Hadoop QA commented on MAPREDUCE-5421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594396/MAPREDUCE-5421-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//console This message is automatically generated. 
TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5251: -- Attachment: MAPREDUCE-5251-7-b23.txt Thanks a lot Jason. I've attached the patch for 23. Reducer should not implicate map attempt if it has insufficient space to fetch map output - Key: MAPREDUCE-5251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Jason Lowe Assignee: Ashwin Shankar Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt A job can fail if a reducer happens to run on a node with insufficient space to hold a map attempt's output. The reducer keeps reporting the map attempt as bad, and if the map attempt ends up being re-launched too many times before the reducer decides that maybe it is the real problem, the job can fail. In that scenario it would be better to re-launch the reduce attempt, which will hopefully run on another node that has sufficient space to complete the shuffle. Reporting the map attempt as bad and relaunching the map task doesn't change the fact that the reducer can't hold the output.
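The behaviour change argued for here boils down to a blame decision at fetch-failure time. A hedged sketch of that decision (hypothetical names, not the attached patch's code):

```java
// Sketch of the decision MAPREDUCE-5251 argues for: when the reduce
// side cannot fit a map output, fail/relaunch the reduce attempt
// instead of reporting the map attempt as bad.
public class FetchFailurePolicy {
    public enum Blame { MAP_ATTEMPT, REDUCE_ATTEMPT }

    public static Blame blameForFetchFailure(boolean reducerOutOfSpace) {
        if (reducerOutOfSpace) {
            // Relaunching the map cannot help: its output still won't fit
            // on this node, so the reducer is the one that should move.
            return Blame.REDUCE_ATTEMPT;
        }
        // A genuine fetch error still implicates the map output host.
        return Blame.MAP_ATTEMPT;
    }
}
```

The key property is that an out-of-space condition short-circuits the usual "report the map as bad" path, so the map is never needlessly relaunched.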
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720897#comment-13720897 ] Hadoop QA commented on MAPREDUCE-5251: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594411/MAPREDUCE-5251-7-b23.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3907//console This message is automatically generated.
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720906#comment-13720906 ] Robert Parker commented on MAPREDUCE-5419: -- The test failures have been identified as defects by other tickets: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile YARN-885, YARN-960 org.apache.hadoop.mapreduce.security.TestMRCredentials YARN-960 org.apache.hadoop.mapreduce.v2.TestNonExistentJob MAPREDUCE-5421 TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Attachments: MAPREDUCE-5419.patch The write directory slive is not getting created on the FS.
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720939#comment-13720939 ] Sandy Ryza commented on MAPREDUCE-5367: --- I don't think the problem exists in trunk. getLocalTaskDir includes the job ID in the path, so there shouldn't be collisions. The other place that localRunner/ is used is for writing the job conf, which includes the job ID in its name. So that also should not be a problem. Though thinking about it now, it might make sense to change it as well for consistency?
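Sandy's explanation suggests why including the job ID in the path resolves the collision: the directory becomes unique per job. A hypothetical sketch of that naming scheme (illustrative helper, not LocalJobRunner's actual code):

```java
// Sketch (hypothetical helper) of the branch-1 fix idea: include the
// job ID in each local job's working directory so concurrent local
// jobs, even in different JVMs, never share (and never delete) each
// other's files during work directory setup.
public class LocalJobDirs {
    public static String jobWorkDir(String baseDir, String jobId) {
        // The job ID makes the path unique per job; two jobs can then
        // set up and tear down their directories independently.
        return baseDir + "/localRunner/" + jobId;
    }
}
```

Since job IDs are unique, two concurrent local jobs always resolve to distinct directories, which is the property the patch is after.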
[jira] [Created] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
Chu Tong created MAPREDUCE-5423: --- Summary: Rare deadlock situation when reducers try to fetch map output Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Chu Tong During our cluster deployment, we found there is a deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet illustrating this problem is below: 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0 2013-07-18 
04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->27 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0 2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory ->55841309 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of 
size: 118022137, inMemoryMapOutputs.size() -gt; 3, commitMemory -gt; 55841309, usedMemory -gt;173863446 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
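The reported hang is a guarded-wait pattern: each fetcher must reserve shuffle memory before copying a map output "to MEMORY", and blocks while the reservation would push usedMemory past memoryLimit. The following is a minimal sketch under assumed names (ShuffleMemorySketch, reserve, release, used are illustrative, not the actual org.apache.hadoop.mapreduce.task.reduce.MergeManager API): if every fetcher is parked in wait() and no in-memory merge ever runs release(), the notifyAll() that would wake them is never reached.

```java
// Sketch of the guarded-wait pattern behind the reported deadlock.
// Names are illustrative, NOT the actual MergeManager API.
class ShuffleMemorySketch {
    private final long memoryLimit;
    private long usedMemory = 0;

    ShuffleMemorySketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    // Each fetcher reserves memory before shuffling a map output to MEMORY.
    synchronized void reserve(long size) {
        while (usedMemory + size > memoryLimit) {
            try {
                wait(); // all fetchers can end up parked here simultaneously
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
        usedMemory += size;
    }

    // Only a merge (or a failed fetch) releases memory and wakes waiters;
    // if no merge is ever triggered, notifyAll() is never reached.
    synchronized void release(long size) {
        usedMemory -= size;
        notifyAll();
    }

    synchronized long used() {
        return usedMemory;
    }

    public static void main(String[] args) {
        ShuffleMemorySketch mem = new ShuffleMemorySketch(1503238528L); // memoryLimit from the log
        mem.reserve(27);        // map-output of attempt ..._m_17_0
        mem.reserve(55841282);  // map-output of attempt ..._m_16_0
        mem.reserve(118022137); // map-output of attempt ..._m_15_0
        System.out.println(mem.used()); // prints 173863446, matching usedMemory in the log
    }
}
```

With five fetchers holding reservations near the limit, the next large map output makes every reserve() block, and progress stalls unless the merge-threshold logic fires and releases memory.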
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-5423: Description: During our cluster deployment, we found there is a deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet illustrating this problem is above (all fetchers went into a wait state after they could not acquire more RAM beyond the memoryLimit and no fetcher was releasing memory).
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-5423: Description: During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet illustrating this problem is above (all fetchers went into a wait state after they could not acquire more RAM beyond the memoryLimit and no fetcher was releasing memory).
[jira] [Updated] (MAPREDUCE-5386) Ability to refresh history server job retention and job cleaner settings
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5386: -- Resolution: Fixed Fix Version/s: 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Ashwin! I committed this to trunk and branch-2. Ability to refresh history server job retention and job cleaner settings Key: MAPREDUCE-5386 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Fix For: 3.0.0, 2.3.0 Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION-5.txt We want to be able to refresh the following job retention parameters without having to bounce the history server: 1. Job retention time - mapreduce.jobhistory.max-age-ms 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms 3. Enable/disable cleaner - mapreduce.jobhistory.cleaner.enable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
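For context, the three properties named above are set in mapred-site.xml. A sketch of such a configuration follows; the values shown are illustrative choices (one week of retention, a daily cleaner run), not necessarily the shipped defaults:

```xml
<!-- Illustrative values, not necessarily the shipped defaults -->
<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>604800000</value> <!-- retain job history for 7 days -->
</property>
<property>
  <name>mapreduce.jobhistory.cleaner.interval-ms</name>
  <value>86400000</value> <!-- run the cleaner once a day -->
</property>
<property>
  <name>mapreduce.jobhistory.cleaner.enable</name>
  <value>true</value>
</property>
```

The point of this JIRA is that changes to these values now take effect via a refresh rather than requiring a history server restart.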
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720977#comment-13720977 ] Mithun Radhakrishnan commented on MAPREDUCE-5402: - Gentlemen, I'm afraid I'll have to review this next week. (I'm swamped.) The main reason we tried to limit the maximum number of chunks on the DFS is because these are extremely small files (holding only target-file names/locations). Plus, they're likely to be short-lived. Increasing the number of these will increase NameNode pressure (short-lived file-objects). 400 was a good target for us at Yahoo, per DistCp job. I agree that keeping this configurable would be best. But then the responsibility of being polite to the name-node will transfer to the user. DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE -- Key: MAPREDUCE-5402 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distcp, mrv2 Reporter: David Rosenstrauch Assignee: Tsuyoshi OZAWA Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, MAPREDUCE-5402.3.patch In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author describes the implementation of DynamicInputFormat, with one of the main motivations cited being to reduce the chance of long-tails where a few leftover mappers run much longer than the rest. However, I today ran into a situation where I experienced exactly such a long tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate the problem by overriding the number of mappers and the split ratio used by the DynamicInputFormat, I was prevented from doing so by the hard-coded limit set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) This constant is actually set quite low for production use. (See a description of my use case below.) 
And although MAPREDUCE-2765 states that this is an overridable maximum, when reading through the code there does not actually appear to be any mechanism available to override it. This should be changed. It should be possible to expand the maximum # of chunks beyond this arbitrary limit. For example, here is the situation I ran into today: I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode the number of mappers for the job from the default of 20 to 128, so as to more properly parallelize the copy across the cluster. The number of chunk files created was calculated as 241, and mapred.num.entries.per.chunk was calculated as 12. As the job ran on, it reached a point where there were only 4 remaining map tasks, which had each been running for over 2 hours. The reason for this was that each of the 12 files that those mappers were copying were quite large (several hundred megabytes in size) and took ~20 minutes each. However, during this time, all the other 124 mappers sat idle. In theory I should be able to alleviate this problem with DynamicInputFormat. If I were able to, say, quadruple the number of chunk files created, that would have made each chunk contain only 3 files, and these large files would have gotten distributed better around the cluster and copied in parallel. However, when I tried to do that - by overriding mapred.listing.split.ratio to, say, 10 - DynamicInputFormat responded with an exception (Too many chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease split-ratio to proceed.) - presumably because I exceeded the MAX_CHUNKS_TOLERABLE value of 400. Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I can't personally see any. If this limit has no particular logic behind it, then it should be overridable - or even better: removed altogether. After all, I'm not sure I see any need for it. 
Even if numMaps * splitRatio resulted in an extraordinarily large number, if the code were modified so that the number of chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then there would be no need for MAX_CHUNKS_TOLERABLE. In this worst-case scenario where the product of numMaps and splitRatio is large, capping the number of chunks at the number of files (numberOfChunks = numberOfFiles) would result in 1 file per chunk - the maximum parallelization possible. That may not be the best-tuned solution for some users, but I would think that it should be left up to the user to deal with the potential consequence of not having tuned their job properly. Certainly that would be better than having an arbitrary hard-coded limit that *prevents*
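The capping proposal in the last paragraph is simple arithmetic. A sketch follows; the class and method names are illustrative, not the actual DynamicInputFormat code:

```java
// Sketch of the proposed cap: never create more chunks than there are files,
// which bounds the worst case at one file per chunk (maximum parallelization).
// Names are illustrative; this is NOT the actual DynamicInputFormat code.
class ChunkMath {
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    public static void main(String[] args) {
        // The scenario above: 128 maps, split ratio 10, ~2800 files.
        // 128 * 10 = 1280 chunks, still under the file count, so roughly
        // 2800 / 1280 ≈ 2 files per chunk instead of 12.
        System.out.println(numberOfChunks(128, 10, 2800)); // prints 1280
    }
}
```

With this cap in place, even an extreme splitRatio merely degenerates to one file per chunk, so the MAX_CHUNKS_TOLERABLE guard would have nothing left to protect against (aside from the NameNode-pressure concern raised in the comment above).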
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720976#comment-13720976 ] Jason Lowe commented on MAPREDUCE-5423: --- On which version of Hadoop did this occur?
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720979#comment-13720979 ] Chu Tong commented on MAPREDUCE-5423: - This is on 2.0.2-alpha
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5423: -- Component/s: mrv2 Affects Version/s: 2.0.2-alpha
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13720982#comment-13720982 ] Jason Lowe commented on MAPREDUCE-5423: --- This may be a duplicate of MAPREDUCE-4842. Rare deadlock situation when reducers try to fetch map output - Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Chu Tong During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers, and the log snippet illustrating this problem is below (all fetchers went into a wait state after they can't acquire more RAM beyond the memoryLimit and no fetcher is releasing memory): 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->27 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0 2013-07-18 04:32:33,323 INFO
[fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory ->55841309 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory -> 55841309, usedMemory ->173863446 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler:
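The stall described in this report comes from the shuffle's in-memory accounting: each fetcher reserves memory for a map output and waits whenever usage is already over the limit. A minimal, self-contained sketch of that accounting (class and method names are illustrative, not Hadoop's MergeManager API; the constants in main are taken from the MergerManager line in the log):

```java
// Illustrative sketch of MergeManager-style memory accounting shown in
// the log above. Class and method names are hypothetical, not Hadoop's API.
class ShuffleMemoryTracker {
    private final long memoryLimit;             // memoryLimit in the log
    private final long maxSingleShuffleLimit;   // maxSingleShuffleLimit
    private long usedMemory = 0;                // usedMemory

    ShuffleMemoryTracker(long memoryLimit, long maxSingleShuffleLimit) {
        this.memoryLimit = memoryLimit;
        this.maxSingleShuffleLimit = maxSingleShuffleLimit;
    }

    // Outputs larger than the single-shuffle limit are shuffled to disk.
    boolean fitsInMemory(long size) {
        return size <= maxSingleShuffleLimit;
    }

    // The limit check happens before the addition, so usage can overshoot
    // the limit; once it has, every further caller is told to wait. If the
    // memory can only be freed by a merge that never starts, all fetchers
    // wait here forever -- the stall in the report.
    synchronized boolean tryReserve(long size) {
        if (usedMemory > memoryLimit) {
            return false;                       // caller must wait and retry
        }
        usedMemory += size;
        return true;
    }

    synchronized void release(long size) { usedMemory -= size; }
    synchronized long used() { return usedMemory; }

    public static void main(String[] args) {
        ShuffleMemoryTracker t = new ShuffleMemoryTracker(1503238528L, 375809632L);
        t.tryReserve(55841282L);    // m_16 output size, as in the log
        t.tryReserve(118022137L);   // m_15 output size
        System.out.println("usedMemory = " + t.used());
    }
}
```

In the deadlock this issue was eventually resolved against (MAPREDUCE-4842), every fetcher ends up in that waiting state because nothing ever releases the reserved memory.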
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Summary: Improve getSplits performance by using listLocatedStatus (was: Improve getSplits performance by using listFiles, the new FileSystem API) Hadoop Flags: Reviewed Thanks for the reviews, Kihwal. Committing this. Improve getSplits performance by using listLocatedStatus Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Attachments: mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
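The RPC saving that motivates this change can be seen with a toy model (the FakeNameNode class below is hypothetical, not Hadoop code): listing N files and then asking for each file's block locations costs N+1 round trips, while a listLocatedStatus-style listing returns the locations inline in a single call.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of why listLocatedStatus reduces NameNode RPCs. Not Hadoop code;
// the real calls are FileSystem.listStatus/getFileBlockLocations/listLocatedStatus.
class ListLocatedStatusSketch {
    static class FakeNameNode {
        int rpcCount = 0;
        final List<String> files;
        FakeNameNode(List<String> files) { this.files = files; }

        List<String> listStatus() { rpcCount++; return files; }
        String getBlockLocations(String f) { rpcCount++; return "locs:" + f; }
        // One RPC returns file names and block locations together.
        Map<String, String> listLocatedStatus() {
            rpcCount++;
            Map<String, String> m = new LinkedHashMap<>();
            for (String f : files) m.put(f, "locs:" + f);
            return m;
        }
    }

    static int rpcsOldStyle(FakeNameNode nn) {
        nn.rpcCount = 0;
        for (String f : nn.listStatus()) nn.getBlockLocations(f);  // 1 + N RPCs
        return nn.rpcCount;
    }

    static int rpcsNewStyle(FakeNameNode nn) {
        nn.rpcCount = 0;
        nn.listLocatedStatus();                                     // 1 RPC
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode(Arrays.asList("a", "b", "c"));
        System.out.println("old: " + rpcsOldStyle(nn) + " RPCs, new: "
            + rpcsNewStyle(nn) + " RPC");
    }
}
```

For a job over thousands of input files, collapsing the per-file location lookups into the listing call is the whole getSplits speedup.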
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721014#comment-13721014 ] Ravi Prakash commented on MAPREDUCE-5419: - Patch looks good to me. +1. Thanks Rob! TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Attachments: MAPREDUCE-5419.patch The write directory slive is not getting created on the FS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5251: -- Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 to the branch-0.23 patch and committed to branch-0.23. Reducer should not implicate map attempt if it has insufficient space to fetch map output - Key: MAPREDUCE-5251 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.7, 2.0.4-alpha Reporter: Jason Lowe Assignee: Ashwin Shankar Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt A job can fail if a reducer happens to run on a node with insufficient space to hold a map attempt's output. The reducer keeps reporting the map attempt as bad, and if the map attempt ends up being re-launched too many times before the reducer decides that it may itself be the real problem, the job can fail. In that scenario it would be better to re-launch the reduce attempt, which will hopefully run on another node that has sufficient space to complete the shuffle. Reporting the map attempt as bad and relaunching the map task doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
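The attribution rule this issue argues for can be sketched as a small decision function. Everything here is hypothetical and illustrative, not the committed patch (which is in the attached MAPREDUCE-5251-*.txt files): the point is only that a fetch failure caused by the reducer's own lack of space should count against the reduce attempt, not the map attempt.

```java
// Hypothetical sketch of shuffle-fetch failure attribution. Names and exact
// conditions are illustrative, not the actual MAPREDUCE-5251 patch.
class FetchFailureAttribution {
    enum Blame { MAP_ATTEMPT, REDUCE_ATTEMPT }

    // Decide which side a failed fetch of a map output should count against.
    static Blame blameFor(long mapOutputBytes, long reducerFreeSpaceBytes,
                          boolean mapOutputCorruptOrUnreachable) {
        if (mapOutputCorruptOrUnreachable) {
            return Blame.MAP_ATTEMPT;       // genuinely bad map output
        }
        if (mapOutputBytes > reducerFreeSpaceBytes) {
            // Relaunching the map cannot help: the reducer's node simply
            // cannot hold the output. Failing the reduce attempt lets it
            // be rescheduled on a node with enough space.
            return Blame.REDUCE_ATTEMPT;
        }
        return Blame.MAP_ATTEMPT;           // default: unexplained failure
    }

    public static void main(String[] args) {
        // A 118 MB map output against 50 MB of free reducer-local space.
        System.out.println(blameFor(118022137L, 50L * 1024 * 1024, false));
    }
}
```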
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Status: Resolved (was: Patch Available) Thanks Hairong, and thanks to everyone that contributed to reviews of various versions of the patch. I committed this to trunk, branch-2, and branch-0.23. Improve getSplits performance by using listLocatedStatus Key: MAPREDUCE-1981 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 Project: Hadoop Map/Reduce Issue Type: Improvement Components: job submission Affects Versions: 0.23.0 Reporter: Hairong Kuang Assignee: Hairong Kuang Fix For: 3.0.0, 2.3.0, 0.23.10 Attachments: mapredListFiles1.patch, mapredListFiles2.patch, mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch This jira will make FileInputFormat and CombineFileInputFormat use the new API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721044#comment-13721044 ] Xuan Gong commented on MAPREDUCE-5421: -- +1 Looks good TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
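The semantic change behind this failure: before YARN-873, a client asking for an unknown application got null; afterwards it gets an exception. A sketch of the client-side translation that preserves the old contract follows; every type in it is a stand-in defined in the snippet itself, not the YARN API, and only the exception message mirrors the real one from the test output.

```java
// Stand-in types illustrating the YARN-873 semantic change described above.
// Nothing here is the real YARN API.
class UnknownAppHandling {
    static class ApplicationNotFoundException extends RuntimeException {
        ApplicationNotFoundException(String msg) { super(msg); }
    }

    interface Rm {
        String getApplicationReport(String appId);
    }

    // New-style RM behaviour for an unknown application: throw, not null.
    static final Rm THROWING_RM = appId -> {
        throw new ApplicationNotFoundException(
            "Application with id '" + appId + "' doesn't exist in RM.");
    };

    // Client-side translation preserving the old "null means unknown"
    // contract that callers like the failing test effectively rely on.
    static String reportOrNull(Rm rm, String appId) {
        try {
            return rm.getApplicationReport(appId);
        } catch (ApplicationNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // The appID from the test failure; unknown, so this prints null.
        System.out.println(reportOrNull(THROWING_RM, "application_0_"));
    }
}
```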
[jira] [Created] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
Vinod Kumar Vavilapalli created MAPREDUCE-5424: -- Summary: TestNonExistentJob failing after YARN-873 Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
[ https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721072#comment-13721072 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5424: It fails with the following: {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 53.573 sec FAILURE! testGetInvalidJob(org.apache.hadoop.mapreduce.v2.TestNonExistentJob) Time elapsed: 53420 sec ERROR! java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_0_' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:241) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:202) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2047) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2043) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2041) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:387) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522) at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:182) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:573) 
at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493) at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:573) at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:591) at org.apache.hadoop.mapreduce.v2.TestNonExistentJob.testGetInvalidJob(TestNonExistentJob.java:99) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) {code} TestNonExistentJob failing after YARN-873 - Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
[ https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved MAPREDUCE-5424. -- Resolution: Duplicate This is a duplicate of MAPREDUCE-5421. TestNonExistentJob failing after YARN-873 - Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721097#comment-13721097 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5421: +1. Checking this in.. TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5421: --- Resolution: Fixed Fix Version/s: 2.1.0-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk, branch-2 and branch-2.1. Thanks Junping! TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du Fix For: 2.1.0-beta Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5421: --- Component/s: test Priority: Blocker (was: Major) TestNonExistentJob is failed due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report with an unknown appID will get an exception instead of null. This causes a test failure in TestNonExistentJob which affects other, unrelated jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721112#comment-13721112 ] Jason Lowe commented on MAPREDUCE-5419: --- +1, looks good to me as well. I'll commit this shortly. Note that initially I could not reproduce this problem, but it is very reproducible by cleaning and only running the TestSlive#testDataWriting test. It's easier to reproduce with JDK7 when running all of the TestSlive tests since that does not run the unit tests in a deterministic order. TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Attachments: MAPREDUCE-5419.patch The write directory slive is not getting created on the FS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Attachment: LOADED_JOB_CACHE_MR5411-2.txt Refresh size of loaded job cache on history server -- Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Attachments: LOADED_JOB_CACHE_MR5411-1.txt, LOADED_JOB_CACHE_MR5411-2.txt We want to be able to refresh the size of the loaded job cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server through the history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
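A loaded-job cache whose maximum size can be changed at runtime, as this feature requires, can be sketched with an access-order LinkedHashMap. The ResizableJobCache class below is hypothetical, not the history server's implementation; it only shows the resize-while-in-use behaviour an admin refresh needs.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical ResizableJobCache: an access-order LRU map whose capacity
// can be changed while in use. Not the JobHistory implementation.
class ResizableJobCache<K, V> {
    private int maxSize;
    private final LinkedHashMap<K, V> map =
        new LinkedHashMap<K, V>(16, 0.75f, true) {   // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;              // evict LRU on insert
            }
        };

    ResizableJobCache(int maxSize) { this.maxSize = maxSize; }

    synchronized void put(K key, V value) { map.put(key, value); }
    synchronized V get(K key) { return map.get(key); }
    synchronized int size() { return map.size(); }

    // What an admin "refresh" would call: growing is free; shrinking
    // evicts least-recently-used entries immediately.
    synchronized void setMaxSize(int newMax) {
        maxSize = newMax;
        Iterator<K> it = map.keySet().iterator();
        while (map.size() > maxSize && it.hasNext()) {
            it.next();
            it.remove();
        }
    }

    public static void main(String[] args) {
        ResizableJobCache<String, Integer> c = new ResizableJobCache<>(2);
        c.put("job_1", 1);
        c.put("job_2", 2);
        c.put("job_3", 3);           // evicts job_1, the LRU entry
        c.setMaxSize(1);             // shrink: evicts job_2
        System.out.println(c.size() + " entry, job_3 = " + c.get("job_3"));
    }
}
```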
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721207#comment-13721207 ] Chu Tong commented on MAPREDUCE-5423: - I think you are right. I took a look at MAPREDUCE-4842 and I believe this is the issue I experienced. Can you please close this as a duplicate? Thanks Rare deadlock situation when reducers try to fetch map output - Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Chu Tong During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output.
[jira] [Created] (MAPREDUCE-5425) Junit in TestJobHistoryServer failing in jdk 7
Ashwin Shankar created MAPREDUCE-5425: - Summary: Junit in TestJobHistoryServer failing in jdk 7 Key: MAPREDUCE-5425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5425 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.0.4-alpha Reporter: Ashwin Shankar We get the following exception when we run the unit tests of TestJobHistoryServer with jdk 7: Caused by: java.net.BindException: Problem binding to [0.0.0.0:10033] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719) at org.apache.hadoop.ipc.Server.bind(Server.java:423) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:535) at org.apache.hadoop.ipc.Server.init(Server.java:2202) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:901) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.init(ProtobufRpcEngine.java:505) at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480) at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:746) at org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer.serviceInit(HSAdminServer.java:100) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) This is happening because testMainMethod starts the history server and doesn't stop it. This worked in jdk 6 because tests executed sequentially and this test was the last one, so it didn't affect other tests, but in jdk 7 it fails. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
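The failure mode is easy to demonstrate with plain java.net: a listening socket that is never closed keeps its port bound, so a later bind to the same port throws BindException. A minimal sketch (TinyServer is a stand-in for the history server, not Hadoop code):

```java
import java.io.IOException;
import java.net.ServerSocket;

// Minimal illustration of why a test that starts a server and never stops
// it poisons later tests: the port stays bound until stop() closes it.
class StopServerSketch {
    static class TinyServer {
        private ServerSocket socket;
        void start(int port) throws IOException { socket = new ServerSocket(port); }
        int port() { return socket.getLocalPort(); }
        void stop() throws IOException { socket.close(); }
    }

    public static void main(String[] args) throws IOException {
        TinyServer first = new TinyServer();
        first.start(0);              // ephemeral port, standing in for 10033
        int port = first.port();
        // Without this stop() -- the missing teardown in testMainMethod --
        // the next bind would throw: Address already in use.
        first.stop();
        TinyServer next = new TinyServer();
        next.start(port);            // succeeds only because first was stopped
        next.stop();
        System.out.println("rebound port " + port + " cleanly");
    }
}
```

The fix pattern is the same in a JUnit test: stop the server in a finally block or an @After method so later tests (or, under JDK 7's nondeterministic test ordering, earlier ones) can bind the port.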
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Status: Patch Available (was: Open) Thanks, patch refreshed. Refresh size of loaded job cache on history server -- Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Attachments: LOADED_JOB_CACHE_MR5411-1.txt, LOADED_JOB_CACHE_MR5411-2.txt We want to be able to refresh the size of the loaded job cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server through the history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13721256#comment-13721256 ] Hadoop QA commented on MAPREDUCE-5411: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594445/LOADED_JOB_CACHE_MR5411-2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//console This message is automatically generated. 
Refresh size of loaded job cache on history server -- Key: MAPREDUCE-5411 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: jobhistoryserver Affects Versions: 2.1.0-beta Reporter: Ashwin Shankar Assignee: Ashwin Shankar Labels: features Attachments: LOADED_JOB_CACHE_MR5411-1.txt, LOADED_JOB_CACHE_MR5411-2.txt We want to be able to refresh the size of the loaded job cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server through the history server's admin interface. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved MAPREDUCE-5423. --- Resolution: Duplicate Rare deadlock situation when reducers try to fetch map output - Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.0.2-alpha Reporter: Chu Tong During our cluster deployment, we found a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers; the log snippet below illustrates the problem (all fetchers went into a wait state after they could not acquire more RAM beyond the memoryLimit, and no fetcher was releasing memory):
2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events
2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply
2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY
2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0
2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory -> 27
2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply
2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY
2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0
2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory -> 55841309
2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0
2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory -> 55841309, usedMemory -> 173863446
2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
2013-07-18 04:32:42,188 INFO [EventFetcher
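The stall pattern in the log above can be sketched as follows. This is a minimal simplification using the parameter names from the MergerManager line, not the actual Hadoop MergeManager code: each fetcher must reserve memory before shuffling a map output into RAM, and once usedMemory exceeds memoryLimit every fetcher blocks; if no fetcher holds memory it can release, the shuffle deadlocks.

```java
// Hypothetical simplification of the shuffle memory accounting
// (field names taken from the log above; not the real MergeManager).
public class ShuffleMemorySketch {
    final long memoryLimit;
    final long maxSingleShuffleLimit;
    long usedMemory = 0;

    ShuffleMemorySketch(long memoryLimit, long maxSingleShuffleLimit) {
        this.memoryLimit = memoryLimit;
        this.maxSingleShuffleLimit = maxSingleShuffleLimit;
    }

    // Outputs larger than maxSingleShuffleLimit go straight to disk.
    boolean canShuffleToMemory(long requestedSize) {
        return requestedSize < maxSingleShuffleLimit;
    }

    // Returns true if the fetcher may copy into memory now; false means
    // the fetcher must wait. If every fetcher waits while none releases
    // memory, the job hangs -- the deadlock described in this issue.
    synchronized boolean tryReserve(long requestedSize) {
        if (usedMemory > memoryLimit) {
            return false; // fetcher stalls here
        }
        usedMemory += requestedSize;
        return true;
    }

    synchronized void release(long size) {
        usedMemory -= size;
    }

    public static void main(String[] args) {
        // Values taken from the log above.
        ShuffleMemorySketch m = new ShuffleMemorySketch(1503238528L, 375809632L);
        System.out.println(m.tryReserve(27L));         // succeeds
        System.out.println(m.tryReserve(55841282L));   // succeeds
        System.out.println(m.tryReserve(118022137L));  // succeeds
        m.usedMemory = 1600000000L;                    // past the limit
        System.out.println(m.tryReserve(1L));          // every fetcher now stalls
    }
}
```

Note that the reservation check happens before the increment, so usedMemory can legitimately overshoot memoryLimit; the deadlock arises only when all fetchers are stalled and no in-memory merge fires to release the committed memory.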
[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5419: -- Resolution: Fixed Fix Version/s: 0.23.10 2.1.0-beta 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Rob! I committed this to trunk, branch-2, branch-2.1-beta, and branch-0.23. TestSlive is getting FileNotFound Exception --- Key: MAPREDUCE-5419 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: trunk, 2.1.0-beta, 0.23.9 Reporter: Robert Parker Assignee: Robert Parker Fix For: 3.0.0, 2.1.0-beta, 0.23.10 Attachments: MAPREDUCE-5419.patch The write directory "slive" is not being created on the filesystem.
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob fails due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721341#comment-13721341 ] Junping Du commented on MAPREDUCE-5421: --- Thanks Vinod and Xuan for the review! TestNonExistentJob fails due to recent changes in YARN -- Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Junping Du Assignee: Junping Du Priority: Blocker Fix For: 2.1.0-beta Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch After YARN-873, trying to get an application report for an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs such as https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
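The behavior change a test like this must account for can be illustrated with a self-contained sketch. The class and method names below are stand-ins for illustration; in real Hadoop the call is YarnClient#getApplicationReport and, after YARN-873, the exception type is org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException:

```java
// Self-contained sketch of the post-YARN-873 contract change:
// an unknown appID now yields an exception rather than a null report.
public class NotFoundSketch {
    // Stand-in for ApplicationNotFoundException.
    static class ApplicationNotFoundException extends RuntimeException {}

    // Stand-in for the RM client: post-YARN-873 behavior for unknown apps.
    static String getApplicationReport(String appId) {
        throw new ApplicationNotFoundException();
    }

    // A caller written against the old contract (null means "not found")
    // must now catch the exception to preserve its expectations.
    static String getReportOrNull(String appId) {
        try {
            return getApplicationReport(appId);
        } catch (ApplicationNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(getReportOrNull("application_0000_0001") == null);
    }
}
```

Any test asserting a null return for a nonexistent job, as TestNonExistentJob did, has to be updated along these lines to expect the exception instead.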
[jira] [Resolved] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4366. --- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Thanks Sandy. Committed to branch-1. mapred metrics shows negative count of waiting maps and reduces --- Key: MAPREDUCE-4366 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 1.0.2 Reporter: Thomas Graves Assignee: Sandy Ryza Fix For: 1.3.0 Attachments: MAPREDUCE-4366-branch-1-1.patch, MAPREDUCE-4366-branch-1.patch Negative waiting_maps and waiting_reduces counts are observed in the mapred metrics. MAPREDUCE-1238 partially fixed this, but it appears there are still issues, as we are still seeing it, though not as severe.