[jira] [Resolved] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces

2013-07-26 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur resolved MAPREDUCE-4366.
---

   Resolution: Fixed
Fix Version/s: 1.3.0
 Hadoop Flags: Reviewed

Thanks Sandy. Committed to branch-1.

> mapred metrics shows negative count of waiting maps and reduces
> ---
>
> Key: MAPREDUCE-4366
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 1.0.2
>Reporter: Thomas Graves
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
> Attachments: MAPREDUCE-4366-branch-1-1.patch, 
> MAPREDUCE-4366-branch-1.patch
>
>
> Negative waiting_maps and waiting_reduces counts are observed in the mapred 
> metrics.  MAPREDUCE-1238 partially fixed this, but it appears there are still 
> issues, as we are still seeing it, though not as badly.



[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721341#comment-13721341
 ] 

Junping Du commented on MAPREDUCE-5421:
---

Thanks Vinod and Xuan for review!

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report for an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects other unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.
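
For context, a minimal sketch of how a caller can restore the old "null means no such application" contract under the post-YARN-873 behavior. This is an illustration, not the attached patch; the wrapper method is a stand-in, while ApplicationNotFoundException and YarnClient#getApplicationReport are the real YARN APIs.

{code}
import java.io.IOException;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Sketch only: translate the post-YARN-873 exception back into the old
// "null means unknown application" contract at the call site.
ApplicationReport getReportOrNull(YarnClient client, ApplicationId appId)
    throws IOException, YarnException {
  try {
    return client.getApplicationReport(appId);
  } catch (ApplicationNotFoundException e) {
    return null;  // unknown appID: treat as a non-existent job
  }
}
{code}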



[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5419:
--

   Resolution: Fixed
Fix Version/s: 0.23.10
   2.1.0-beta
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Rob!  I committed this to trunk, branch-2, branch-2.1-beta, and 
branch-0.23.

> TestSlive is getting FileNotFound Exception
> ---
>
> Key: MAPREDUCE-5419
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: trunk, 2.1.0-beta, 0.23.9
>Reporter: Robert Parker
>Assignee: Robert Parker
> Fix For: 3.0.0, 2.1.0-beta, 0.23.10
>
> Attachments: MAPREDUCE-5419.patch
>
>
> The write directory "slive" is not getting created on the FS.
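
As illustration only (the attached patch is authoritative), the failure suggests creating the write directory before the test writes to it; a minimal sketch, where testBaseDir is a hypothetical stand-in for the test's working directory:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal sketch, not the attached patch: ensure the "slive" write
// directory exists on the FS before any test data is written, avoiding
// the FileNotFoundException. "testBaseDir" is hypothetical.
void ensureSliveDir(FileSystem fs, Path testBaseDir) throws IOException {
  Path sliveDir = new Path(testBaseDir, "slive");
  if (!fs.exists(sliveDir)) {
    fs.mkdirs(sliveDir);
  }
}
{code}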



[jira] [Resolved] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe resolved MAPREDUCE-5423.
---

Resolution: Duplicate

> Rare deadlock situation when reducers try to fetch map output
> -
>
> Key: MAPREDUCE-5423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Chu Tong
>
> During our cluster deployment, we found a very rare deadlock situation 
> when reducers try to fetch map output. We had 5 fetchers, and a log 
> snippet illustrating the problem is below (all fetchers went into a wait 
> state after they could not acquire more RAM beyond the memoryLimit, and no 
> fetcher was releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_17_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_16_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_15_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
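
Schematically, the hang described above is a wait with no remaining waker. The sketch below is a simplification of the merge manager's memory reservation, not its actual code:

{code}
// Schematic of the reported deadlock, not MergeManager's actual code:
// every fetcher blocks here once usedMemory exceeds memoryLimit. With all
// 5 fetchers waiting and no in-flight merge left to free memory and call
// notifyAll(), nothing ever wakes them.
synchronized void reserve(long requestedSize) throws InterruptedException {
  while (usedMemory > memoryLimit) {
    wait();
  }
  usedMemory += requestedSize;
}
{code}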

[jira] [Commented] (MAPREDUCE-5411) Refresh size of loaded job cache on history server

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721256#comment-13721256
 ] 

Hadoop QA commented on MAPREDUCE-5411:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12594445/LOADED_JOB_CACHE_MR5411-2.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//console

This message is automatically generated.

> Refresh size of loaded job cache on history server
> --
>
> Key: MAPREDUCE-5411
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: jobhistoryserver
>Affects Versions: 2.1.0-beta
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: features
> Attachments: LOADED_JOB_CACHE_MR5411-1.txt, 
> LOADED_JOB_CACHE_MR5411-2.txt
>
>
> We want to be able to refresh the size of the history server's loaded job 
> cache (mapreduce.jobhistory.loadedjobs.cache.size) through the history 
> server's admin interface.
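
A rough sketch of what the refresh hook might look like on the server side. The property name is the real one from the description; the method body and the refreshLoadedJobCache resize hook are assumptions, not the attached patch:

{code}
// Rough sketch, not the attached patch: re-read the cache-size property
// from a fresh Configuration and resize the loaded-job cache in place.
public void refreshLoadedJobCache() throws IOException {
  Configuration conf = new Configuration();
  int cacheSize = conf.getInt(
      "mapreduce.jobhistory.loadedjobs.cache.size", 5);  // default assumed
  jobHistory.refreshLoadedJobCache(cacheSize);  // hypothetical resize hook
}
{code}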



[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server

2013-07-26 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated MAPREDUCE-5411:
--

Status: Patch Available  (was: Open)

Thanks, patch refreshed.

> Refresh size of loaded job cache on history server
> --
>
> Key: MAPREDUCE-5411
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: jobhistoryserver
>Affects Versions: 2.1.0-beta
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: features
> Attachments: LOADED_JOB_CACHE_MR5411-1.txt, 
> LOADED_JOB_CACHE_MR5411-2.txt
>
>
> We want to be able to refresh the size of the history server's loaded job 
> cache (mapreduce.jobhistory.loadedjobs.cache.size) through the history 
> server's admin interface.



[jira] [Created] (MAPREDUCE-5425) Junit in TestJobHistoryServer failing in jdk 7

2013-07-26 Thread Ashwin Shankar (JIRA)
Ashwin Shankar created MAPREDUCE-5425:
-

 Summary: Junit in TestJobHistoryServer failing in jdk 7
 Key: MAPREDUCE-5425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5425
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.0.4-alpha
Reporter: Ashwin Shankar


We get the following exception when we run the unit tests of 
TestJobHistoryServer with JDK 7:
Caused by: java.net.BindException: Problem binding to [0.0.0.0:10033] 
java.net.BindException: Address already in use; For more details see:  
http://wiki.apache.org/hadoop/BindException
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719)
at org.apache.hadoop.ipc.Server.bind(Server.java:423)
at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:535)
at org.apache.hadoop.ipc.Server.<init>(Server.java:2202)
at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:901)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:505)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480)
at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:746)
at 
org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer.serviceInit(HSAdminServer.java:100)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)


This is happening because testMainMethod starts the history server and doesn't 
stop it. This worked in JDK 6 because tests executed sequentially and this test 
was the last one, so it didn't affect other tests, but in JDK 7 it fails.
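
A minimal sketch of the missing cleanup, assuming JUnit-style test code against the real JobHistoryServer service API (init/start/stop from AbstractService); not an actual patch:

{code}
// Sketch of the missing cleanup, not an actual patch: stop the server in
// a finally block so the RPC port (10033) is released even when an
// assertion fails, keeping later tests in the same JVM from hitting
// "Address already in use".
JobHistoryServer server = new JobHistoryServer();
try {
  server.init(new Configuration());
  server.start();
  // ... assertions against the running server ...
} finally {
  server.stop();
}
{code}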



[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Chu Tong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721207#comment-13721207
 ] 

Chu Tong commented on MAPREDUCE-5423:
-

I think you are right. I took a look at MAPREDUCE-4842 and I believe this is 
the issue I experienced. Can you please close this as a duplicate? Thanks.

> Rare deadlock situation when reducers try to fetch map output
> -
>
> Key: MAPREDUCE-5423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Chu Tong
>
> During our cluster deployment, we found a very rare deadlock situation 
> when reducers try to fetch map output. We had 5 fetchers, and a log 
> snippet illustrating the problem is below (all fetchers went into a wait 
> state after they could not acquire more RAM beyond the memoryLimit, and no 
> fetcher was releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_17_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_16_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_15_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446

[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server

2013-07-26 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated MAPREDUCE-5411:
--

Attachment: LOADED_JOB_CACHE_MR5411-2.txt

> Refresh size of loaded job cache on history server
> --
>
> Key: MAPREDUCE-5411
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: jobhistoryserver
>Affects Versions: 2.1.0-beta
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: features
> Attachments: LOADED_JOB_CACHE_MR5411-1.txt, 
> LOADED_JOB_CACHE_MR5411-2.txt
>
>
> We want to be able to refresh the size of the history server's loaded job 
> cache (mapreduce.jobhistory.loadedjobs.cache.size) through the history 
> server's admin interface.



[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception

2013-07-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721112#comment-13721112
 ] 

Jason Lowe commented on MAPREDUCE-5419:
---

+1, looks good to me as well.  I'll commit this shortly.

Note that initially I could not reproduce this problem, but it is very 
reproducible by cleaning and running only the TestSlive#testDataWriting test.  
It's also easier to reproduce with JDK 7 when running all of the TestSlive 
tests, since JDK 7 does not run the unit tests in a deterministic order.

> TestSlive is getting FileNotFound Exception
> ---
>
> Key: MAPREDUCE-5419
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: trunk, 2.1.0-beta, 0.23.9
>Reporter: Robert Parker
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5419.patch
>
>
> The write directory "slive" is not getting created on the FS.



[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5421:
---

   Resolution: Fixed
Fix Version/s: 2.1.0-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and branch-2.1. Thanks Junping!

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Fix For: 2.1.0-beta
>
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report for an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects other unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.



[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5421:
---

Component/s: test
   Priority: Blocker  (was: Major)

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Blocker
> Fix For: 2.1.0-beta
>
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report for an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects other unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.



[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721097#comment-13721097
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5421:


+1. Checking this in..

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report for an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects other unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.



[jira] [Resolved] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873

2013-07-26 Thread Xuan Gong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuan Gong resolved MAPREDUCE-5424.
--

Resolution: Duplicate

This is a duplicate of MAPREDUCE-5421.

> TestNonExistentJob failing after YARN-873
> -
>
> Key: MAPREDUCE-5424
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Xuan Gong
>Priority: Blocker
>




[jira] [Commented] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721072#comment-13721072
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5424:


It fails with the following:
{code}
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 53.573 sec <<< 
FAILURE!
testGetInvalidJob(org.apache.hadoop.mapreduce.v2.TestNonExistentJob)  Time 
elapsed: 53420 sec  <<< ERROR!
java.io.IOException: 
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_0_' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:241)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:202)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2047)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2043)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2041)

at 
org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:387)
at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522)
at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:182)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:573)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493)
at 
org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:573)
at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:591)
at 
org.apache.hadoop.mapreduce.v2.TestNonExistentJob.testGetInvalidJob(TestNonExistentJob.java:99)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
{code}

> TestNonExistentJob failing after YARN-873
> -
>
> Key: MAPREDUCE-5424
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Xuan Gong
>Priority: Blocker
>




[jira] [Created] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873

2013-07-26 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created MAPREDUCE-5424:
--

 Summary: TestNonExistentJob failing after YARN-873
 Key: MAPREDUCE-5424
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Vinod Kumar Vavilapalli
Assignee: Xuan Gong
Priority: Blocker






[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Xuan Gong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721044#comment-13721044
 ] 

Xuan Gong commented on MAPREDUCE-5421:
--

+1 Looks good

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report for an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects other unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.



[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-1981:
--

   Resolution: Fixed
Fix Version/s: 0.23.10
   2.3.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Thanks Hairong, and thanks to everyone who contributed to reviews of various 
versions of the patch.  I committed this to trunk, branch-2, and branch-0.23.

> Improve getSplits performance by using listLocatedStatus
> 
>
> Key: MAPREDUCE-1981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 0.23.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 3.0.0, 2.3.0, 0.23.10
>
> Attachments: mapredListFiles1.patch, mapredListFiles2.patch, 
> mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, 
> mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch
>
>
> This jira will make FileInputFormat and CombineFileInputFormat use the new 
> API, thus reducing the number of RPCs to the HDFS NameNode.
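
For context, a sketch of the API usage behind the improvement. FileSystem#listLocatedStatus and LocatedFileStatus are the real APIs; the split-building loop is schematic:

{code}
// Schematic: listLocatedStatus returns each file's block locations along
// with its FileStatus in a single listing, so split computation no longer
// needs a separate getFileBlockLocations RPC per file.
RemoteIterator<LocatedFileStatus> it = fs.listLocatedStatus(inputDir);
while (it.hasNext()) {
  LocatedFileStatus stat = it.next();
  BlockLocation[] blocks = stat.getBlockLocations();  // already fetched
  // ... build input splits from stat and blocks ...
}
{code}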



[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5251:
--

   Resolution: Fixed
Fix Version/s: 0.23.10
   2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1 to the branch-0.23 patch and committed to branch-0.23.

> Reducer should not implicate map attempt if it has insufficient space to 
> fetch map output
> -
>
> Key: MAPREDUCE-5251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.7, 2.0.4-alpha
>Reporter: Jason Lowe
>Assignee: Ashwin Shankar
> Fix For: 3.0.0, 2.3.0, 0.23.10
>
> Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, 
> MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, 
> MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt
>
>
> A job can fail if a reducer happens to run on a node with insufficient space 
> to hold a map attempt's output.  The reducer keeps reporting the map attempt 
> as bad, and if the map attempt ends up being re-launched too many times 
> before the reducer decides that it may itself be the real problem, the job 
> can fail. In that scenario it would be better to re-launch the reduce 
> attempt, which will hopefully run on another node that has sufficient space 
> to complete the shuffle.  Reporting the map attempt as bad and relaunching 
> the map task doesn't change the fact that the reducer can't hold the output.
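
Schematically, the proposed behavior change looks something like the sketch below. isLocalSpaceProblem is a hypothetical helper, and the copyFailed call is only modeled on the shuffle scheduler's reporting path; this is not the committed patch:

{code}
// Sketch of the proposed behavior, not the committed patch: when the
// reducer itself cannot hold the map output, fail this reduce attempt so
// it can be rescheduled on a node with room, instead of implicating the
// map attempt.
if (isLocalSpaceProblem(e)) {  // hypothetical helper
  throw new IOException("reducer lacks space to shuffle map output", e);
} else {
  scheduler.copyFailed(mapId, host, true, false);  // implicate the map attempt
}
{code}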



[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception

2013-07-26 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721014#comment-13721014
 ] 

Ravi Prakash commented on MAPREDUCE-5419:
-

Patch looks good to me. +1. Thanks Rob!

> TestSlive is getting FileNotFound Exception
> ---
>
> Key: MAPREDUCE-5419
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: trunk, 2.1.0-beta, 0.23.9
>Reporter: Robert Parker
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5419.patch
>
>
> The write directory "slive" is not getting created on the FS.



[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-1981:
--

 Summary: Improve getSplits performance by using listLocatedStatus  
(was: Improve getSplits performance by using listFiles, the new FileSystem API)
Hadoop Flags: Reviewed

Thanks for the reviews, Kihwal.  Committing this.

> Improve getSplits performance by using listLocatedStatus
> 
>
> Key: MAPREDUCE-1981
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: job submission
>Affects Versions: 0.23.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Attachments: mapredListFiles1.patch, mapredListFiles2.patch, 
> mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, 
> mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch
>
>
> This jira will make FileInputFormat and CombineFileInputFormat use the new 
> API, thus reducing the number of RPCs to the HDFS NameNode.



[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720982#comment-13720982
 ] 

Jason Lowe commented on MAPREDUCE-5423:
---

This may be a duplicate of MAPREDUCE-4842.

> Rare deadlock situation when reducers try to fetch map output
> -
>
> Key: MAPREDUCE-5423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Chu Tong
>
> During our cluster deployment, we found a very rare deadlock situation 
> when reducers try to fetch map output. We had 5 fetchers, and a log 
> snippet illustrating the problem is below (all fetchers went into a wait 
> state after they could not acquire more RAM beyond the memoryLimit, and no 
> fetcher was releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_17_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_16_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_15_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s

[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5423:
--

  Component/s: mrv2
Affects Version/s: 2.0.2-alpha

> Rare deadlock situation when reducers try to fetch map output
> -
>
> Key: MAPREDUCE-5423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.0.2-alpha
>Reporter: Chu Tong
>
> During our cluster deployment, we found a very rare deadlock situation 
> when reducers try to fetch map output. We had 5 fetchers, and a log 
> snippet illustrating the problem is below (all fetchers went into a wait 
> state after they could not acquire more RAM beyond the memoryLimit, and no 
> fetcher was releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_17_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_16_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_15_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s

[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Chu Tong (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720979#comment-13720979
 ] 

Chu Tong commented on MAPREDUCE-5423:
-

This is on 2.0.2-alpha

> Rare deadlock situation when reducers try to fetch map output
> -
>
> Key: MAPREDUCE-5423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chu Tong
>
> During our cluster deployment, we found a very rare deadlock situation 
> when reducers try to fetch map output. We had 5 fetchers, and a log 
> snippet illustrating the problem is below (all fetchers went into a wait 
> state after they could not acquire more RAM beyond the memoryLimit, and no 
> fetcher was releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_17_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_16_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_15_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
>

[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720976#comment-13720976
 ] 

Jason Lowe commented on MAPREDUCE-5423:
---

On which version of Hadoop did this occur?

> Rare deadlock situation when reducers try to fetch map output
> -
>
> Key: MAPREDUCE-5423
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Chu Tong
>
> During our cluster deployment, we found a very rare deadlock situation 
> when reducers try to fetch map output. We had 5 fetchers, and a log 
> snippet illustrating the problem is below (all fetchers went into a wait 
> state after they could not acquire more RAM beyond the memoryLimit, and no 
> fetcher was releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_17_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_16_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_15_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 free

[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE

2013-07-26 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720977#comment-13720977
 ] 

Mithun Radhakrishnan commented on MAPREDUCE-5402:
-

Gentlemen, I'm afraid I'll have to review this next week. (I'm swamped.)

The main reason we tried to limit the maximum number of chunks on the DFS is 
that these are extremely small files (holding only target-file 
names/locations). Plus, they're likely to be short-lived. Increasing their 
number increases NameNode pressure (many short-lived file objects). 400 was a 
good target for us at Yahoo, per DistCp job.

I agree that keeping this configurable would be best. But then the 
responsibility of being polite to the name-node will transfer to the user.

> DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
> --
>
> Key: MAPREDUCE-5402
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: distcp, mrv2
>Reporter: David Rosenstrauch
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, 
> MAPREDUCE-5402.3.patch
>
>
> In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author 
> describes the implementation of DynamicInputFormat, with one of the main 
> motivations cited being to reduce the chance of long-tails where a few 
> leftover mappers run much longer than the rest.
> However, today I ran into exactly such a long tail using DistCpV2 and 
> DynamicInputFormat.  And when I tried to alleviate 
> the problem by overriding the number of mappers and the split ratio used by 
> the DynamicInputFormat, I was prevented from doing so by the hard-coded limit 
> set in the code by the MAX_CHUNKS_TOLERABLE constant.  (Currently set to 400.)
> This constant is actually set quite low for production use.  (See a 
> description of my use case below.)  And although MAPREDUCE-2765 states that 
> this is an "overridable maximum", when reading through the code there does 
> not actually appear to be any mechanism available to override it.
> This should be changed.  It should be possible to expand the maximum # of 
> chunks beyond this arbitrary limit.
> For example, here is the situation I ran into today:
> I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots.  
> The job consisted of copying ~2800 files from HDFS to Amazon S3.  I overrode 
> the number of mappers for the job from the default of 20 to 128, so as to 
> more properly parallelize the copy across the cluster.  The number of chunk 
> files created was calculated as 241, and mapred.num.entries.per.chunk was 
> calculated as 12.
> As the job ran on, it reached a point where there were only 4 remaining map 
> tasks, which had each been running for over 2 hours.  The reason was that 
> each of the 12 files those mappers were copying was quite large 
> (several hundred megabytes in size) and took ~20 minutes each.  However, 
> during this time, all the other 124 mappers sat idle.
> In theory I should be able to alleviate this problem with DynamicInputFormat. 
>  If I were able to, say, quadruple the number of chunk files created, that 
> would have made each chunk contain only 3 files, and these large files would 
> have gotten distributed better around the cluster and copied in parallel.
> However, when I tried to do that - by overriding mapred.listing.split.ratio 
> to, say, 10 - DynamicInputFormat responded with an exception ("Too many 
> chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease 
> split-ratio to proceed.") - presumably because I exceeded the 
> MAX_CHUNKS_TOLERABLE value of 400.
> Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit?  I 
> can't personally see any.
> If this limit has no particular logic behind it, then it should be 
> overridable - or even better:  removed altogether.  After all, I'm not sure I 
> see any need for it.  Even if numMaps * splitRatio resulted in an 
> extraordinarily large number, if the code were modified so that the number of 
> chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then 
> there would be no need for MAX_CHUNKS_TOLERABLE.  In this worst-case scenario 
> where the product of numMaps and splitRatio is large, capping the number of 
> chunks at the number of files (numberOfChunks = numberOfFiles) would result 
> in 1 file per chunk - the maximum parallelization possible.  That may not be 
> the best-tuned solution for some users, but I would think that it should be 
> left up to the user to deal with the potential consequence of not having 
> tuned their job properly.  Certainly that would be be
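
The capping arithmetic proposed above is small enough to sketch. Below is an 
illustrative, standalone version; the class and method names are hypothetical, 
and this is not DistCp's actual DynamicInputFormat code:

{code:java}
/**
 * Illustrative sketch of the proposed chunk-count cap; names are
 * hypothetical, not the actual DynamicInputFormat implementation.
 */
public class ChunkCountSketch {
    /** Cap numMaps * splitRatio at the number of files being copied. */
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        long product = (long) numMaps * splitRatio; // long to avoid overflow
        return (int) Math.min(product, numFiles);
    }

    public static void main(String[] args) {
        // The scenario above: 128 maps, split ratio 10, ~2800 files.
        // The cap yields 1280 chunks (about 2-3 files per chunk), so no
        // MAX_CHUNKS_TOLERABLE constant is needed.
        System.out.println(numberOfChunks(128, 10, 2800)); // prints 1280
    }
}
{code}

With such a cap in place, the degenerate case of a huge numMaps * splitRatio 
product simply bottoms out at one file per chunk.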

[jira] [Updated] (MAPREDUCE-5386) Ability to refresh history server job retention and job cleaner settings

2013-07-26 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5386:
--

   Resolution: Fixed
Fix Version/s: 2.3.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks, Ashwin!  I committed this to trunk and branch-2.

> Ability to refresh history server job retention and job cleaner settings
> 
>
> Key: MAPREDUCE-5386
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: jobhistoryserver
>Affects Versions: 2.1.0-beta
>Reporter: Ashwin Shankar
>Assignee: Ashwin Shankar
>  Labels: features
> Fix For: 3.0.0, 2.3.0
>
> Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, 
> JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION--5.txt
>
>
> We want to be able to refresh the following job retention parameters
> without having to bounce the history server:
> 1. Job retention time - mapreduce.jobhistory.max-age-ms
> 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms
> 3. Enable/disable cleaner - mapreduce.jobhistory.cleaner.enable
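
As a rough illustration of what such a refresh could look like, the sketch 
below re-reads the three properties from a freshly loaded Configuration. The 
class and method names are hypothetical, and the defaults shown are 
assumptions, not the committed patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Hypothetical sketch of a refreshable settings holder; not the actual
 * history server code. Defaults are assumptions, not the shipped values.
 */
public class JobRetentionSettingsSketch {
    private volatile long maxAgeMs;
    private volatile long cleanerIntervalMs;
    private volatile boolean cleanerEnabled;

    /** Invoked by an admin refresh command with a newly loaded config. */
    public void refresh(Configuration conf) {
        maxAgeMs = conf.getLong(
                "mapreduce.jobhistory.max-age-ms", 7L * 24 * 60 * 60 * 1000);
        cleanerIntervalMs = conf.getLong(
                "mapreduce.jobhistory.cleaner.interval-ms", 24L * 60 * 60 * 1000);
        cleanerEnabled = conf.getBoolean(
                "mapreduce.jobhistory.cleaner.enable", true);
    }
}
{code}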

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Chu Tong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chu Tong updated MAPREDUCE-5423:


Description: 
During our cluster deployment, we found a very rare deadlock situation when 
reducers try to fetch map output. We had 5 fetchers, and a log snippet 
illustrating the problem is below (all fetchers went into a wait state after 
they could not acquire more RAM beyond the memoryLimit and no fetcher was 
releasing memory):

2013-07-18 04:32:28,135 INFO [main] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching 
Map Completion Events
2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:28,146 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:28,146 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:28,319 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
 sent hash and receievd reply
2013-07-18 04:32:28,320 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output 
for attempt_1373902166027_0622_m_17_0
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
0, usedMemory ->27
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:33,158 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:33,158 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:33,161 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
 sent hash and receievd reply
2013-07-18 04:32:33,200 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
55841286 to MEMORY
2013-07-18 04:32:33,322 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
map-output for attempt_1373902166027_0622_m_16_0
2013-07-18 04:32:33,323 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
-> 27, usedMemory ->55841309
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
map-output for attempt_1373902166027_0622_m_15_0
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory 
-> 55841309, usedMemory ->173863446
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:42,188 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:42,188 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:42,190 INFO [fetcher#1] 
org.a

[jira] [Created] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Chu Tong (JIRA)
Chu Tong created MAPREDUCE-5423:
---

 Summary: Rare deadlock situation when reducers try to fetch map 
output
 Key: MAPREDUCE-5423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Chu Tong


During our cluster deployment, we found a deadlock situation when reducers 
try to fetch map output. We had 5 fetchers, and a log snippet illustrating 
the problem is below:

2013-07-18 04:32:28,135 INFO [main] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching 
Map Completion Events
2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:28,146 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:28,146 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:28,319 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
 sent hash and receievd reply
2013-07-18 04:32:28,320 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output 
for attempt_1373902166027_0622_m_17_0
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
0, usedMemory ->27
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:33,158 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:33,158 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:33,161 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
 sent hash and receievd reply
2013-07-18 04:32:33,200 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
55841286 to MEMORY
2013-07-18 04:32:33,322 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
map-output for attempt_1373902166027_0622_m_16_0
2013-07-18 04:32:33,323 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
-> 27, usedMemory ->55841309
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
map-output for attempt_1373902166027_0622_m_15_0
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory 
-> 55841309, usedMemory ->173863446
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:42,188 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:42,188 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:42,190 INFO [fetcher#1] 

[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output

2013-07-26 Thread Chu Tong (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chu Tong updated MAPREDUCE-5423:


Description: 
During our cluster deployment, we found a deadlock situation when reducers 
try to fetch map output. We had 5 fetchers, and a log snippet illustrating 
the problem is below (all fetchers went into a wait state after they could 
not acquire more RAM beyond the memoryLimit and no fetcher was releasing 
memory):

2013-07-18 04:32:28,135 INFO [main] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching 
Map Completion Events
2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:28,146 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:28,146 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:28,319 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0
 sent hash and receievd reply
2013-07-18 04:32:28,320 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output 
for attempt_1373902166027_0622_m_17_0
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
0, usedMemory ->27
2013-07-18 04:32:28,325 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:33,158 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:33,158 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:33,161 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0
 sent hash and receievd reply
2013-07-18 04:32:33,200 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 
55841286 to MEMORY
2013-07-18 04:32:33,322 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
map-output for attempt_1373902166027_0622_m_16_0
2013-07-18 04:32:33,323 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
-> 27, usedMemory ->55841309
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
map-output for attempt_1373902166027_0622_m_15_0
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory 
-> 55841309, usedMemory ->173863446
2013-07-18 04:32:39,594 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] 
org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs
2013-07-18 04:32:42,188 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
2013-07-18 04:32:42,188 INFO [fetcher#1] 
org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
101-09-04.sc1.verticloud.com:8080 to fetcher#1
2013-07-18 04:32:42,190 INFO [fetcher#1] 
org.apache.hadoop.
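
The log above stops at the point where the fetchers stall. A stripped-down 
sketch of the reserve-and-wait pattern it suggests is below; if every fetcher 
blocks in reserve() and nothing ever calls release(), usedMemory never 
shrinks and the shuffle hangs. This is illustrative only, not the actual 
MergeManager code:

{code:java}
/**
 * Simplified sketch of the memory-reservation pattern implied by the log.
 * Illustrative only; not the actual MergeManager implementation.
 */
public class ShuffleMemorySketch {
    private final long memoryLimit;
    private long usedMemory;

    public ShuffleMemorySketch(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    /** Fetchers call this before copying a map output into memory. */
    public synchronized void reserve(long size) throws InterruptedException {
        while (usedMemory + size > memoryLimit) {
            wait(); // deadlock if no thread ever calls release()
        }
        usedMemory += size;
    }

    /** A merger must call this after spilling, or no fetcher wakes up. */
    public synchronized void release(long size) {
        usedMemory -= size;
        notifyAll();
    }
}
{code}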

[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory

2013-07-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720939#comment-13720939
 ] 

Sandy Ryza commented on MAPREDUCE-5367:
---

I don't think the problem exists in trunk.  getLocalTaskDir includes the job ID 
in the path, so there shouldn't be collisions.  The other place that 
localRunner/ is used is for writing the job conf, which includes the job ID in 
its name.  So that also should not be a problem.  Though, thinking about it 
now, it might make sense to change it as well for consistency.
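
For illustration, the collision-avoidance point reduces to keeping the job ID 
in the local path, roughly as sketched below (hypothetical names, not 
LocalJobRunner's actual code):

{code:java}
import java.io.File;

/**
 * Hypothetical sketch: with the job ID in the path, two concurrent local
 * jobs cannot delete each other's working files.
 */
public class LocalDirSketch {
    static File localTaskDir(File root, String jobId, String attemptId) {
        // e.g. <root>/localRunner/<jobId>/<attemptId>
        return new File(new File(new File(root, "localRunner"), jobId), attemptId);
    }

    public static void main(String[] args) {
        System.out.println(localTaskDir(new File("/tmp/mapred/local"),
                "job_local_0001", "attempt_local_0001_m_000000_0"));
    }
}
{code}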

> Local jobs all use same local working directory
> ---
>
> Key: MAPREDUCE-5367
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5367-b1.patch
>
>
> This means that local jobs, even in different JVMs, can't run concurrently 
> because they might delete each other's files during work directory setup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception

2013-07-26 Thread Robert Parker (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720906#comment-13720906
 ] 

Robert Parker commented on MAPREDUCE-5419:
--

The test failures have been identified as defects by other tickets:

org.apache.hadoop.mapreduce.security.TestBinaryTokenFile YARN-885,YARN-960
org.apache.hadoop.mapreduce.security.TestMRCredentials YARN-960
org.apache.hadoop.mapreduce.v2.TestNonExistentJob MAPREDUCE-5421

> TestSlive is getting FileNotFound Exception
> ---
>
> Key: MAPREDUCE-5419
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: trunk, 2.1.0-beta, 0.23.9
>Reporter: Robert Parker
>Assignee: Robert Parker
> Attachments: MAPREDUCE-5419.patch
>
>
> The write directory "slive" is not getting created on the FS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720897#comment-13720897
 ] 

Hadoop QA commented on MAPREDUCE-5251:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12594411/MAPREDUCE-5251-7-b23.txt
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3907//console

This message is automatically generated.

> Reducer should not implicate map attempt if it has insufficient space to 
> fetch map output
> -
>
> Key: MAPREDUCE-5251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.7, 2.0.4-alpha
>Reporter: Jason Lowe
>Assignee: Ashwin Shankar
> Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, 
> MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, 
> MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt
>
>
> A job can fail if a reducer happens to run on a node with insufficient space 
> to hold a map attempt's output.  The reducer keeps reporting the map attempt 
> as bad, and if the map attempt ends up being re-launched too many times 
> before the reducer decides that it may itself be the real problem, the job can fail.
> In that scenario it would be better to re-launch the reduce attempt and 
> hopefully it will run on another node that has sufficient space to complete 
> the shuffle.  Reporting the map attempt as bad and relaunching the map task 
> doesn't change the fact that the reducer can't hold the output.
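
A minimal sketch of the decision the description argues for is below: if the 
local disk simply cannot hold the map output, fail the reduce attempt instead 
of reporting a fetch failure against the map. Hypothetical names, not the 
attached patch:

{code:java}
import java.io.File;
import java.io.IOException;

/**
 * Illustrative decision sketch: distinguish "the map output is bad" from
 * "this node cannot hold the output". Not the actual patch.
 */
public class FetchDecisionSketch {
    static void onShuffleError(File localDir, long mapOutputSize)
            throws IOException {
        if (localDir.getUsableSpace() < mapOutputSize) {
            // Local problem: re-launching the map won't help, so fail this
            // reduce attempt and let it be rescheduled on another node.
            throw new IOException("Insufficient local space for "
                    + mapOutputSize + " bytes; failing reduce attempt");
        }
        // Otherwise the map attempt may really be at fault; report a
        // fetch failure against it as before.
    }
}
{code}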

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output

2013-07-26 Thread Ashwin Shankar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated MAPREDUCE-5251:
--

Attachment: MAPREDUCE-5251-7-b23.txt

Thanks a lot, Jason. I've attached the patch for branch-0.23.

> Reducer should not implicate map attempt if it has insufficient space to 
> fetch map output
> -
>
> Key: MAPREDUCE-5251
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 0.23.7, 2.0.4-alpha
>Reporter: Jason Lowe
>Assignee: Ashwin Shankar
> Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, 
> MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, 
> MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt
>
>
> A job can fail if a reducer happens to run on a node with insufficient space 
> to hold a map attempt's output.  The reducer keeps reporting the map attempt 
> as bad, and if the map attempt ends up being re-launched too many times 
> before the reducer decides that it may itself be the real problem, the job can fail.
> In that scenario it would be better to re-launch the reduce attempt and 
> hopefully it will run on another node that has sufficient space to complete 
> the shuffle.  Reporting the map attempt as bad and relaunching the map task 
> doesn't change the fact that the reducer can't hold the output.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720832#comment-13720832
 ] 

Hadoop QA commented on MAPREDUCE-5421:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12594396/MAPREDUCE-5421-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
  org.apache.hadoop.mapreduce.security.TestMRCredentials

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//console

This message is automatically generated.

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report with an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-5421:
--

Attachment: MAPREDUCE-5421-v2.patch

The ApplicationNotFoundException on the server side should be translated to 
an IOException on the client side. Updated to the v2 patch to fix it. The 
remaining 2 failures are unrelated, as they also appear in other Jenkins jobs 
(like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845/testReport/)
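
A standalone sketch of that translation is below; the exception type is a 
stand-in for the real YARN class, and the method names are hypothetical, not 
the committed patch:

{code:java}
import java.io.IOException;

/**
 * Standalone sketch of translating "application not found" back into the
 * old client contract. Names are hypothetical.
 */
public class NotFoundTranslationSketch {
    /** Stand-in for the real YARN exception type. */
    static class AppNotFoundSketchException extends Exception {
        AppNotFoundSketchException(String msg) { super(msg); }
    }

    static Object getJobStatus(String appId) throws IOException {
        try {
            return fetchApplicationReport(appId);
        } catch (AppNotFoundSketchException e) {
            // Pre-YARN-873 behaviour: an unknown application means
            // "no such job", so callers see null rather than an error.
            return null;
        }
    }

    static Object fetchApplicationReport(String appId)
            throws AppNotFoundSketchException {
        // Stub that always behaves like the server after YARN-873.
        throw new AppNotFoundSketchException(appId + " not found");
    }
}
{code}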

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch
>
>
> After YARN-873, trying to get an application report with an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers

2013-07-26 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720740#comment-13720740
 ] 

Radim Kolar commented on MAPREDUCE-5153:


It's very simple to implement.

If you want to push things forward, then do it.

> Support for running combiners without reducers
> --
>
> Key: MAPREDUCE-5153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Radim Kolar
>
> Scenario: workflow mapper -> sort -> combiner -> HDFS
> No API change is needed: if the user sets a combiner class and reducers = 0, 
> then run the combiner and send its output to HDFS.
> Popular libraries such as Scalding and Cascading offer this functionality, 
> but they do so by caching the entire mapper output in memory.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720715#comment-13720715
 ] 

Hadoop QA commented on MAPREDUCE-5421:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12594364/MAPREDUCE-5421.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.security.TestBinaryTokenFile
  org.apache.hadoop.mapreduce.security.TestMRCredentials
  org.apache.hadoop.mapreduce.v2.TestNonExistentJob

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//console

This message is automatically generated.

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch
>
>
> After YARN-873, trying to get an application report with an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5409) MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl

2013-07-26 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-5409:
-

Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-5422

> MRAppMaster throws InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
> -
>
> Key: MAPREDUCE-5409
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5409
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Affects Versions: 2.0.5-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> {code:xml}
> 2013-07-23 12:28:05,217 INFO [IPC Server handler 29 on 50796] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt 
> attempt_1374560536158_0003_m_40_0 is : 0.0
> 2013-07-23 12:28:05,221 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures 
> for output of task attempt: attempt_1374560536158_0003_m_07_0 ... raising 
> fetch failure to map
> 2013-07-23 12:28:05,222 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle 
> this event at current state for attempt_1374560536158_0003_m_07_0
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> TA_TOO_MANY_FETCH_FAILURE at KILLED
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>   at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1032)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:143)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1123)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1115)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
>   at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
>   at java.lang.Thread.run(Thread.java:662)
> 2013-07-23 12:28:05,249 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: 
> job_1374560536158_0003Job Transitioned from RUNNING to ERROR
> 2013-07-23 12:28:05,338 INFO [IPC Server handler 16 on 50796] 
> org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from 
> attempt_1374560536158_0003_m_40_0
> {code}
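
The usual fix for this family of bugs is to register an explicit no-op arc 
for events that can legitimately arrive after a terminal state. A standalone 
sketch of the idea is below; it deliberately does not use the real 
org.apache.hadoop.yarn.state.StateMachineFactory API:

{code:java}
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;

/**
 * Standalone sketch: drop late-arriving events at terminal states instead
 * of throwing. Illustrative only, not the YARN state machine API.
 */
public class StateMachineSketch {
    enum State { RUNNING, SUCCEEDED, KILLED }
    enum Event { TA_TOO_MANY_FETCH_FAILURE, TA_KILL }

    private final Map<State, EnumSet<Event>> ignored =
            new EnumMap<>(State.class);

    StateMachineSketch() {
        // A kill races with fetch-failure reports, so after KILLED the
        // fetch-failure event is expected and is silently dropped.
        ignored.put(State.KILLED, EnumSet.of(Event.TA_TOO_MANY_FETCH_FAILURE));
    }

    State handle(State current, Event event) {
        EnumSet<Event> dropped = ignored.get(current);
        if (dropped != null && dropped.contains(event)) {
            return current; // explicit no-op arc: stay in KILLED
        }
        throw new IllegalStateException(
                "Invalid event: " + event + " at " + current);
    }
}
{code}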

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5400) MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl

2013-07-26 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-5400:
-

Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-5422

> MRAppMaster throws InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED for JobImpl
> -
>
> Key: MAPREDUCE-5400
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5400
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster
>Affects Versions: 2.0.5-alpha
>Reporter: J.Andreina
>Assignee: Devaraj K
>Priority: Minor
> Attachments: MAPREDUCE-5400.patch
>
>
> Step 1: Install a cluster with HDFS and MR
> Step 2: Execute a job
> Step 3: Issue a kill for a task attempt whose task has already completed.
> Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin>
>  ./mapred job -kill-task attempt_1373875322959_0032_m_00_0 
> No GC_PROFILE is given. Defaults to medium.
> 13/07/15 14:46:32 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
> 13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with 
> xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol
> 13/07/15 14:46:33 INFO service.AbstractService: 
> Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
> Killed task attempt_1373875322959_0032_m_00_0
> Observations:
> ===
> 1. The task state transitioned from SUCCEEDED to SCHEDULED.
> 2. When the client issues a kill for a succeeded attempt, the client is 
> notified that the succeeded attempt was killed.
> 3. A second task attempt was launched, which succeeded and was then killed 
> on client request.
> 4. Even after the job state transitioned from SUCCEEDED to ERROR, the UI 
> still shows succeeded.
> Issues:
> =
> 1. The client has been notified that the attempt was killed, but the attempt 
> actually succeeded, and that is what the JHS UI displays.
> 2. The app master throws an InvalidStateTransitonException.
> 3. On the client side and in the JHS the job exited with state 
> Finished/succeeded; on the RM side the state is Finished/Failed.
> AM Logs:
> 
> 2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING 
> to SUCCEEDED
> 2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
> attempt attempt_1373875322959_0032_m_00_0
> 2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED
> 2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED
> 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with 
> attempt attempt_1373875322959_0032_m_00_1
> 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED
> 2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_TASK_COMPLETED at SUCCEEDED
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77)
> at java.lang.Thread.run(Thread.java:662)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on J

[jira] [Created] (MAPREDUCE-5422) [Umbrella] Fix invalid state transitions in MRAppMaster

2013-07-26 Thread Devaraj K (JIRA)
Devaraj K created MAPREDUCE-5422:


 Summary: [Umbrella] Fix invalid state transitions in MRAppMaster
 Key: MAPREDUCE-5422
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5422
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: mr-am
Affects Versions: 2.0.5-alpha
Reporter: Devaraj K
Assignee: Devaraj K


There are multiple invalid state transitions for the state machines present in 
MRAppMaster. All of these can be handled as part of this umbrella JIRA.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-5421:
--

Attachment: MAPREDUCE-5421.patch

Uploading a quick patch to fix it.

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch
>
>
> After YARN-873, trying to get an application report with an unknown appID 
> gets an exception instead of null. This causes a test failure in 
> TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix the test failure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-5421:
--

Target Version/s: 2.1.0-beta
  Status: Patch Available  (was: Open)

> TestNonExistentJob is failed due to recent changes in YARN
> --
>
> Key: MAPREDUCE-5421
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
> Attachments: MAPREDUCE-5421.patch
>
>
> After YARN-873, try to get an application report with unknown appID will get 
> a exception instead of null. This cause test failure in TestNonExistentJob 
> which affects other irrelevant jenkins jobs like: 
> https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We 
> need to fix test failure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN

2013-07-26 Thread Junping Du (JIRA)
Junping Du created MAPREDUCE-5421:
-

 Summary: TestNonExistentJob is failed due to recent changes in YARN
 Key: MAPREDUCE-5421
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du


After YARN-873, trying to get an application report with an unknown appID gets 
an exception instead of null. This causes a test failure in TestNonExistentJob, 
which affects otherwise unrelated Jenkins jobs like: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need 
to fix the test failure here.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5279) mapreduce scheduling deadlock

2013-07-26 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720632#comment-13720632
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5279:
---

[~pengzhang], thank you for contributing! Can you rebase on current trunk 
please?

> mapreduce scheduling deadlock
> -
>
> Key: MAPREDUCE-5279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: PengZhang
>Assignee: PengZhang
> Fix For: trunk
>
> Attachments: MAPREDUCE-5279.patch, MAPREDUCE-5279-v2.patch
>
>
> YARN-2 introduced CPU-dimension scheduling, but the MR RMContainerAllocator 
> doesn't take virtual cores into account while scheduling reduce tasks.
> This may cause more reduce tasks to be scheduled because memory alone is 
> sufficient. On a small cluster this can end in deadlock: all running 
> containers are reduce tasks, but the map phase is not finished.
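
The fix implied here is to bound reduce scheduling by both dimensions of the 
headroom. A rough sketch of the arithmetic follows, with hypothetical names 
rather than the RMContainerAllocator code:

{code:java}
/**
 * Illustrative headroom arithmetic: the number of reduces that may start
 * must respect both memory and vcores, not memory alone. Hypothetical names.
 */
public class ReduceHeadroomSketch {
    static int schedulableReduces(long freeMemMb, int freeVcores,
                                  long memPerReduceMb, int vcoresPerReduce) {
        long byMemory = freeMemMb / memPerReduceMb;
        long byVcores = freeVcores / (long) vcoresPerReduce;
        return (int) Math.min(byMemory, byVcores);
    }

    public static void main(String[] args) {
        // Memory alone would allow 8 reduces, but only 2 vcores are free,
        // so only 2 reduces should be scheduled.
        System.out.println(schedulableReduces(8192, 2, 1024, 1)); // prints 2
    }
}
{code}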

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5279) mapreduce scheduling deadlock

2013-07-26 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5279:
--

Assignee: PengZhang

> mapreduce scheduling deadlock
> -
>
> Key: MAPREDUCE-5279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, scheduler
>Affects Versions: 2.0.3-alpha
>Reporter: PengZhang
>Assignee: PengZhang
> Fix For: trunk
>
> Attachments: MAPREDUCE-5279.patch, MAPREDUCE-5279-v2.patch
>
>
> YARN-2 introduced CPU-dimension scheduling, but the MR RMContainerAllocator 
> doesn't take virtual cores into account while scheduling reduce tasks.
> This may cause more reduce tasks to be scheduled because memory alone is 
> sufficient. On a small cluster this can end in deadlock: all running 
> containers are reduce tasks, but the map phase is not finished.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory

2013-07-26 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720600#comment-13720600
 ] 

Tom White commented on MAPREDUCE-5367:
--

I was looking at trunk. Doesn't this need fixing for trunk too?

> Local jobs all use same local working directory
> ---
>
> Key: MAPREDUCE-5367
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: MAPREDUCE-5367-b1.patch
>
>
> This means that local jobs, even in different JVMs, can't run concurrently 
> because they might delete each other's files during work directory setup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira