[jira] [Resolved] (MAPREDUCE-4366) mapred metrics shows negative count of waiting maps and reduces
[ https://issues.apache.org/jira/browse/MAPREDUCE-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alejandro Abdelnur resolved MAPREDUCE-4366. --- Resolution: Fixed Fix Version/s: 1.3.0 Hadoop Flags: Reviewed Thanks Sandy. Committed to branch-1. > mapred metrics shows negative count of waiting maps and reduces > --- > > Key: MAPREDUCE-4366 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4366 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 1.0.2 >Reporter: Thomas Graves >Assignee: Sandy Ryza > Fix For: 1.3.0 > > Attachments: MAPREDUCE-4366-branch-1-1.patch, > MAPREDUCE-4366-branch-1.patch > > > A negative waiting_maps and waiting_reduces count is observed in the mapred > metrics. MAPREDUCE-1238 partially fixed this, but it appears there are still > issues, as we are still seeing it, though not as bad. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721341#comment-13721341 ] Junping Du commented on MAPREDUCE-5421: --- Thanks Vinod and Xuan for review! > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, trying to get an application report with an unknown appID gets > an exception instead of null. This causes a test failure in TestNonExistentJob > which affects other, irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix the test failure here.
[jira] [Updated] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5419: -- Resolution: Fixed Fix Version/s: 0.23.10 2.1.0-beta 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Rob! I committed this to trunk, branch-2, branch-2.1-beta, and branch-0.23. > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > Fix For: 3.0.0, 2.1.0-beta, 0.23.10 > > Attachments: MAPREDUCE-5419.patch > > > The write directory "slive" is not getting created on the FS.
[jira] [Resolved] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe resolved MAPREDUCE-5423. --- Resolution: Duplicate > Rare deadlock situation when reducers try to fetch map output > - > > Key: MAPREDUCE-5423 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Chu Tong > > During our cluster deployment, we found a very rare deadlock > situation when reducers try to fetch map output. We had 5 fetchers; a log > snippet illustrating the problem is below (all fetchers went into a wait > state after they could not acquire more RAM beyond the memoryLimit, and no fetcher > was releasing memory): > 2013-07-18 04:32:28,135 INFO [main] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: > memoryLimit=1503238528, maxSingleShuffleLimit=375809632, > mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 > 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for > fetching Map Completion Events > 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:28,319 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 > 
sent hash and receievd reply > 2013-07-18 04:32:28,320 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to > MEMORY > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from > map-output for attempt_1373902166027_0622_m_17_0 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> > 0, usedMemory ->27 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s > 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:33,161 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 > sent hash and receievd reply > 2013-07-18 04:32:33,200 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: > 55841286 to MEMORY > 2013-07-18 04:32:33,322 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from > map-output for attempt_1373902166027_0622_m_16_0 > 2013-07-18 04:32:33,323 INFO [fetcher#1] > 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory > -> 27, usedMemory ->55841309 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from > map-output for attempt_1373902166027_0622_m_15_0 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, > commitMemory -> 55841309, usedMemory ->173863446 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413
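The stall described above — every fetcher parked waiting for shuffle memory while no fetcher releases any — is a classic wait-without-release cycle. It can be sketched with a minimal bounded memory budget; this is illustrative Java only, not MergeManager's actual implementation, and the class and method names are invented:

```java
// Illustrative sketch of a bounded shuffle-memory budget, showing the
// deadlock pattern from the log: fetchers block in reserve() waiting for
// memory, but memory is only returned by release(), which a blocked
// fetcher never reaches. Not Hadoop code; names are hypothetical.
public class MemoryReservation {
    private final long memoryLimit; // analogous to memoryLimit=1503238528 in the log
    private long usedMemory = 0;

    public MemoryReservation(long memoryLimit) {
        this.memoryLimit = memoryLimit;
    }

    // Non-blocking attempt: succeeds only if the request fits under the limit.
    public synchronized boolean tryReserve(long requestedSize) {
        if (usedMemory + requestedSize > memoryLimit) {
            return false;
        }
        usedMemory += requestedSize;
        return true;
    }

    // Blocking variant: if every fetcher ends up waiting here at once,
    // nobody ever calls release() and the reduce stalls forever.
    public synchronized void reserve(long requestedSize) throws InterruptedException {
        while (!tryReserve(requestedSize)) {
            wait();
        }
    }

    public synchronized void release(long size) {
        usedMemory -= size;
        notifyAll(); // wake waiting fetchers once memory is returned
    }

    public synchronized long used() {
        return usedMemory;
    }
}
```

With five fetchers each holding a large in-flight segment, the budget can be exhausted while every thread sits in reserve(), which matches the "all fetchers waiting" state in the log; the actual fix was tracked separately in MAPREDUCE-4842, hence the Duplicate resolution above.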
[jira] [Commented] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721256#comment-13721256 ] Hadoop QA commented on MAPREDUCE-5411: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594445/LOADED_JOB_CACHE_MR5411-2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3908//console This message is automatically generated. 
> Refresh size of loaded job cache on history server > -- > > Key: MAPREDUCE-5411 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: LOADED_JOB_CACHE_MR5411-1.txt, > LOADED_JOB_CACHE_MR5411-2.txt > > > We want to be able to refresh the size of the loaded job > cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server > through the history server's admin interface.
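For reference, the cache being discussed is sized by a mapred-site.xml property; the property name comes from the description above, and the value below is only an example:

```xml
<!-- mapred-site.xml: how many finished jobs the JobHistoryServer keeps
     fully loaded in memory. Example value only; choose based on heap size. -->
<property>
  <name>mapreduce.jobhistory.loadedjobs.cache.size</name>
  <value>5</value>
</property>
```

The point of the JIRA is that an administrator could change this value and have the history server pick it up through its admin interface, without a restart.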
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Status: Patch Available (was: Open) Thanks, patch refreshed. > Refresh size of loaded job cache on history server > -- > > Key: MAPREDUCE-5411 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: LOADED_JOB_CACHE_MR5411-1.txt, > LOADED_JOB_CACHE_MR5411-2.txt > > > We want to be able to refresh the size of the loaded job > cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server > through the history server's admin interface.
[jira] [Created] (MAPREDUCE-5425) Junit in TestJobHistoryServer failing in jdk 7
Ashwin Shankar created MAPREDUCE-5425: - Summary: Junit in TestJobHistoryServer failing in jdk 7 Key: MAPREDUCE-5425 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5425 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobhistoryserver Affects Versions: 2.0.4-alpha Reporter: Ashwin Shankar We get the following exception when we run the unit tests of TestJobHistoryServer with jdk 7: Caused by: java.net.BindException: Problem binding to [0.0.0.0:10033] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719) at org.apache.hadoop.ipc.Server.bind(Server.java:423) at org.apache.hadoop.ipc.Server$Listener.(Server.java:535) at org.apache.hadoop.ipc.Server.(Server.java:2202) at org.apache.hadoop.ipc.RPC$Server.(RPC.java:901) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:505) at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480) at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:746) at org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer.serviceInit(HSAdminServer.java:100) at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) This is happening because testMainMethod starts the history server and doesn't stop it. This worked in jdk 6 because tests executed sequentially and this test was the last one, so it didn't affect other tests, but in jdk 7 it fails.
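The underlying failure is easy to reproduce with plain JDK sockets: a listener that an earlier test never closed keeps the port bound, so the next bind attempt throws java.net.BindException. A minimal, Hadoop-free sketch (the class and method names are invented for illustration):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Demonstrates the failure mode: a server left running by one test keeps
// its port bound, so a later test's bind fails with java.net.BindException
// ("Address already in use"). Closing the first listener (the teardown
// missing from testMainMethod) lets the second bind succeed.
public class BindCollision {

    // Simulates a test that leaves its server running, then a second test
    // trying to bind the same port. Returns true if the second bind fails.
    public static boolean rebindFails() {
        try {
            ServerSocket leaked = new ServerSocket(0); // stands in for port 10033
            int port = leaked.getLocalPort();
            try (ServerSocket next = new ServerSocket()) {
                next.bind(new InetSocketAddress(port));
                return false; // not reached: the port is still held
            } catch (BindException expected) {
                return true;
            } finally {
                leaked.close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // With proper teardown (stop/close before the next test), the port is free.
    public static boolean bindSucceedsAfterClose() {
        try {
            ServerSocket first = new ServerSocket(0);
            int port = first.getLocalPort();
            first.close(); // the fix: stop the server when the test finishes
            try (ServerSocket next = new ServerSocket()) {
                next.bind(new InetSocketAddress(port));
                return true;
            } catch (BindException e) {
                return false;
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

This also explains the jdk 6 vs jdk 7 difference: with a fixed sequential ordering the leaky test happened to run last, while jdk 7's non-deterministic ordering can run it before a test that needs the same port.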
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721207#comment-13721207 ] Chu Tong commented on MAPREDUCE-5423: - I think you are right. I took a look at MAPREDUCE-4842 and I believe this is the issue I experienced. Can you please close this as a duplicate? Thanks > Rare deadlock situation when reducers try to fetch map output > - > > Key: MAPREDUCE-5423 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Chu Tong >
[jira] [Updated] (MAPREDUCE-5411) Refresh size of loaded job cache on history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5411: -- Attachment: LOADED_JOB_CACHE_MR5411-2.txt > Refresh size of loaded job cache on history server > -- > > Key: MAPREDUCE-5411 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5411 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Attachments: LOADED_JOB_CACHE_MR5411-1.txt, > LOADED_JOB_CACHE_MR5411-2.txt > > > We want to be able to refresh the size of the loaded job > cache (mapreduce.jobhistory.loadedjobs.cache.size) of the history server > through the history server's admin interface.
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721112#comment-13721112 ] Jason Lowe commented on MAPREDUCE-5419: --- +1, looks good to me as well. I'll commit this shortly. Note that initially I could not reproduce this problem, but it is very reproducible by cleaning and only running the TestSlive#testDataWriting test. It's easier to reproduce with JDK7 when running all of the TestSlive tests, since JDK7 does not run the unit tests in a deterministic order. > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > Attachments: MAPREDUCE-5419.patch > > > The write directory "slive" is not getting created on the FS.
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5421: --- Resolution: Fixed Fix Version/s: 2.1.0-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk, branch-2 and branch-2.1. Thanks Junping! > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Fix For: 2.1.0-beta > > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, trying to get an application report with an unknown appID gets > an exception instead of null. This causes a test failure in TestNonExistentJob > which affects other, irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix the test failure here.
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5421: --- Component/s: test Priority: Blocker (was: Major) > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Junping Du >Assignee: Junping Du >Priority: Blocker > Fix For: 2.1.0-beta > > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, trying to get an application report with an unknown appID gets > an exception instead of null. This causes a test failure in TestNonExistentJob > which affects other, irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix the test failure here.
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721097#comment-13721097 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5421: +1. Checking this in. > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, trying to get an application report with an unknown appID gets > an exception instead of null. This causes a test failure in TestNonExistentJob > which affects other, irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix the test failure here.
[jira] [Resolved] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
[ https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong resolved MAPREDUCE-5424. -- Resolution: Duplicate This is a duplicate of MAPREDUCE-5421. > TestNonExistentJob failing after YARN-873 > - > > Key: MAPREDUCE-5424 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Vinod Kumar Vavilapalli >Assignee: Xuan Gong >Priority: Blocker >
[jira] [Commented] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
[ https://issues.apache.org/jira/browse/MAPREDUCE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721072#comment-13721072 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5424: It fails with the following: {code} Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 53.573 sec <<< FAILURE! testGetInvalidJob(org.apache.hadoop.mapreduce.v2.TestNonExistentJob) Time elapsed: 53420 sec <<< ERROR! java.io.IOException: org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_0_' doesn't exist in RM. at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:241) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:120) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:202) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:605) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:932) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2047) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2043) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2041) at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:328) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:387) at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:522) at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:182) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:575) at 
org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:573) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1493) at org.apache.hadoop.mapred.JobClient.getJobUsingCluster(JobClient.java:573) at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:591) at org.apache.hadoop.mapreduce.v2.TestNonExistentJob.testGetInvalidJob(TestNonExistentJob.java:99) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141) {code} > TestNonExistentJob failing after YARN-873 > - > > Key: MAPREDUCE-5424 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: test >Reporter: Vinod Kumar Vavilapalli >Assignee: Xuan Gong >Priority: Blocker >
[jira] [Created] (MAPREDUCE-5424) TestNonExistentJob failing after YARN-873
Vinod Kumar Vavilapalli created MAPREDUCE-5424: -- Summary: TestNonExistentJob failing after YARN-873 Key: MAPREDUCE-5424 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5424 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Vinod Kumar Vavilapalli Assignee: Xuan Gong Priority: Blocker
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721044#comment-13721044 ] Xuan Gong commented on MAPREDUCE-5421: -- +1 Looks good > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, trying to get an application report with an unknown appID gets > an exception instead of null. This causes a test failure in TestNonExistentJob > which affects other, irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix the test failure here.
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Status: Resolved (was: Patch Available) Thanks Hairong, and thanks to everyone who contributed to reviews of various versions of the patch. I committed this to trunk, branch-2, and branch-0.23. > Improve getSplits performance by using listLocatedStatus > > > Key: MAPREDUCE-1981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: mapredListFiles1.patch, mapredListFiles2.patch, > mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, > mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch > > > This jira will make FileInputFormat and CombineFileInputFormat use the new > API, thus reducing the number of RPCs to the HDFS NameNode.
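The improvement above is about round trips to the NameNode: the older getSplits path listed a directory and then asked for block locations file by file, whereas listLocatedStatus returns statuses together with their block locations. A toy model of the call counts (FakeNameNode and the method bodies are invented stand-ins, not Hadoop's API; only the RPC arithmetic is the point):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of why listLocatedStatus helps getSplits: it fuses "list the
// directory" and "get block locations for each file" into one round trip.
// FakeNameNode stands in for HDFS's NameNode; only the RPC count matters.
public class SplitRpcCount {
    static class FakeNameNode {
        int rpcs = 0;
        final List<String> files;
        FakeNameNode(List<String> files) { this.files = files; }

        List<String> listStatus(String dir) { rpcs++; return files; }
        String getBlockLocations(String file) { rpcs++; return "hosts-of-" + file; }

        // One call returns statuses and locations together (listLocatedStatus-style).
        List<String> listLocatedStatus(String dir) {
            rpcs++;
            List<String> out = new ArrayList<>();
            for (String f : files) out.add(f + "@hosts-of-" + f);
            return out;
        }
    }

    // Old pattern: 1 listing call + N location lookups = N + 1 RPCs.
    static int oldGetSplits(FakeNameNode nn, String dir) {
        for (String f : nn.listStatus(dir)) nn.getBlockLocations(f);
        return nn.rpcs;
    }

    // New pattern: a single call regardless of file count (modulo batching).
    static int newGetSplits(FakeNameNode nn, String dir) {
        nn.listLocatedStatus(dir);
        return nn.rpcs;
    }
}
```

For a directory of N input files this drops roughly N + 1 RPCs to a handful; in real HDFS large listings are paged, so the count is small but not always exactly one.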
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5251: -- Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 to the branch-0.23 patch and committed to branch-0.23. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides it may itself be the real problem, the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt as bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
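The intent of the fix can be sketched as a blame decision: if the reducer's local disk simply cannot hold the map output, relaunching the map will not help, so the reduce attempt should fail and hopefully be rescheduled on a node with more space. The method below is an illustrative sketch of that decision only; the names are hypothetical and the real patch lives in the MRv2 shuffle code.

```java
// Illustrative sketch of the blame decision described above. These
// names are hypothetical; the actual fix is in the MRv2 shuffle code.
public class ShuffleBlameSketch {
    enum Blame { MAP_ATTEMPT, REDUCE_ATTEMPT }

    // If the fetch cannot succeed because this node lacks space for the
    // map output, the reducer itself is the problem and should be the
    // attempt that fails and gets rescheduled elsewhere.
    static Blame blameFor(long mapOutputBytes, long localFreeBytes) {
        if (mapOutputBytes > localFreeBytes) {
            return Blame.REDUCE_ATTEMPT; // our own disk is the bottleneck
        }
        return Blame.MAP_ATTEMPT; // otherwise treat it as a bad map output
    }

    public static void main(String[] args) {
        // 500 MB map output, 100 MB free locally: blame the reducer.
        System.out.println(blameFor(500L << 20, 100L << 20));
    }
}
```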
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721014#comment-13721014 ] Ravi Prakash commented on MAPREDUCE-5419: - Patch looks good to me. +1. Thanks Rob! > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > Attachments: MAPREDUCE-5419.patch > > > The write directory "slive" is not getting created on the FS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-1981) Improve getSplits performance by using listLocatedStatus
[ https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-1981: -- Summary: Improve getSplits performance by using listLocatedStatus (was: Improve getSplits performance by using listFiles, the new FileSystem API) Hadoop Flags: Reviewed Thanks for the reviews, Kihwal. Committing this. > Improve getSplits performance by using listLocatedStatus > > > Key: MAPREDUCE-1981 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: job submission >Affects Versions: 0.23.0 >Reporter: Hairong Kuang >Assignee: Hairong Kuang > Attachments: mapredListFiles1.patch, mapredListFiles2.patch, > mapredListFiles3.patch, mapredListFiles4.patch, mapredListFiles5.patch, > mapredListFiles.patch, MAPREDUCE-1981.branch-0.23.patch, MAPREDUCE-1981.patch > > > This jira will make FileInputFormat and CombineFileInputFormat use the new > API, thus reducing the number of RPCs to the HDFS NameNode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720982#comment-13720982 ] Jason Lowe commented on MAPREDUCE-5423: --- This may be a duplicate of MAPREDUCE-4842. > Rare deadlock situation when reducers try to fetch map output > - > > Key: MAPREDUCE-5423 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Chu Tong > > During our cluster deployment, we found there is a very rare deadlock > situation when reducers try to fetch map output. We had 5 fetchers and log > snippet illustrates this problem is below (all fetchers went into a wait > state after they can't acquire more RAM beyond the memoryLimit and no fetcher > is releasing memory): > 2013-07-18 04:32:28,135 INFO [main] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: > memoryLimit=1503238528, maxSingleShuffleLimit=375809632, > mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 > 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for > fetching Map Completion Events > 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:28,319 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 > sent hash and receievd reply > 2013-07-18 04:32:28,320 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to > MEMORY > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from > map-output for attempt_1373902166027_0622_m_17_0 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> > 0, usedMemory ->27 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s > 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:33,161 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 > sent hash and receievd reply > 2013-07-18 04:32:33,200 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: > 55841286 to MEMORY > 2013-07-18 04:32:33,322 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from > map-output for 
attempt_1373902166027_0622_m_16_0 > 2013-07-18 04:32:33,323 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory > -> 27, usedMemory ->55841309 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from > map-output for attempt_1373902166027_0622_m_15_0 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, > commitMemory -> 55841309, usedMemory ->173863446 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.red
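The stall pattern in the log above can be modeled as a simple bounded reservation: each fetcher must reserve shuffle memory before copying a map output, waits when the request would push usedMemory past memoryLimit, and nothing progresses if no fetcher ever releases. The class below is a minimal model of that mechanism under those assumptions, not the actual MergeManager.

```java
// Minimal model of MergeManager-style shuffle memory reservation. If
// every fetcher is parked in reserve() and none ever calls release(),
// the job stalls exactly as the log above shows. Not the real Hadoop
// class; field and method names are illustrative.
public class ShuffleMemoryModel {
    private final long memoryLimit;
    private long usedMemory = 0;

    ShuffleMemoryModel(long memoryLimit) { this.memoryLimit = memoryLimit; }

    // Blocking form: wait until the requested bytes fit under the limit.
    synchronized void reserve(long bytes) throws InterruptedException {
        while (usedMemory + bytes > memoryLimit) {
            wait(); // all fetchers waiting here is the reported deadlock
        }
        usedMemory += bytes;
    }

    // Non-blocking form, convenient for single-threaded demonstration.
    synchronized boolean tryReserve(long bytes) {
        if (usedMemory + bytes > memoryLimit) {
            return false;
        }
        usedMemory += bytes;
        return true;
    }

    synchronized void release(long bytes) {
        usedMemory -= bytes;
        notifyAll(); // wake any fetchers waiting for memory
    }

    public static void main(String[] args) {
        ShuffleMemoryModel m = new ShuffleMemoryModel(100);
        System.out.println(m.tryReserve(60)); // prints true
        System.out.println(m.tryReserve(60)); // prints false: over the limit
        m.release(60);
        System.out.println(m.tryReserve(60)); // prints true again
    }
}
```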
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5423: -- Component/s: mrv2 Affects Version/s: 2.0.2-alpha > Rare deadlock situation when reducers try to fetch map output > - > > Key: MAPREDUCE-5423 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.0.2-alpha >Reporter: Chu Tong > > During our cluster deployment, we found there is a very rare deadlock > situation when reducers try to fetch map output. We had 5 fetchers and log > snippet illustrates this problem is below (all fetchers went into a wait > state after they can't acquire more RAM beyond the memoryLimit and no fetcher > is releasing memory): > 2013-07-18 04:32:28,135 INFO [main] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: > memoryLimit=1503238528, maxSingleShuffleLimit=375809632, > mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 > 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for > fetching Map Completion Events > 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:28,319 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 > sent hash and receievd reply > 2013-07-18 04:32:28,320 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to > MEMORY > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from > map-output for attempt_1373902166027_0622_m_17_0 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> > 0, usedMemory ->27 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s > 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:33,161 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 > sent hash and receievd reply > 2013-07-18 04:32:33,200 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: > 55841286 to MEMORY > 2013-07-18 04:32:33,322 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from > map-output for 
attempt_1373902166027_0622_m_16_0 > 2013-07-18 04:32:33,323 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory > -> 27, usedMemory ->55841309 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from > map-output for attempt_1373902166027_0622_m_15_0 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, > commitMemory -> 55841309, usedMemory ->173863446 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720979#comment-13720979 ] Chu Tong commented on MAPREDUCE-5423: - This is on 2.0.2-alpha > Rare deadlock situation when reducers try to fetch map output > - > > Key: MAPREDUCE-5423 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chu Tong > > During our cluster deployment, we found there is a very rare deadlock > situation when reducers try to fetch map output. We had 5 fetchers and log > snippet illustrates this problem is below (all fetchers went into a wait > state after they can't acquire more RAM beyond the memoryLimit and no fetcher > is releasing memory): > 2013-07-18 04:32:28,135 INFO [main] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: > memoryLimit=1503238528, maxSingleShuffleLimit=375809632, > mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 > 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for > fetching Map Completion Events > 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:28,319 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 > 
sent hash and receievd reply > 2013-07-18 04:32:28,320 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to > MEMORY > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from > map-output for attempt_1373902166027_0622_m_17_0 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> > 0, usedMemory ->27 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s > 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:33,161 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 > sent hash and receievd reply > 2013-07-18 04:32:33,200 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: > 55841286 to MEMORY > 2013-07-18 04:32:33,322 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from > map-output for attempt_1373902166027_0622_m_16_0 > 2013-07-18 04:32:33,323 INFO [fetcher#1] > 
org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory > -> 27, usedMemory ->55841309 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from > map-output for attempt_1373902166027_0622_m_15_0 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, > commitMemory -> 55841309, usedMemory ->173863446 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s >
[jira] [Commented] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720976#comment-13720976 ] Jason Lowe commented on MAPREDUCE-5423: --- On which version of Hadoop did this occur? > Rare deadlock situation when reducers try to fetch map output > - > > Key: MAPREDUCE-5423 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Chu Tong > > During our cluster deployment, we found there is a very rare deadlock > situation when reducers try to fetch map output. We had 5 fetchers and log > snippet illustrates this problem is below (all fetchers went into a wait > state after they can't acquire more RAM beyond the memoryLimit and no fetcher > is releasing memory): > 2013-07-18 04:32:28,135 INFO [main] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: > memoryLimit=1503238528, maxSingleShuffleLimit=375809632, > mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 > 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for > fetching Map Completion Events > 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:28,146 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:28,319 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > 
url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 > sent hash and receievd reply > 2013-07-18 04:32:28,320 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to > MEMORY > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from > map-output for attempt_1373902166027_0622_m_17_0 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> > 0, usedMemory ->27 > 2013-07-18 04:32:28,325 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s > 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion > Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: > attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging > 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 > 2013-07-18 04:32:33,158 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to > 101-09-04.sc1.verticloud.com:8080 to fetcher#1 > 2013-07-18 04:32:33,161 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: for > url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 > sent hash and receievd reply > 2013-07-18 04:32:33,200 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle > output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: > 55841286 to MEMORY > 2013-07-18 04:32:33,322 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from > map-output for 
attempt_1373902166027_0622_m_16_0 > 2013-07-18 04:32:33,323 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory > -> 27, usedMemory ->55841309 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from > map-output for attempt_1373902166027_0622_m_15_0 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> > map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, > commitMemory -> 55841309, usedMemory ->173863446 > 2013-07-18 04:32:39,594 INFO [fetcher#1] > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: > 101-09-04.sc1.verticloud.com:8080 free
[jira] [Commented] (MAPREDUCE-5402) DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE
[ https://issues.apache.org/jira/browse/MAPREDUCE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720977#comment-13720977 ] Mithun Radhakrishnan commented on MAPREDUCE-5402: - Gentlemen, I'm afraid I'll have to review this next week. (I'm swamped.) The main reason we tried to limit the maximum number of chunks on the DFS is because these are extremely small files (holding only target-file names/locations). Plus, they're likely to be short-lived. Increasing the number of these will increase NameNode pressure (short-lived file-objects). 400 was a good target for us at Yahoo, per DistCp job. I agree that keeping this configurable would be best. But then the responsibility of being polite to the name-node will transfer to the user. > DynamicInputFormat should allow overriding of MAX_CHUNKS_TOLERABLE > -- > > Key: MAPREDUCE-5402 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5402 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: distcp, mrv2 >Reporter: David Rosenstrauch >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5402.1.patch, MAPREDUCE-5402.2.patch, > MAPREDUCE-5402.3.patch > > > In MAPREDUCE-2765, which provided the design spec for DistCpV2, the author > describes the implementation of DynamicInputFormat, with one of the main > motivations cited being to reduce the chance of long-tails where a few > leftover mappers run much longer than the rest. > However, I today ran into a situation where I experienced exactly such a long > tail using DistCpV2 and DynamicInputFormat. And when I tried to alleviate > the problem by overriding the number of mappers and the split ratio used by > the DynamicInputFormat, I was prevented from doing so by the hard-coded limit > set in the code by the MAX_CHUNKS_TOLERABLE constant. (Currently set to 400.) > This constant is actually set quite low for production use. (See a > description of my use case below.) 
And although MAPREDUCE-2765 states that > this is an "overridable maximum", when reading through the code there does > not actually appear to be any mechanism available to override it. > This should be changed. It should be possible to expand the maximum # of > chunks beyond this arbitrary limit. > For example, here is the situation I ran into today: > I ran a distcpv2 job on a cluster with 8 machines containing 128 map slots. > The job consisted of copying ~2800 files from HDFS to Amazon S3. I overrode > the number of mappers for the job from the default of 20 to 128, so as to > more properly parallelize the copy across the cluster. The number of chunk > files created was calculated as 241, and mapred.num.entries.per.chunk was > calculated as 12. > As the job ran on, it reached a point where there were only 4 remaining map > tasks, which had each been running for over 2 hours. The reason for this was > that each of the 12 files that those mappers were copying were quite large > (several hundred megabytes in size) and took ~20 minutes each. However, > during this time, all the other 124 mappers sat idle. > In theory I should be able to alleviate this problem with DynamicInputFormat. > If I were able to, say, quadruple the number of chunk files created, that > would have made each chunk contain only 3 files, and these large files would > have gotten distributed better around the cluster and copied in parallel. > However, when I tried to do that - by overriding mapred.listing.split.ratio > to, say, 10 - DynamicInputFormat responded with an exception ("Too many > chunks created with splitRatio:10, numMaps:128. Reduce numMaps or decrease > split-ratio to proceed.") - presumably because I exceeded the > MAX_CHUNKS_TOLERABLE value of 400. > Is there any particular logic behind this MAX_CHUNKS_TOLERABLE limit? I > can't personally see any. > If this limit has no particular logic behind it, then it should be > overridable - or even better: removed altogether. 
After all, I'm not sure I > see any need for it. Even if numMaps * splitRatio resulted in an > extraordinarily large number, if the code were modified so that the number of > chunks got calculated as Math.min( numMaps * splitRatio, numFiles), then > there would be no need for MAX_CHUNKS_TOLERABLE. In this worst-case scenario > where the product of numMaps and splitRatio is large, capping the number of > chunks at the number of files (numberOfChunks = numberOfFiles) would result > in 1 file per chunk - the maximum parallelization possible. That may not be > the best-tuned solution for some users, but I would think that it should be > left up to the user to deal with the potential consequence of not having > tuned their job properly. Certainly that would be be
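The replacement the description proposes for the hard-coded MAX_CHUNKS_TOLERABLE constant is a one-line formula, capping the chunk count at the number of files. A minimal sketch of that proposal, using the reporter's own numbers:

```java
// Sketch of the chunk-count proposal from the description: cap the
// number of chunks at the number of files instead of rejecting jobs
// via a fixed MAX_CHUNKS_TOLERABLE constant.
public class ChunkCount {
    static int numberOfChunks(int numMaps, int splitRatio, int numFiles) {
        return Math.min(numMaps * splitRatio, numFiles);
    }

    public static void main(String[] args) {
        // The reporter's case: 128 maps, split-ratio 10, ~2800 files.
        System.out.println(numberOfChunks(128, 10, 2800)); // prints 1280
        // Even an extreme split-ratio is capped at one file per chunk.
        System.out.println(numberOfChunks(128, 100, 2800)); // prints 2800
    }
}
```

With 2800 files split into 1280 chunks each chunk holds only 2-3 files, so the few very large files would no longer pin a handful of mappers for hours.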
[jira] [Updated] (MAPREDUCE-5386) Ability to refresh history server job retention and job cleaner settings
[ https://issues.apache.org/jira/browse/MAPREDUCE-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-5386: -- Resolution: Fixed Fix Version/s: 2.3.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Ashwin! I committed this to trunk and branch-2. > Ability to refresh history server job retention and job cleaner settings > > > Key: MAPREDUCE-5386 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5386 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: jobhistoryserver >Affects Versions: 2.1.0-beta >Reporter: Ashwin Shankar >Assignee: Ashwin Shankar > Labels: features > Fix For: 3.0.0, 2.3.0 > > Attachments: JOB_RETENTION-1.txt, JOB_RETENTION-2.txt, > JOB_RETENTION-3.txt, JOB_RETENTION-4.txt, JOB_RETENTION--5.txt > > > We want to be able to refresh following job retention parameters > without having to bounce the history server : > 1. Job retention time - mapreduce.jobhistory.max-age-ms > 2. Cleaner interval - mapreduce.jobhistory.cleaner.interval-ms > 3. Enable/disable cleaner -mapreduce.jobhistory.cleaner.enable -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
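The three refreshable parameters named in the description would sit in the history server's configuration along these lines; the property names are taken from the issue text, while the values shown are illustrative, not defaults:

```xml
<!-- Illustrative values for the three refreshable settings named above -->
<property>
  <name>mapreduce.jobhistory.max-age-ms</name>
  <value>604800000</value> <!-- retain job history for 7 days -->
</property>
<property>
  <name>mapreduce.jobhistory.cleaner.interval-ms</name>
  <value>86400000</value> <!-- run the cleaner once per day -->
</property>
<property>
  <name>mapreduce.jobhistory.cleaner.enable</name>
  <value>true</value>
</property>
```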
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-5423: Description: During our cluster deployment, we found there is a very rare deadlock situation when reducers try to fetch map output. We had 5 fetchers and log snippet illustrates this problem is below (all fetchers went into a wait state after they can't acquire more RAM beyond the memoryLimit and no fetcher is releasing memory): 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
[jira] [Created] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
Chu Tong created MAPREDUCE-5423: --- Summary: Rare deadlock situation when reducers try to fetch map output Key: MAPREDUCE-5423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Chu Tong During our cluster deployment, we found there is deadlock situation when reducers try to fetch map output. We had 5 fetchers and log snippet illustrates this problem is below: 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for attempt_1373902166027_0622_m_17_0 2013-07-18 
04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->27 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0 2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory ->55841309 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 118022137, 
inMemoryMapOutputs.size() -> 3, commitMemory -> 55841309, usedMemory ->173863446 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:42,190 INFO [fetcher#1]
[jira] [Updated] (MAPREDUCE-5423) Rare deadlock situation when reducers try to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chu Tong updated MAPREDUCE-5423: Description: During our cluster deployment, we found there is deadlock situation when reducers try to fetch map output. We had 5 fetchers and log snippet illustrates this problem is below (all fetchers went into a wait state after they can't acquire more RAM beyond the memoryLimit and no fetcher is releasing memory): 2013-07-18 04:32:28,135 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: memoryLimit=1503238528, maxSingleShuffleLimit=375809632, mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0 Thread started: EventFetcher for fetching Map Completion Events 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:28,146 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:28,319 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_17_0 sent hash and receievd reply 2013-07-18 04:32:28,320 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_17_0 decomp: 27 len: 31 to MEMORY 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from map-output for 
attempt_1373902166027_0622_m_17_0 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->27 2013-07-18 04:32:28,325 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:33,158 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:33,161 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: for url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_16_0 sent hash and receievd reply 2013-07-18 04:32:33,200 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle output of map attempt_1373902166027_0622_m_16_0 decomp: 55841282 len: 55841286 to MEMORY 2013-07-18 04:32:33,322 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from map-output for attempt_1373902166027_0622_m_16_0 2013-07-18 04:32:33,323 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory -> 27, usedMemory ->55841309 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from map-output for attempt_1373902166027_0622_m_15_0 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.MergeManager: 
closeInMemoryFile -> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, commitMemory -> 55841309, usedMemory ->173863446 2013-07-18 04:32:39,594 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: attempt_1373902166027_0622_r_01_0: Got 1 new map-outputs 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1 2013-07-18 04:32:42,188 INFO [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 101-09-04.sc1.verticloud.com:8080 to fetcher#1 2013-07-18 04:32:42,190 INFO [fetcher#1] org.apache.hadoop.
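[Editor's note] The hang described above can be reduced to a small, self-contained analogue of the reduce-side shuffle's memory accounting. The class below is illustrative only, not Hadoop's actual MergeManager: fetchers block in reserve() until in-use memory drops back under memoryLimit, and if every fetcher is parked there while no merge ever calls release(), the shuffle deadlocks exactly as this report describes.

```java
// Minimal sketch (NOT Hadoop's MergeManager) of bounded in-memory shuffle
// accounting. Fetchers reserve memory before copying a map output; a merge
// must release it. No releaser + all fetchers waiting == deadlock.
class ShuffleMemoryPool {
    private final long memoryLimit;
    private long usedMemory = 0;

    ShuffleMemoryPool(long memoryLimit) { this.memoryLimit = memoryLimit; }

    // Block until 'size' bytes fit under the limit, then account for them.
    synchronized void reserve(long size) throws InterruptedException {
        while (usedMemory + size > memoryLimit) {
            wait(); // every fetcher parked here with no releaser: the deadlock
        }
        usedMemory += size;
    }

    // Called when a merge drains an in-memory map output.
    synchronized void release(long size) {
        usedMemory -= size;
        notifyAll(); // wake all parked fetchers, not just one
    }

    synchronized long used() { return usedMemory; }
}

public class ShuffleDeadlockSketch {
    public static void main(String[] args) throws Exception {
        ShuffleMemoryPool pool = new ShuffleMemoryPool(100);
        pool.reserve(80);                    // fetcher #1 holds 80 of 100
        Thread fetcher2 = new Thread(() -> {
            try {
                pool.reserve(40);            // must wait: 80 + 40 > 100
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        fetcher2.start();
        Thread.sleep(100);
        System.out.println("used before release: " + pool.used());
        pool.release(80);                    // a merge frees memory; #2 proceeds
        fetcher2.join(1000);
        System.out.println("used after release: " + pool.used());
    }
}
```

The fix direction implied by the report is to guarantee that some thread can always make progress and call release(), so reserve() cannot wait forever.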
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720939#comment-13720939 ] Sandy Ryza commented on MAPREDUCE-5367: --- I don't think the problem exists in trunk. getLocalTaskDir includes the job ID in the path, so there shouldn't be collisions. The other place that localRunner/ is used is for writing the job conf, which includes the job ID in its name. So that also should not be a problem. Though thinking about it now, it might make sense to change it as well for consistency? > Local jobs all use same local working directory > --- > > Key: MAPREDUCE-5367 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5367-b1.patch > > > This means that local jobs, even in different JVMs, can't run concurrently > because they might delete each other's files during work directory setup. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
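[Editor's note] Sandy's point, that trunk avoids collisions because getLocalTaskDir includes the job ID in the path, can be sketched as below. The directory layout and names here are hypothetical stand-ins, not Hadoop's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of per-job local working directories: keying the path
// on the job ID means two concurrent local jobs can never clobber each
// other's files during work-directory setup.
public class LocalDirSketch {
    static Path localTaskDir(Path localRunnerRoot, String jobId, String taskId) {
        // Each job gets its own subtree under the localRunner root.
        return localRunnerRoot.resolve(jobId).resolve(taskId);
    }

    public static void main(String[] args) {
        Path root = Paths.get("/tmp/hadoop/localRunner");
        Path a = localTaskDir(root, "job_0001", "task_m_000000");
        Path b = localTaskDir(root, "job_0002", "task_m_000000");
        System.out.println(a);
        System.out.println(b);
        System.out.println(a.equals(b)); // distinct jobs, distinct dirs
    }
}
```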
[jira] [Commented] (MAPREDUCE-5419) TestSlive is getting FileNotFound Exception
[ https://issues.apache.org/jira/browse/MAPREDUCE-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720906#comment-13720906 ] Robert Parker commented on MAPREDUCE-5419: -- The test failures have been identified as defects by other tickets: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile YARN-885,YARN-960 org.apache.hadoop.mapreduce.security.TestMRCredentials YARN-960 org.apache.hadoop.mapreduce.v2.TestNonExistentJob MAPREDUCE-5421 > TestSlive is getting FileNotFound Exception > --- > > Key: MAPREDUCE-5419 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5419 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: trunk, 2.1.0-beta, 0.23.9 >Reporter: Robert Parker >Assignee: Robert Parker > Attachments: MAPREDUCE-5419.patch > > > The write directory "slive" is not getting created on the FS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720897#comment-13720897 ] Hadoop QA commented on MAPREDUCE-5251: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594411/MAPREDUCE-5251-7-b23.txt against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3907//console This message is automatically generated. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5251) Reducer should not implicate map attempt if it has insufficient space to fetch map output
[ https://issues.apache.org/jira/browse/MAPREDUCE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwin Shankar updated MAPREDUCE-5251: -- Attachment: MAPREDUCE-5251-7-b23.txt Thanks a lot Jason. I've attached the patch for 23. > Reducer should not implicate map attempt if it has insufficient space to > fetch map output > - > > Key: MAPREDUCE-5251 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5251 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 0.23.7, 2.0.4-alpha >Reporter: Jason Lowe >Assignee: Ashwin Shankar > Attachments: MAPREDUCE-5251-2.txt, MAPREDUCE-5251-3.txt, > MAPREDUCE-5251-4.txt, MAPREDUCE-5251-5.txt, MAPREDUCE-5251-6.txt, > MAPREDUCE-5251-7-b23.txt, MAPREDUCE-5251-7.txt > > > A job can fail if a reducer happens to run on a node with insufficient space > to hold a map attempt's output. The reducer keeps reporting the map attempt > as bad, and if the map attempt ends up being re-launched too many times > before the reducer decides maybe it is the real problem the job can fail. > In that scenario it would be better to re-launch the reduce attempt and > hopefully it will run on another node that has sufficient space to complete > the shuffle. Reporting the map attempt is bad and relaunching the map task > doesn't change the fact that the reducer can't hold the output. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720832#comment-13720832 ] Hadoop QA commented on MAPREDUCE-5421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594396/MAPREDUCE-5421-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3906//console This message is automatically generated. 
> TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, try to get an application report with unknown appID will get > a exception instead of null. This cause test failure in TestNonExistentJob > which affects other irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Attachment: MAPREDUCE-5421-v2.patch The ApplicationNotFoundException on the server side should be translated to an IOException on the client side. Updated to a v2 patch to fix it. The remaining 2 failures are unrelated, as they also appear in other Jenkins jobs (like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845/testReport/) > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch, MAPREDUCE-5421-v2.patch > > > After YARN-873, try to get an application report with unknown appID will get > a exception instead of null. This cause test failure in TestNonExistentJob > which affects other irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
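[Editor's note] The behavioral change after YARN-873 that broke TestNonExistentJob, and the translation the v2 patch describes, can be sketched with stand-in classes (these are not YARN's actual types): the server raises a not-found exception for an unknown appID, and the client surfaces it as an IOException instead of returning null, so the test must now expect an exception.

```java
import java.io.IOException;

// Stand-in sketch of the server-to-client exception translation. The real
// YARN classes and RPC layer are not reproduced here.
public class NotFoundTranslationSketch {
    static class ApplicationNotFoundException extends Exception {
        ApplicationNotFoundException(String m) { super(m); }
    }

    // Server side after YARN-873: unknown appID -> exception, not null.
    static String getReportOnServer(String appId)
            throws ApplicationNotFoundException {
        throw new ApplicationNotFoundException(
            "Application " + appId + " not found");
    }

    // Client side: translate the server exception into IOException.
    static String getReportOnClient(String appId) throws IOException {
        try {
            return getReportOnServer(appId);
        } catch (ApplicationNotFoundException e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) {
        try {
            getReportOnClient("application_0000_0000");
            System.out.println("unexpected: got a report");
        } catch (IOException expected) {
            // The fixed test asserts this path rather than a null report.
            System.out.println("caught IOException as expected");
        }
    }
}
```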
[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720740#comment-13720740 ] Radim Kolar commented on MAPREDUCE-5153: its very simple to implement. If you want to push things forward then do it. > Support for running combiners without reducers > -- > > Key: MAPREDUCE-5153 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Radim Kolar > > scenario: Workflow mapper -> sort -> combiner -> hdfs > No api change is need, if user set combiner class and reducers = 0 then run > combiner and sent output to HDFS. > Popular libraries such as scalding and cascading are offering this > functionality, but they use caching entire mapper output in memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
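[Editor's note] The driver-side configuration this request implies would look like the fragment below, using the real org.apache.hadoop.mapreduce.Job API. Note that as of this discussion Hadoop does NOT run the combiner when the reduce count is 0; this is the proposed behavior, and TokenMapper/SumCombiner are hypothetical classes.

```java
// Configuration fragment (not a complete driver). Proposed semantics:
// with a combiner set and 0 reducers, run mapper -> sort -> combiner -> HDFS.
Job job = Job.getInstance(conf, "map-side-combine");
job.setMapperClass(TokenMapper.class);    // hypothetical mapper
job.setCombinerClass(SumCombiner.class);  // proposed: honored even with 0 reducers
job.setNumReduceTasks(0);                 // map-side output goes straight to HDFS
FileOutputFormat.setOutputPath(job, new Path(out));
```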
[jira] [Commented] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720715#comment-13720715 ] Hadoop QA commented on MAPREDUCE-5421: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12594364/MAPREDUCE-5421.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.security.TestBinaryTokenFile org.apache.hadoop.mapreduce.security.TestMRCredentials org.apache.hadoop.mapreduce.v2.TestNonExistentJob {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3905//console This message is automatically generated. 
> TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch > > > After YARN-873, try to get an application report with unknown appID will get > a exception instead of null. This cause test failure in TestNonExistentJob > which affects other irrelevant jenkins jobs like: > https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We > need to fix test failure here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5409) MRAppMaster throws InvalidStateTransitonException: Invalid event: TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-5409: - Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-5422 > MRAppMaster throws InvalidStateTransitonException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at KILLED for TaskAttemptImpl > - > > Key: MAPREDUCE-5409 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5409 > Project: Hadoop Map/Reduce > Issue Type: Sub-task >Affects Versions: 2.0.5-alpha >Reporter: Devaraj K >Assignee: Devaraj K > > {code:xml} > 2013-07-23 12:28:05,217 INFO [IPC Server handler 29 on 50796] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt > attempt_1374560536158_0003_m_40_0 is : 0.0 > 2013-07-23 12:28:05,221 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Too many fetch-failures > for output of task attempt: attempt_1374560536158_0003_m_07_0 ... raising > fetch failure to map > 2013-07-23 12:28:05,222 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Can't handle > this event at current state for attempt_1374560536158_0003_m_07_0 > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > TA_TOO_MANY_FETCH_FAILURE at KILLED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:1032) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:143) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1123) > at > 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:1115) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at > org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:662) > 2013-07-23 12:28:05,249 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: > job_1374560536158_0003Job Transitioned from RUNNING to ERROR > 2013-07-23 12:28:05,338 INFO [IPC Server handler 16 on 50796] > org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from > attempt_1374560536158_0003_m_40_0 > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
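[Editor's note] The failure mode in this stack trace can be illustrated with a tiny table-driven analogue of YARN's StateMachineFactory (the enums and table below are illustrative, not Hadoop's): the transition table simply has no entry for the (KILLED, TA_TOO_MANY_FETCH_FAILURE) pair, so dispatch throws, and the fix under the umbrella JIRA is to register a transition (often a no-op) for such late-arriving events.

```java
import java.util.HashMap;
import java.util.Map;

// Tiny analogue of a table-driven state machine showing why an unregistered
// (state, event) pair produces "Invalid event: ... at ...".
public class StateMachineSketch {
    enum State { RUNNING, KILLED }
    enum Event { TA_KILL, TA_TOO_MANY_FETCH_FAILURE }

    static final Map<State, Map<Event, State>> TABLE = new HashMap<>();
    static {
        Map<Event, State> fromRunning = new HashMap<>();
        fromRunning.put(Event.TA_KILL, State.KILLED);
        fromRunning.put(Event.TA_TOO_MANY_FETCH_FAILURE, State.RUNNING);
        TABLE.put(State.RUNNING, fromRunning);
        TABLE.put(State.KILLED, new HashMap<>()); // no entries: the gap
    }

    static State doTransition(State s, Event e) {
        State next = TABLE.get(s).get(e);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + s);
        }
        return next;
    }

    public static void main(String[] args) {
        State s = doTransition(State.RUNNING, Event.TA_KILL); // -> KILLED
        try {
            // A fetch-failure report arriving after the attempt was killed.
            doTransition(s, Event.TA_TOO_MANY_FETCH_FAILURE);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // mirrors the AM error above
        }
    }
}
```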
[jira] [Updated] (MAPREDUCE-5400) MRAppMaster throws InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED for JobImpl
[ https://issues.apache.org/jira/browse/MAPREDUCE-5400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj K updated MAPREDUCE-5400: - Issue Type: Sub-task (was: Bug) Parent: MAPREDUCE-5422 > MRAppMaster throws InvalidStateTransitonException: Invalid event: > JOB_TASK_COMPLETED at SUCCEEDED for JobImpl > - > > Key: MAPREDUCE-5400 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5400 > Project: Hadoop Map/Reduce > Issue Type: Sub-task > Components: applicationmaster >Affects Versions: 2.0.5-alpha >Reporter: J.Andreina >Assignee: Devaraj K >Priority: Minor > Attachments: MAPREDUCE-5400.patch > > > Step 1: Install cluster with HDFS , MR > Step 2: Execute a job > Step 3: Issue a kill task attempt for which the task has got completed. > Rex@HOST-10-18-91-55:~/NodeAgentTmpDir/installations/hadoop-2.0.5.tar/hadoop-2.0.5/bin> > ./mapred job -kill-task attempt_1373875322959_0032_m_00_0 > No GC_PROFILE is given. Defaults to medium. > 13/07/15 14:46:32 INFO service.AbstractService: > Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited. > 13/07/15 14:46:32 INFO proxy.ResourceManagerProxies: HA Proxy Creation with > xface : interface org.apache.hadoop.yarn.api.ClientRMProtocol > 13/07/15 14:46:33 INFO service.AbstractService: > Service:org.apache.hadoop.yarn.client.YarnClientImpl is started. > Killed task attempt_1373875322959_0032_m_00_0 > Observation: > === > 1. task state has been transitioned from SUCCEEDED to SCHEDULED > 2. For a Succeeded attempt , when client issues Kill , then the client is > notified as killed for a succeeded attempt. > 3. Launched second task_attempt which is succeeded and then killed later on > client request. > 4. Even after the job state transitioned from SUCCEEDED to ERROR , on UI the > state is succeeded > Issue : > = > 1. Client has been notified that the atttempt is killed , but acutually the > attempt is succeeded and the same is displayed in JHS UI. > 2. 
At App master InvalidStateTransitonException is thrown . > 3. At client side and JHS job has exited with state Finished/succeeded ,At RM > side the state is Finished/Failed. > AM Logs: > > 2013-07-15 14:46:25,461 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: > attempt_1373875322959_0032_m_00_0 TaskAttempt Transitioned from RUNNING > to SUCCEEDED > 2013-07-15 14:46:25,468 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with > attempt attempt_1373875322959_0032_m_00_0 > 2013-07-15 14:46:25,470 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: > task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED > 2013-07-15 14:46:33,810 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: > task_1373875322959_0032_m_00 Task Transitioned from SUCCEEDED to SCHEDULED > 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with > attempt attempt_1373875322959_0032_m_00_1 > 2013-07-15 14:46:37,344 INFO [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: > task_1373875322959_0032_m_00 Task Transitioned from RUNNING to SUCCEEDED > 2013-07-15 14:46:37,345 ERROR [AsyncDispatcher event handler] > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event > at current state > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: > JOB_TASK_COMPLETED at SUCCEEDED > at > org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) > at > org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43) > at > org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:445) > at > 
org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:866) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:128) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1095) > at > org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1091) > at > org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:130) > at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:77) > at java.lang.Thread.run(Thread.java:662) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on J
[jira] [Created] (MAPREDUCE-5422) [Umbrella] Fix invalid state transitions in MRAppMaster
Devaraj K created MAPREDUCE-5422: Summary: [Umbrella] Fix invalid state transitions in MRAppMaster Key: MAPREDUCE-5422 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5422 Project: Hadoop Map/Reduce Issue Type: Task Components: mr-am Affects Versions: 2.0.5-alpha Reporter: Devaraj K Assignee: Devaraj K There are multiple invalid state transitions for the state machines present in MRAppMaster. All of these can be handled as part of this umbrella JIRA.
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Attachment: MAPREDUCE-5421.patch Uploading a quick patch to fix it. > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch > > > After YARN-873, trying to get an application report with an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
[jira] [Updated] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
[ https://issues.apache.org/jira/browse/MAPREDUCE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-5421: -- Target Version/s: 2.1.0-beta Status: Patch Available (was: Open) > TestNonExistentJob is failed due to recent changes in YARN > -- > > Key: MAPREDUCE-5421 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Junping Du >Assignee: Junping Du > Attachments: MAPREDUCE-5421.patch > > > After YARN-873, trying to get an application report with an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
[jira] [Created] (MAPREDUCE-5421) TestNonExistentJob is failed due to recent changes in YARN
Junping Du created MAPREDUCE-5421: - Summary: TestNonExistentJob is failed due to recent changes in YARN Key: MAPREDUCE-5421 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5421 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Junping Du Assignee: Junping Du After YARN-873, trying to get an application report with an unknown appID throws an exception instead of returning null. This causes a test failure in TestNonExistentJob, which affects otherwise unrelated Jenkins jobs like: https://builds.apache.org/job/PreCommit-HADOOP-Build/2845//testReport/. We need to fix the test failure here.
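The contract change described above (an exception where the old client returned null) can be absorbed by test code that accepts either behavior. A hedged, self-contained sketch follows; the client and exception type are stand-ins for illustration, not YARN's real API classes.

```java
// Stand-in for the post-YARN-873 behavior: looking up an unknown
// application throws instead of returning null.
class AppNotFoundException extends RuntimeException {
    AppNotFoundException(String msg) { super(msg); }
}

// Fake client simulating getApplicationReport for an unknown appID.
class FakeClient {
    Object getApplicationReport(String appId) {
        throw new AppNotFoundException("Application " + appId + " not found");
    }
}

class NonExistentJobCheck {
    // Returns true if the client reports the app as non-existent,
    // whether via the old contract (null) or the new one (exception).
    static boolean isNonExistent(FakeClient client, String appId) {
        try {
            return client.getApplicationReport(appId) == null;
        } catch (AppNotFoundException e) {
            return true;
        }
    }
}
```

Written this way, the check passes against both the pre- and post-YARN-873 behavior, which is the kind of adjustment a test like TestNonExistentJob needs.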
[jira] [Commented] (MAPREDUCE-5279) mapreduce scheduling deadlock
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720632#comment-13720632 ] Tsuyoshi OZAWA commented on MAPREDUCE-5279: --- [~pengzhang], thank you for contributing! Can you rebase it on the current trunk, please? > mapreduce scheduling deadlock > - > > Key: MAPREDUCE-5279 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, scheduler >Affects Versions: 2.0.3-alpha >Reporter: PengZhang >Assignee: PengZhang > Fix For: trunk > > Attachments: MAPREDUCE-5279.patch, MAPREDUCE-5279-v2.patch > > > YARN-2 introduced CPU-dimension scheduling, but the MR RMContainerAllocator does not take virtual cores into account while scheduling reduce tasks. > This may cause more reduce tasks to be scheduled when memory alone is sufficient. > On a small cluster, this can end in deadlock: all running containers are reduce tasks, but the map phase is not finished.
[jira] [Updated] (MAPREDUCE-5279) mapreduce scheduling deadlock
[ https://issues.apache.org/jira/browse/MAPREDUCE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5279: -- Assignee: PengZhang > mapreduce scheduling deadlock > - > > Key: MAPREDUCE-5279 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5279 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, scheduler >Affects Versions: 2.0.3-alpha >Reporter: PengZhang >Assignee: PengZhang > Fix For: trunk > > Attachments: MAPREDUCE-5279.patch, MAPREDUCE-5279-v2.patch > > > YARN-2 introduced CPU-dimension scheduling, but the MR RMContainerAllocator does not take virtual cores into account while scheduling reduce tasks. > This may cause more reduce tasks to be scheduled when memory alone is sufficient. > On a small cluster, this can end in deadlock: all running containers are reduce tasks, but the map phase is not finished.
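The deadlock described in MAPREDUCE-5279 arises because the allocator compares only memory headroom when deciding to ramp up reduces. A minimal sketch of the two-dimensional check is below; the method name and numbers are illustrative, not the actual RMContainerAllocator code.

```java
// Resource headroom check covering both dimensions. Scheduling a reduce
// only when BOTH memory and virtual cores fit avoids filling the cluster
// with reduces while map tasks still need containers.
class ResourceCheck {
    static boolean canScheduleReduce(int freeMemMb, int freeVcores,
                                     int reduceMemMb, int reduceVcores) {
        // A memory-only check (the buggy behavior) would ignore freeVcores
        // and happily schedule a reduce even with zero cores available.
        return freeMemMb >= reduceMemMb && freeVcores >= reduceVcores;
    }
}
```

With free memory of 8 GB but zero free vcores, the memory-only check would schedule the reduce; the two-dimensional check correctly refuses, leaving room for map containers.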
[jira] [Commented] (MAPREDUCE-5367) Local jobs all use same local working directory
[ https://issues.apache.org/jira/browse/MAPREDUCE-5367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720600#comment-13720600 ] Tom White commented on MAPREDUCE-5367: -- I was looking at trunk. Doesn't this need fixing for trunk too? > Local jobs all use same local working directory > --- > > Key: MAPREDUCE-5367 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5367 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.2.0 >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Attachments: MAPREDUCE-5367-b1.patch > > > This means that local jobs, even in different JVMs, can't run concurrently > because they might delete each other's files during work directory setup.
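One way to avoid the collision described in MAPREDUCE-5367 is to derive each local job's working directory from its job ID rather than sharing a fixed path. A hypothetical sketch, assuming a per-job subdirectory layout (the path scheme is illustrative, not the attached patch):

```java
import java.io.File;

// Per-job local working directory: including the job ID in the path keeps
// concurrent local jobs, even in separate JVMs, from deleting each
// other's files during work-directory setup.
class LocalJobDirs {
    static File workDirFor(File localRoot, String jobId) {
        // e.g. <localRoot>/<jobId>/work
        return new File(new File(localRoot, jobId), "work");
    }
}
```

Because every job ID is unique, two concurrent local jobs resolve to distinct work directories, so one job's setup/cleanup never touches the other's files.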