[jira] [Commented] (MAPREDUCE-5519) Change JobClient to use YarnClient to interact with RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-5519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772672#comment-13772672 ] Sandy Ryza commented on MAPREDUCE-5519: --- What does it use instead? > Change JobClient to use YarnClient to interact with RM > -- > > Key: MAPREDUCE-5519 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5519 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-5519) Change JobClient to use YarnClient to interact with RM
Jian He created MAPREDUCE-5519: -- Summary: Change JobClient to use YarnClient to interact with RM Key: MAPREDUCE-5519 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5519 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Jian He -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5518) Fix typo "can't read paritions file"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772609#comment-13772609 ] Tsuyoshi OZAWA commented on MAPREDUCE-5518: --- +1 > Fix typo "can't read paritions file" > > > Key: MAPREDUCE-5518 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.0.0 >Reporter: Albert Chu >Assignee: Albert Chu >Priority: Trivial > Attachments: MAPREDUCE-5518.patch > > > Noticed a spelling error when I saw this error message > {noformat} > 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : > attempt_1379622083112_0002_m_000114_0, Status : FAILED > Error: java.lang.IllegalArgumentException: can't read paritions file > {noformat} > "paritions" should be "partitions" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5518) Fix typo "can't read paritions file"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5518: -- Hadoop Flags: Reviewed > Fix typo "can't read paritions file" > > > Key: MAPREDUCE-5518 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.0.0 >Reporter: Albert Chu >Assignee: Albert Chu >Priority: Trivial > Attachments: MAPREDUCE-5518.patch > > > Noticed a spelling error when I saw this error message > {noformat} > 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : > attempt_1379622083112_0002_m_000114_0, Status : FAILED > Error: java.lang.IllegalArgumentException: can't read paritions file > {noformat} > "paritions" should be "partitions" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772605#comment-13772605 ]
Vrushali C commented on MAPREDUCE-5502:
---
The diff that I pasted does not render correctly, and it seems I can't edit it now. To summarize, the two proposed changes are:
1) if (status.getState() != JobStatus.State.RUNNING)
changes to
if (status.getState() == JobStatus.State.PREP)
2) if (status.getState() != JobStatus.State.KILLED)
extends to
if ((status.getState() != JobStatus.State.KILLED) && (status.getState() != JobStatus.State.FAILED) && (status.getState() != JobStatus.State.SUCCEEDED))
> History link in resource manager is broken for KILLED jobs
> --
>
> Key: MAPREDUCE-5502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.5-alpha
> Reporter: Vrushali C
> Assignee: Vrushali C
> Labels: ui
>
> History link in resource manager is broken for KILLED jobs.
> Seems to happen with jobs with State 'KILLED' and FinalStatus 'KILLED'. If the State is 'FINISHED' and FinalStatus is 'KILLED', then the "History" link is fine.
> It isn't easy to reproduce the problem since the time at which the app is killed determines the state it ends up in, which is hard to guess. These particular jobs seem to get a Diagnostics message of "Application killed by user." whereas the other killed jobs get "Kill Job received from client job_1378766187901_0002 Job received Kill while in RUNNING state."
[jira] [Commented] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772604#comment-13772604 ]
Vrushali C commented on MAPREDUCE-5502:
---
Right. So the fix seems to be needed in two places in YARNRunner.
I think the first call to resMgrDelegate.killApplication should occur when the job is in the PREP state (as opposed to !RUNNING in the current code). At this point there is no RUNNING/FAILED/SUCCEEDED/KILLED status, since the job has probably not even started running, so sending the kill to the RM and returning makes sense. In this case the application ends up in KILLED/KILLED, which I believe is correct. The "Tracking URL: History" on the cluster/app/application_number page points to itself, which I think is also correct in this case.
The second call to resMgrDelegate.killApplication should occur when the JobStatus is in any of the terminal states - KILLED/SUCCEEDED/FAILED. In this case, the application ends up in FINISHED/KILLED or FINISHED/SUCCEEDED (I haven't yet experimented with FAILED). The tracking URL on the application page is also updated correctly.
The changes are:
--- a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java
@@ -550,8 +550,12 @@ public JobStatus getJobStatus(JobID jobID) throws IOException,
   public void killJob(JobID arg0) throws IOException, InterruptedException {
     /* check if the status is not running, if not send kill to RM */
     JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
-    if (status.getState() != JobStatus.State.RUNNING) {
+    if ( (status.getState() == JobStatus.State.PREP) ) {
       resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
       return;
     }
@@ -574,7 +578,11 @@ public void killJob(JobID arg0) throws IOException, InterruptedException {
     } catch(IOException io) {
       LOG.debug("Error when checking for application status", io);
     }
-    if (status.getState() != JobStatus.State.KILLED) {
+    if ( (status.getState() != JobStatus.State.KILLED) &&
+         (status.getState() != JobStatus.State.FAILED) &&
+         (status.getState() != JobStatus.State.SUCCEEDED) ){
       resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
     }
What do you think?
> History link in resource manager is broken for KILLED jobs
> --
>
> Key: MAPREDUCE-5502
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.0.5-alpha
> Reporter: Vrushali C
> Assignee: Vrushali C
> Labels: ui
>
> History link in resource manager is broken for KILLED jobs.
> Seems to happen with jobs with State 'KILLED' and FinalStatus 'KILLED'. If the State is 'FINISHED' and FinalStatus is 'KILLED', then the "History" link is fine.
> It isn't easy to reproduce the problem since the time at which the app is killed determines the state it ends up in, which is hard to guess. These particular jobs seem to get a Diagnostics message of "Application killed by user." whereas the other killed jobs get "Kill Job received from client job_1378766187901_0002 Job received Kill while in RUNNING state."
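The two conditions in the MAPREDUCE-5502 diff can be looked at in isolation with a small sketch. This is only an illustration under stated assumptions, not Hadoop code: the `State` and `Action` enums, the class name `KillJobSketch`, and both method names are hypothetical stand-ins, and the code between the two hunks (which the thread does not show) is represented only as "the AM path". The second check follows the condition exactly as written in the diff.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the two checks in the proposed killJob change; not YARNRunner itself. */
final class KillJobSketch {
    /** Illustrative copy of the job states named in the thread (not JobStatus.State). */
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    /** Outcome of the first check (names are assumptions made for this sketch). */
    enum Action { KILL_VIA_RM_AND_RETURN, CONTINUE_TO_AM_PATH }

    private static final Set<State> TERMINAL =
        EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);

    /** First hunk: only a job still in PREP is killed directly through the RM. */
    static Action firstCheck(State current) {
        return current == State.PREP
            ? Action.KILL_VIA_RM_AND_RETURN
            : Action.CONTINUE_TO_AM_PATH;
    }

    /** Second hunk as written: kill through the RM unless the state is KILLED, FAILED or SUCCEEDED. */
    static boolean needsRmKillAfterAmPath(State afterAmPath) {
        return !TERMINAL.contains(afterAmPath);
    }

    public static void main(String[] args) {
        // Print the decision table for every state.
        for (State s : State.values()) {
            System.out.println(s + ": first=" + firstCheck(s)
                + ", rmKillAfterAmPath=" + needsRmKillAfterAmPath(s));
        }
    }
}
```

Laying the decisions out as a table like this makes it easy to compare the behaviour for each state against the old `!= RUNNING` and `!= KILLED` checks.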
[jira] [Commented] (MAPREDUCE-5518) Fix typo "can't read paritions file"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772591#comment-13772591 ]
Hadoop QA commented on MAPREDUCE-5518:
--
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604172/MAPREDUCE-5518.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-examples.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4019//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4019//console
This message is automatically generated.
> Fix typo "can't read paritions file"
>
> Key: MAPREDUCE-5518
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: examples
> Affects Versions: 3.0.0
> Reporter: Albert Chu
> Assignee: Albert Chu
> Priority: Trivial
> Attachments: MAPREDUCE-5518.patch
>
> Noticed a spelling error when I saw this error message
> {noformat}
> 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : attempt_1379622083112_0002_m_000114_0, Status : FAILED
> Error: java.lang.IllegalArgumentException: can't read paritions file
> {noformat}
> "paritions" should be "partitions"
[jira] [Commented] (MAPREDUCE-2288) JT Availability
[ https://issues.apache.org/jira/browse/MAPREDUCE-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772586#comment-13772586 ] Tsuyoshi OZAWA commented on MAPREDUCE-2288: --- Does this feature mean MRAppMaster's HA? > JT Availability > --- > > Key: MAPREDUCE-2288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2288 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: jobtracker >Reporter: Eli Collins > > This is an umbrella jira, like HDFS-1064, for discussing and providing > references to jobtracker availability jiras (eg from JT restart on a host or > to cross host fail-over). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5518) Fix typo "can't read paritions file"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772584#comment-13772584 ] Tsuyoshi OZAWA commented on MAPREDUCE-5518: --- LGTM, so I submitted your patch. > Fix typo "can't read paritions file" > > > Key: MAPREDUCE-5518 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.0.0 >Reporter: Albert Chu >Assignee: Albert Chu >Priority: Trivial > Attachments: MAPREDUCE-5518.patch > > > Noticed a spelling error when I saw this error message > {noformat} > 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : > attempt_1379622083112_0002_m_000114_0, Status : FAILED > Error: java.lang.IllegalArgumentException: can't read paritions file > {noformat} > "paritions" should be "partitions" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5518) Fix typo "can't read paritions file"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated MAPREDUCE-5518: -- Assignee: Albert Chu Status: Patch Available (was: Open) > Fix typo "can't read paritions file" > > > Key: MAPREDUCE-5518 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.0.0 >Reporter: Albert Chu >Assignee: Albert Chu >Priority: Trivial > Attachments: MAPREDUCE-5518.patch > > > Noticed a spelling error when I saw this error message > {noformat} > 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : > attempt_1379622083112_0002_m_000114_0, Status : FAILED > Error: java.lang.IllegalArgumentException: can't read paritions file > {noformat} > "paritions" should be "partitions" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5481) TestUberAM timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772559#comment-13772559 ]
Xuan Gong commented on MAPREDUCE-5481:
--
The problem appears to be in testSleepJob: with testSleepJob commented out, the timeout no longer occurs.
> TestUberAM timeout
> --
>
> Key: MAPREDUCE-5481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, test
> Affects Versions: 3.0.0
> Reporter: Jason Lowe
> Assignee: Xuan Gong
> Priority: Blocker
>
> TestUberAM has been timing out on trunk for some time now and surefire then fails the build. I'm not able to reproduce it locally, but the Jenkins builds have been seeing it fairly consistently. See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console
[jira] [Updated] (MAPREDUCE-5518) Fix typo "can't read paritions file"
[ https://issues.apache.org/jira/browse/MAPREDUCE-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Chu updated MAPREDUCE-5518: -- Attachment: MAPREDUCE-5518.patch No tests added, it's a trivial typo fix. > Fix typo "can't read paritions file" > > > Key: MAPREDUCE-5518 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: examples >Affects Versions: 3.0.0 >Reporter: Albert Chu >Priority: Trivial > Attachments: MAPREDUCE-5518.patch > > > Noticed a spelling error when I saw this error message > {noformat} > 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : > attempt_1379622083112_0002_m_000114_0, Status : FAILED > Error: java.lang.IllegalArgumentException: can't read paritions file > {noformat} > "paritions" should be "partitions" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772533#comment-13772533 ]
Hadoop QA commented on MAPREDUCE-5505:
--
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604165/MAPREDUCE-5505.3.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 4 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:
org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4018//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4018//console
This message is automatically generated.
> Clients should be notified job finished only after job successfully unregistered
> -
>
> Key: MAPREDUCE-5505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Zhijie Shen
> Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch, MAPREDUCE-5505.3.patch
>
> This is to make sure user is notified job finished after job is really done.
> This does increase client latency but can reduce some races during unregister like YARN-540
[jira] [Created] (MAPREDUCE-5518) Fix typo "can't read paritions file"
Albert Chu created MAPREDUCE-5518: - Summary: Fix typo "can't read paritions file" Key: MAPREDUCE-5518 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5518 Project: Hadoop Map/Reduce Issue Type: Bug Components: examples Affects Versions: 3.0.0 Reporter: Albert Chu Priority: Trivial Noticed a spelling error when I saw this error message {noformat} 13/09/19 13:25:08 INFO mapreduce.Job: Task Id : attempt_1379622083112_0002_m_000114_0, Status : FAILED Error: java.lang.IllegalArgumentException: can't read paritions file {noformat} "paritions" should be "partitions" -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (MAPREDUCE-5515) Application Manager UI does not appear with Https enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli resolved MAPREDUCE-5515. Resolution: Fixed Hadoop Flags: Reviewed Committed this together with YARN-1203. Thanks Omkar! Will post the fix-version once the corresponding tags are available. > Application Manager UI does not appear with Https enabled > - > > Key: MAPREDUCE-5515 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5515 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: MAPREDUCE-5515.txt > > > related issue YARN-1203. We need to disable https for MR-AM by default as > they will need access to keystore which can not be granted in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen updated MAPREDUCE-5505: --- Status: Patch Available (was: Open) > Clients should be notified job finished only after job successfully > unregistered > - > > Key: MAPREDUCE-5505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He >Assignee: Zhijie Shen > Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch, > MAPREDUCE-5505.3.patch > > > This is to make sure user is notified job finished after job is really done. > This does increase client latency but can reduce some races during unregister > like YARN-540 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhijie Shen updated MAPREDUCE-5505:
---
Attachment: MAPREDUCE-5505.3.patch
Thanks Jian and Bikas for the comments. I've uploaded a new patch, which uses an atomic boolean instead. It is placed in MRAppMaster and exposed through AppContext, as other variables are. Because the flag lives in MRAppMaster, it is set in MRAppMaster#shutDownJob() rather than in ClientService#serviceStop(); at that point the other services have already been stopped and ClientService is about to be stopped. It should therefore be equivalent to what Bikas proposed, but with simpler code.
> Clients should be notified job finished only after job successfully unregistered
> -
>
> Key: MAPREDUCE-5505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Zhijie Shen
> Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch, MAPREDUCE-5505.3.patch
>
> This is to make sure user is notified job finished after job is really done.
> This does increase client latency but can reduce some races during unregister like YARN-540
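The flag arrangement described in this MAPREDUCE-5505 update might look roughly like the sketch below. It is a guess at the shape of the idea, not the contents of MAPREDUCE-5505.3.patch: `AppContextSketch`, `AppMasterSketch`, `ClientServiceSketch`, the method names, the `unregisterSucceeded` parameter and the returned state strings are all hypothetical. Only `java.util.concurrent.atomic.AtomicBoolean` is a real API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical read-only view, loosely modeled on the AppContext mentioned in the thread. */
interface AppContextSketch {
    boolean hasSuccessfullyUnregistered();
}

/** Hypothetical application master that owns the flag; not the real MRAppMaster. */
final class AppMasterSketch implements AppContextSketch {
    private final AtomicBoolean successfullyUnregistered = new AtomicBoolean(false);

    @Override
    public boolean hasSuccessfullyUnregistered() {
        return successfullyUnregistered.get();
    }

    /** Stand-in for the shutdown path: set the flag only once unregistration has succeeded. */
    void shutDownJob(boolean unregisterSucceeded) {
        if (unregisterSucceeded) {
            successfullyUnregistered.set(true);
        }
    }
}

/** Hypothetical client-facing service that hides completion until the flag is set. */
final class ClientServiceSketch {
    private final AppContextSketch context;

    ClientServiceSketch(AppContextSketch context) {
        this.context = context;
    }

    /** Reports a finished job as still RUNNING until the master has unregistered. */
    String reportState(boolean jobInternallyDone) {
        boolean visibleAsDone = jobInternallyDone && context.hasSuccessfullyUnregistered();
        return visibleAsDone ? "SUCCEEDED" : "RUNNING";
    }
}

final class UnregisterFlagDemo {
    public static void main(String[] args) {
        AppMasterSketch master = new AppMasterSketch();
        ClientServiceSketch client = new ClientServiceSketch(master);
        System.out.println(client.reportState(true)); // flag not yet set
        master.shutDownJob(true);
        System.out.println(client.reportState(true)); // flag set after unregistration
    }
}
```

Keeping the flag in one owner and exposing it through a read-only interface means the client-facing service never needs to know when or how unregistration happens.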
[jira] [Commented] (MAPREDUCE-5514) TestRMContainerAllocator fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772497#comment-13772497 ]
Omkar Vinit Joshi commented on MAPREDUCE-5514:
--
Can you check this? Is it trying to resolve the IP? Inside SecurityUtil.java:
{code}
boolean useIp = conf.getBoolean(
    CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP,
    CommonConfigurationKeys.HADOOP_SECURITY_TOKEN_SERVICE_USE_IP_DEFAULT);
{code}
> TestRMContainerAllocator fails on trunk
> ---
>
> Key: MAPREDUCE-5514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5514
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Zhijie Shen
> Assignee: Zhijie Shen
> Priority: Blocker
> Attachments: org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator-output.txt
[jira] [Commented] (MAPREDUCE-5481) TestUberAM timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772495#comment-13772495 ]
Xuan Gong commented on MAPREDUCE-5481:
--
[~jlowe] [~masokan] I ran the test locally, too. The client can connect to the RM, the NM heartbeats normally, and the container status looks fine. But after the mapper tasks finish, the reducer task never starts. Stranger still, this happens when the number of reducers is set to 1; if I increase the number of reducers to 2, the reducers start working.
I double-checked 2.1.beta: the test works there, but it does not work on trunk. I wonder whether a change made after 2.1.beta broke this test. Any ideas?
> TestUberAM timeout
> --
>
> Key: MAPREDUCE-5481
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2, test
> Affects Versions: 3.0.0
> Reporter: Jason Lowe
> Assignee: Xuan Gong
> Priority: Blocker
>
> TestUberAM has been timing out on trunk for some time now and surefire then fails the build. I'm not able to reproduce it locally, but the Jenkins builds have been seeing it fairly consistently. See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console
[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772484#comment-13772484 ] Bikas Saha commented on MAPREDUCE-5505: --- We could do that but lets leave it out of the scope of the current jira. For this one, lets keep doing what we used to do. > Clients should be notified job finished only after job successfully > unregistered > - > > Key: MAPREDUCE-5505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He >Assignee: Zhijie Shen > Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch > > > This is to make sure user is notified job finished after job is really done. > This does increase client latency but can reduce some races during unregister > like YARN-540 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5515) Application Manager UI does not appear with Https enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772483#comment-13772483 ] Hudson commented on MAPREDUCE-5515: --- SUCCESS: Integrated in Hadoop-trunk-Commit #4446 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4446/]) YARN-1203. Changed YARN web-app proxy to handle http and https URLs from AM registration and finish correctly. Contributed by Omkar Vinit Joshi. MAPREDUCE-5515. Fixed MR AM's webapp to depend on a new config mapreduce.ssl.enabled to enable https and disabling it by default as MR AM needs to set up its own certificates etc and not depend on clusters'. Contributed by Omkar Vinit Joshi. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524864) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/MRAppMaster.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/client/MRClientService.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMCommunicator.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/JobBlock.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/NavBlock.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/TaskPage.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/WebAppUtil.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/dao/AMAttemptInfo.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/JobHistoryServer.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/HsJobBlock.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/webapp/HsTaskPage.java * /hadoop/common/trunk/hadoop-yarn-project/CHANGES.txt * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/FinishApplicationMasterRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/protocolrecords/RegisterApplicationMasterRequest.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/conf/YarnConfiguration.java * 
/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/ProxyUriUtils.java * /hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-web-proxy/src/main/java/org/apache/hadoop/yarn/server/webproxy/WebAppProxyServlet.java > Application Manager UI does not appear with Https enabled > - > > Key: MAPREDUCE-5515 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5515 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > Attachments: MAPREDUCE-5515.txt > > > related issue YARN-1203. We need to disable https for MR-AM by default as > they will need access to keystore which can not be granted in the cluster. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators
[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772479#comment-13772479 ]

Omkar Vinit Joshi commented on MAPREDUCE-5507:
----------------------------------------------

There also appears to be a problem with the code below. We can either preempt reducers or schedule new ones, but not both in the same pass. Any thoughts? I am planning to fix this as part of this issue.

{code}
if (recalculateReduceSchedule) {
  preemptReducesIfNeeded();
  scheduleReduces(
      getJob().getTotalMaps(), completedMaps,
      scheduledRequests.maps.size(), scheduledRequests.reduces.size(),
      assignedRequests.maps.size(), assignedRequests.reduces.size(),
      mapResourceReqt, reduceResourceReqt,
      pendingReduces.size(),
      maxReduceRampupLimit, reduceSlowStart);
  recalculateReduceSchedule = false;
}
{code}

> MapReduce reducer ramp down is suboptimal with potential job-hanging issues
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5507
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
>
>
> Today, if we set "yarn.app.mapreduce.am.job.reduce.rampup.limit" and
> "mapreduce.job.reduce.slowstart.completedmaps", reducers are launched
> more aggressively. However, the calculation to either ramp up or ramp down
> reducers is not done in the most optimal way.
> * If the MR AM at any point sees a situation like
> ** scheduledMaps : 30
> ** scheduledReducers : 10
> ** assignedMaps : 0
> ** assignedReducers : 11
> ** finishedMaps : 120
> ** headroom : 756 (when your map/reduce task needs only 512 MB)
> * then today it simply hangs, because it thinks that there is sufficient room
> to launch one more mapper and therefore there is no need to ramp down.
> However, if this continues forever, then this is neither correct nor
> optimal.
> * Ideally, when the MR AM sees that assignedMaps has dropped to 0
> and there are still running reducers, it should wait for a certain time
> (upper-bounded by the average map task completion time, as a heuristic).
> If it still doesn't get a new container for a map task after that, it should
> preempt reducers one by one at some interval and ramp up slowly.
> ** Preemption of reducers can be done in a slightly smarter way:
> *** preempt a reducer on a node manager for which there is a pending map
> request.
> *** otherwise preempt any other reducer. The MR AM contributes to getting a new
> mapper by releasing such a reducer / container, because doing so reduces its
> cluster consumption and may thereby make it a candidate for an allocation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
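The concern in the comment above is that one pass both preempts and schedules reducers. A minimal Java sketch of taking at most one of the two actions per pass follows. The class `ReduceScheduleStep`, the method `decide`, and the starvation test (maps are waiting and the headroom is smaller than one map task) are all hypothetical and are not taken from RMContainerAllocator.

```java
/** Hypothetical sketch: choose at most one reducer action per scheduling pass. */
final class ReduceScheduleStep {
    enum Action { PREEMPT_REDUCES, SCHEDULE_REDUCES, NONE }

    /**
     * Assumed policy, not the real allocator logic: when maps are waiting and
     * do not fit into the reported headroom, only preemption is considered;
     * otherwise only scheduling of pending reducers is considered.
     */
    static Action decide(int scheduledMaps, int assignedReduces, int pendingReduces,
                         int headroomMb, int mapResourceMb) {
        boolean mapsStarved = scheduledMaps > 0 && headroomMb < mapResourceMb;
        if (mapsStarved) {
            // Nothing to preempt if no reducer is currently running.
            return assignedReduces > 0 ? Action.PREEMPT_REDUCES : Action.NONE;
        }
        return pendingReduces > 0 ? Action.SCHEDULE_REDUCES : Action.NONE;
    }
}
```

With this shape the caller would switch on the returned action instead of calling both methods unconditionally.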
[jira] [Commented] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772473#comment-13772473 ]

Omkar Vinit Joshi commented on MAPREDUCE-5507:
----------------------------------------------

The potential problem I see here is that the reducer preemption logic depends mainly on the headroom (available resources) returned by the RM. After discussing offline with [~vinodkv] and [~sseth], there are a few important points we need to take care of:
* If we ever hit the situation where assignedMaps=0, assignedReducers>0, scheduledMaps>0, scheduledReducers>=0, then
** we should wait for some time.
*** We propose the wait time to be min[ (some percentage of the average map reduce task completion time), (some configurable number * AM-RM heartbeat interval) ].
** If we don't get any new container for a map task during the above interval, then we will
*** first remove all the scheduled reducer requests, as done today in RMContainerAllocator#preemptReducesIfNeeded();
*** then remove as many reducers as required to allocate a single map task.
** We should keep repeating the above steps after each such interval for as long as we don't get any new map task.

Also, we should avoid ramping up later and cap the reducer count at the number of currently running reducers. There is no point in requesting reducers and later canceling those requests or killing running reducers (as we are already using up to the capacity of the running user).

> MapReduce reducer ramp down is suboptimal with potential job-hanging issues
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5507
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
>
>
> Today, if we set "yarn.app.mapreduce.am.job.reduce.rampup.limit" and
> "mapreduce.job.reduce.slowstart.completedmaps", reducers are launched
> more aggressively. However, the calculation to either ramp up or ramp down
> reducers is not done in the most optimal way.
> * If the MR AM at any point sees a situation like
> ** scheduledMaps : 30
> ** scheduledReducers : 10
> ** assignedMaps : 0
> ** assignedReducers : 11
> ** finishedMaps : 120
> ** headroom : 756 (when your map/reduce task needs only 512 MB)
> * then today it simply hangs, because it thinks that there is sufficient room
> to launch one more mapper and therefore there is no need to ramp down.
> However, if this continues forever, then this is neither correct nor
> optimal.
> * Ideally, when the MR AM sees that assignedMaps has dropped to 0
> and there are still running reducers, it should wait for a certain time
> (upper-bounded by the average map task completion time, as a heuristic).
> If it still doesn't get a new container for a map task after that, it should
> preempt reducers one by one at some interval and ramp up slowly.
> ** Preemption of reducers can be done in a slightly smarter way:
> *** preempt a reducer on a node manager for which there is a pending map
> request.
> *** otherwise preempt any other reducer. The MR AM contributes to getting a new
> mapper by releasing such a reducer / container, because doing so reduces its
> cluster consumption and may thereby make it a candidate for an allocation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
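The proposal in the comment above can be sketched roughly as follows. This is only an illustration: the class `MapStallPolicy`, its method names, and the parameters `fraction` and `heartbeatMultiplier` are hypothetical stand-ins for the "some percentage" and "some configurable number" in the comment. The count of reducers to release deliberately ignores the reported headroom, on the assumption that the headroom is what cannot be trusted in this situation.

```java
/** Hypothetical sketch of the proposed wait-then-preempt heuristic. */
final class MapStallPolicy {

    /** Stall condition from the comment: no running maps, running reducers, maps still waiting. */
    static boolean isStalled(int assignedMaps, int assignedReducers, int scheduledMaps) {
        return assignedMaps == 0 && assignedReducers > 0 && scheduledMaps > 0;
    }

    /** wait = min(fraction * average task completion time, multiplier * heartbeat interval). */
    static long waitMillis(long avgTaskCompletionMs, double fraction,
                           long heartbeatIntervalMs, int heartbeatMultiplier) {
        long byTaskTime = (long) (avgTaskCompletionMs * fraction);
        long byHeartbeat = heartbeatIntervalMs * heartbeatMultiplier;
        return Math.min(byTaskTime, byHeartbeat);
    }

    /** Reducers to release so that their memory covers one map task: ceil(mapMb / reduceMb). */
    static int reducersToFree(int mapMb, int reduceMb) {
        return (mapMb + reduceMb - 1) / reduceMb;
    }
}
```

For the numbers in the issue description (assignedMaps 0, assignedReducers 11, scheduledMaps 30, 512 MB tasks), `isStalled` holds and a single reducer would be released once the wait expires without a new map container.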
[jira] [Updated] (MAPREDUCE-5515) Application Manager UI does not appear with Https enabled
[ https://issues.apache.org/jira/browse/MAPREDUCE-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5515:
-----------------------------------------------
    Attachment: MAPREDUCE-5515.txt

Here's the MR patch that Omkar worked on via YARN-1203.
 - It adds a new config, mapreduce.ssl.enabled, that can be set explicitly for MR AMs if MR users want to enable HTTPS on the AM webapp.
{code}
+<property>
+  <name>mapreduce.ssl.enabled</name>
+  <value>false</value>
+  <description>
+    If enabled, MapReduce application master's http server will be
+    started with SSL enabled. Map reduce AM by default doesn't support SSL.
+    If MapReduce jobs want SSL support, it is the user's responsibility to
+    create and manage certificates, keystores and trust-stores with appropriate
+    permissions. This is only for MapReduce application master and is not used
+    by job history server. To enable encrypted shuffle this property is not
+    required, instead refer to (mapreduce.shuffle.ssl.enabled) property.
+  </description>
+</property>
{code}

Already reviewed and committing this together with YARN-1203.

> Application Manager UI does not appear with Https enabled
> ---------------------------------------------------------
>
> Key: MAPREDUCE-5515
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5515
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
> Attachments: MAPREDUCE-5515.txt
>
>
> related issue YARN-1203. We need to disable https for MR-AM by default as
> they will need access to keystore which can not be granted in the cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
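As a rough illustration of how such a flag could drive the AM webapp scheme, here is a small Java sketch. It uses `java.util.Properties` as a stand-in for Hadoop's configuration object, and the class `AmWebScheme` and method `schemePrefix` are hypothetical. Only the key name and its default of false come from the patch description above.

```java
import java.util.Properties;

/** Hypothetical sketch: pick the AM web scheme from the mapreduce.ssl.enabled flag. */
final class AmWebScheme {
    // Key name as given in the patch; everything else here is illustrative.
    static final String SSL_ENABLED_KEY = "mapreduce.ssl.enabled";

    /** Returns "https://" only when the flag is explicitly set to true; the default is plain HTTP. */
    static String schemePrefix(Properties conf) {
        boolean sslEnabled = Boolean.parseBoolean(conf.getProperty(SSL_ENABLED_KEY, "false"));
        return sslEnabled ? "https://" : "http://";
    }
}
```

Defaulting to HTTP matches the stated intent of the issue: HTTPS for the MR AM stays off unless a user opts in and manages the keystores.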
[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772437#comment-13772437 ]

Jian He commented on MAPREDUCE-5505:
------------------------------------

bq. We probably don't need the "appContext.isLastAMRetry()" check here.
In fact, in the case of REBOOT, we can always return RUNNING. Either the AM is restarted and the JobClient continues to run, or the AM failed because the unregister failed or because this is the last retry, in which case the RM can tell the JobClient that the app FAILED.

> Clients should be notified job finished only after job successfully unregistered
> --------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5505
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Jian He
> Assignee: Zhijie Shen
> Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch
>
>
> This is to make sure user is notified job finished after job is really done.
> This does increase client latency but can reduce some races during unregister
> like YARN-540

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
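The suggestion in the comment above amounts to a state mapping in which REBOOT is always reported to clients as RUNNING. A minimal Java sketch follows. The enums and the class `ClientVisibleState` are hypothetical simplifications and do not mirror the actual job state classes in the MR AM.

```java
/** Hypothetical sketch: map an internal job state to the state reported to clients. */
final class ClientVisibleState {
    enum Internal { RUNNING, SUCCEEDED, FAILED, KILLED, REBOOT }
    enum External { RUNNING, SUCCEEDED, FAILED, KILLED }

    /**
     * REBOOT is always reported as RUNNING, without consulting a last-retry flag.
     * If the AM does not come back, the RM (not the AM) reports the failure.
     */
    static External toExternal(Internal state) {
        switch (state) {
            case REBOOT:
                return External.RUNNING;
            case SUCCEEDED:
                return External.SUCCEEDED;
            case FAILED:
                return External.FAILED;
            case KILLED:
                return External.KILLED;
            default:
                return External.RUNNING;
        }
    }
}
```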
[jira] [Commented] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772419#comment-13772419 ] Hudson commented on MAPREDUCE-5488: --- SUCCESS: Integrated in Hadoop-trunk-Commit #4445 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4445/]) MAPREDUCE-5488. Changed MR client to keep trying to reach the application when it sees that on attempt's AM is down. Contributed by Jian He. (vinodkv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524856) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/dev-support/findbugs-exclude.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/ClientServiceDelegate.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestClientServiceDelegate.java > Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch > > > Here is the client stack trace > {code} > RUNNING: /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.1.0.2.0.5.0-66.jar > wordcount "-Dmapreduce.reduce.input.limit=-1" > 
/user/user/test_yarn_ha/medium_wordcount_input > /user/hrt_qa/test_yarn_ha/test_mapred_ha_single_job_applicationmaster-1-time > 13/08/30 08:45:39 INFO client.RMProxy: Connecting to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 19 > for user on ha-hdfs:ha-2-secure > 13/08/30 08:45:40 INFO security.TokenCache: Got dt for hdfs://ha-2-secure; > Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: > (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:40 INFO input.FileInputFormat: Total input paths to process : > 20 > 13/08/30 08:45:40 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 13/08/30 08:45:40 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3] > 13/08/30 08:45:40 INFO mapreduce.JobSubmitter: number of splits:180 > 13/08/30 08:45:40 WARN conf.Configuration: user.name is deprecated. Instead, > use mapreduce.job.user.name > 13/08/30 08:45:40 WARN conf.Configuration: mapred.jar is deprecated. Instead, > use mapreduce.job.jar > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.value.class is > deprecated. Instead, use mapreduce.job.output.value.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.combine.class is > deprecated. Instead, use mapreduce.job.combine.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.map.class is deprecated. > Instead, use mapreduce.job.map.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.job.name is deprecated. > Instead, use mapreduce.job.name > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.reduce.class is > deprecated. Instead, use mapreduce.job.reduce.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.input.dir is deprecated. > Instead, use mapreduce.input.fileinputformat.inputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.dir is deprecated. 
> Instead, use mapreduce.output.fileoutputformat.outputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.key.class is > deprecated. Instead, use mapreduce.job.output.key.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.working.dir is deprecated. > Instead, use mapreduce.job.working.dir > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1377851032086_0003 > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:42 INFO impl.YarnClientImpl: Submitted application > application_1377851032086_0003 to ResourceManager at >
[jira] [Updated] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5488: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk, branch-2 and branch-2.1. Thanks Jian! Will set the fix-versions once 2.1.2/2.2 are created in JIRA. > Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch > > > Here is the client stack trace > {code} > RUNNING: /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.1.0.2.0.5.0-66.jar > wordcount "-Dmapreduce.reduce.input.limit=-1" > /user/user/test_yarn_ha/medium_wordcount_input > /user/hrt_qa/test_yarn_ha/test_mapred_ha_single_job_applicationmaster-1-time > 13/08/30 08:45:39 INFO client.RMProxy: Connecting to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 19 > for user on ha-hdfs:ha-2-secure > 13/08/30 08:45:40 INFO security.TokenCache: Got dt for hdfs://ha-2-secure; > Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: > (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:40 INFO input.FileInputFormat: Total input paths to process : > 20 > 13/08/30 08:45:40 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 13/08/30 08:45:40 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3] > 13/08/30 08:45:40 INFO mapreduce.JobSubmitter: number of splits:180 > 13/08/30 08:45:40 WARN conf.Configuration: 
user.name is deprecated. Instead, > use mapreduce.job.user.name > 13/08/30 08:45:40 WARN conf.Configuration: mapred.jar is deprecated. Instead, > use mapreduce.job.jar > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.value.class is > deprecated. Instead, use mapreduce.job.output.value.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.combine.class is > deprecated. Instead, use mapreduce.job.combine.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.map.class is deprecated. > Instead, use mapreduce.job.map.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.job.name is deprecated. > Instead, use mapreduce.job.name > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.reduce.class is > deprecated. Instead, use mapreduce.job.reduce.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.input.dir is deprecated. > Instead, use mapreduce.input.fileinputformat.inputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.dir is deprecated. > Instead, use mapreduce.output.fileoutputformat.outputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.key.class is > deprecated. Instead, use mapreduce.job.output.key.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.working.dir is deprecated. 
> Instead, use mapreduce.job.working.dir > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1377851032086_0003 > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:42 INFO impl.YarnClientImpl: Submitted application > application_1377851032086_0003 to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:42 INFO mapreduce.Job: The url to track the job: > http://hostname:8088/proxy/application_1377851032086_0003/ > 13/08/30 08:45:42 INFO mapreduce.Job: Running job: job_1377851032086_0003 > 13/08/30 08:45:48 INFO mapreduce.Job: Job job_1377851032086_0003 running in > uber mode : false > 13/08/30 08:45:48 INFO mapreduce.Job: map 0% reduce 0% > stop applicationmaster > beaver.component.hadoop|INFO|Kill container > container_1377851032086_0003_01_01 on host hostname > RUNNING: ssh -o StrictHostKeyChecking=no hostname "sudo su - -c \"ps aux | > grep container_1377851032086_0003_01_01 | awk '{print \\\$2}' | xargs > kill -9\" root" > Warning: Permanently added 'hostname,68.142.247.155' (RSA) to the list of > known hosts. > kill 8978: No such process > waiting for down time 10 seconds for service applicationmaster > 13/08/30 08:45:55 INFO ipc.Client: Retrying co
[jira] [Commented] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772367#comment-13772367 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5488: +1, this looks good. Checking this in. > Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch > > > Here is the client stack trace > {code} > RUNNING: /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.1.0.2.0.5.0-66.jar > wordcount "-Dmapreduce.reduce.input.limit=-1" > /user/user/test_yarn_ha/medium_wordcount_input > /user/hrt_qa/test_yarn_ha/test_mapred_ha_single_job_applicationmaster-1-time > 13/08/30 08:45:39 INFO client.RMProxy: Connecting to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 19 > for user on ha-hdfs:ha-2-secure > 13/08/30 08:45:40 INFO security.TokenCache: Got dt for hdfs://ha-2-secure; > Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: > (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:40 INFO input.FileInputFormat: Total input paths to process : > 20 > 13/08/30 08:45:40 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 13/08/30 08:45:40 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3] > 13/08/30 08:45:40 INFO mapreduce.JobSubmitter: number of splits:180 > 13/08/30 08:45:40 WARN conf.Configuration: user.name is deprecated. 
Instead, > use mapreduce.job.user.name > 13/08/30 08:45:40 WARN conf.Configuration: mapred.jar is deprecated. Instead, > use mapreduce.job.jar > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.value.class is > deprecated. Instead, use mapreduce.job.output.value.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.combine.class is > deprecated. Instead, use mapreduce.job.combine.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.map.class is deprecated. > Instead, use mapreduce.job.map.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.job.name is deprecated. > Instead, use mapreduce.job.name > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.reduce.class is > deprecated. Instead, use mapreduce.job.reduce.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.input.dir is deprecated. > Instead, use mapreduce.input.fileinputformat.inputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.dir is deprecated. > Instead, use mapreduce.output.fileoutputformat.outputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.key.class is > deprecated. Instead, use mapreduce.job.output.key.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.working.dir is deprecated. 
> Instead, use mapreduce.job.working.dir > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1377851032086_0003 > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:42 INFO impl.YarnClientImpl: Submitted application > application_1377851032086_0003 to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:42 INFO mapreduce.Job: The url to track the job: > http://hostname:8088/proxy/application_1377851032086_0003/ > 13/08/30 08:45:42 INFO mapreduce.Job: Running job: job_1377851032086_0003 > 13/08/30 08:45:48 INFO mapreduce.Job: Job job_1377851032086_0003 running in > uber mode : false > 13/08/30 08:45:48 INFO mapreduce.Job: map 0% reduce 0% > stop applicationmaster > beaver.component.hadoop|INFO|Kill container > container_1377851032086_0003_01_01 on host hostname > RUNNING: ssh -o StrictHostKeyChecking=no hostname "sudo su - -c \"ps aux | > grep container_1377851032086_0003_01_01 | awk '{print \\\$2}' | xargs > kill -9\" root" > Warning: Permanently added 'hostname,68.142.247.155' (RSA) to the list of > known hosts. > kill 8978: No such process > waiting for down time 10 seconds for service applicationmaster > 13/08/30 08:45:55 INFO ipc.Client: Retrying connect to server: > hostname/68.142.247.155:52713. Already tried 0 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(ma
[jira] [Updated] (MAPREDUCE-5507) MapReduce reducer ramp down is suboptimal with potential job-hanging issues
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5507:
-----------------------------------------------
    Summary: MapReduce reducer ramp down is suboptimal with potential job-hanging issues (was: MapReduce reducer preemption gets hanged)

> MapReduce reducer ramp down is suboptimal with potential job-hanging issues
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-5507
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Omkar Vinit Joshi
> Assignee: Omkar Vinit Joshi
>
>
> Today, if we set "yarn.app.mapreduce.am.job.reduce.rampup.limit" and
> "mapreduce.job.reduce.slowstart.completedmaps", reducers are launched
> more aggressively. However, the calculation to either ramp up or ramp down
> reducers is not done in the most optimal way.
> * If the MR AM at any point sees a situation like
> ** scheduledMaps : 30
> ** scheduledReducers : 10
> ** assignedMaps : 0
> ** assignedReducers : 11
> ** finishedMaps : 120
> ** headroom : 756 (when your map/reduce task needs only 512 MB)
> * then today it simply hangs, because it thinks that there is sufficient room
> to launch one more mapper and therefore there is no need to ramp down.
> However, if this continues forever, then this is neither correct nor
> optimal.
> * Ideally, when the MR AM sees that assignedMaps has dropped to 0
> and there are still running reducers, it should wait for a certain time
> (upper-bounded by the average map task completion time, as a heuristic).
> If it still doesn't get a new container for a map task after that, it should
> preempt reducers one by one at some interval and ramp up slowly.
> ** Preemption of reducers can be done in a slightly smarter way:
> *** preempt a reducer on a node manager for which there is a pending map
> request.
> *** otherwise preempt any other reducer. The MR AM contributes to getting a new
> mapper by releasing such a reducer / container, because doing so reduces its
> cluster consumption and may thereby make it a candidate for an allocation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5504) mapred queue -info inconsistent with types
[ https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772294#comment-13772294 ] Thomas Graves commented on MAPREDUCE-5504: -- +1, looks good. Thanks Kousuke! > mapred queue -info inconsistent with types > -- > > Key: MAPREDUCE-5504 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5504 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.23.9 >Reporter: Thomas Graves >Assignee: Kousuke Saruta > Attachments: MAPREDUCE-5504.patch > > > $ mapred queue -info default > == > Queue Name : default > Queue State : running > Scheduling Info : Capacity: 4.0, MaximumCapacity: 0.67, CurrentCapacity: > 0.9309831 > The capacity is displayed in % as 4, however maximum capacity is displayed as > an absolute number 0.67 instead of 67%. > We should make these consistent with the type we are displaying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5504) mapred queue -info inconsistent with types
[ https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772315#comment-13772315 ] Hudson commented on MAPREDUCE-5504: --- SUCCESS: Integrated in Hadoop-trunk-Commit # (See [https://builds.apache.org/job/Hadoop-trunk-Commit//]) MAPREDUCE-5504. mapred queue -info inconsistent with types (Kousuke Saruta via tgraves) (tgraves: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524841) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/TypeConverter.java > mapred queue -info inconsistent with types > -- > > Key: MAPREDUCE-5504 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5504 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.23.9 >Reporter: Thomas Graves >Assignee: Kousuke Saruta > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-5504.patch > > > $ mapred queue -info default > == > Queue Name : default > Queue State : running > Scheduling Info : Capacity: 4.0, MaximumCapacity: 0.67, CurrentCapacity: > 0.9309831 > The capacity is displayed in % as 4, however maximum capacity is displayed as > an absolute number 0.67 instead of 67%. > We should make these consistent with the type we are displaying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
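To illustrate the inconsistency described in the issue above, the following Java sketch formats all three capacities in the same unit (percent). It assumes every input is a fraction of 1.0, and the class `QueueCapacityFormat` and its methods are hypothetical. This is not the code changed in TypeConverter.java.

```java
import java.util.Locale;

/** Hypothetical sketch: report every queue capacity in percent. */
final class QueueCapacityFormat {

    /** Converts a fraction such as 0.67 into a percentage such as 67.0. */
    static float toPercent(float fraction) {
        return fraction * 100f;
    }

    /** Builds the scheduling-info line with all values in percent (inputs assumed to be fractions). */
    static String schedulingInfo(float capacity, float maxCapacity, float currentCapacity) {
        return String.format(Locale.ROOT,
                "Capacity: %.1f, MaximumCapacity: %.1f, CurrentCapacity: %.1f",
                toPercent(capacity), toPercent(maxCapacity), toPercent(currentCapacity));
    }
}
```

Applying the same conversion to every field is what keeps the three numbers comparable, which is the point of the report.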
[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-5517: --- Status: Open (was: Patch Available) > enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt, MAPREDUCE_5517_v2.patch.txt > > > Since there is no reducer, the memory allocated to reducer is irrelevant to > enable uber mode of a job -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-5517: --- Status: Patch Available (was: Open) > enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt, MAPREDUCE_5517_v2.patch.txt > > > Since there is no reducer, the memory allocated to reducer is irrelevant to > enable uber mode of a job -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-5517: --- Attachment: MAPREDUCE_5517_v2.patch.txt > enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt, MAPREDUCE_5517_v2.patch.txt > > > Since there is no reducer, the memory allocated to reducer is irrelevant to > enable uber mode of a job -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
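The point of the issue above can be sketched as a memory check that ignores the reducer setting when the job has no reducers. The class `UberMemoryCheck` and its parameters are hypothetical, and the real uber-mode decision in the MR AM considers more conditions than memory.

```java
/** Hypothetical sketch: memory check for uber mode that skips reducer memory for map-only jobs. */
final class UberMemoryCheck {

    /**
     * The job fits in the AM container if the largest task that will actually run fits.
     * Reducer memory only counts when the job has at least one reducer.
     */
    static boolean fitsInAm(int numReduces, long mapMemMb, long reduceMemMb, long amMemMb) {
        long relevantReduceMb = numReduces > 0 ? reduceMemMb : 0L;
        long requiredMb = Math.max(mapMemMb, relevantReduceMb);
        return requiredMb <= amMemMb;
    }
}
```

For example, a map-only job with 1024 MB maps, a 4096 MB reducer setting, and a 1536 MB AM container passes this check, while the same settings with one reducer do not.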
[jira] [Commented] (MAPREDUCE-5514) TestRMContainerAllocator fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772214#comment-13772214 ] Xuan Gong commented on MAPREDUCE-5514: -- I got following error message {code} 2013-09-19 12:51:42,379 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher (AsyncDispatcher.java:dispatch(141)) - Error in dispatcher thread java.lang.IllegalArgumentException: java.net.UnknownHostException: amNM at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:418) at org.apache.hadoop.yarn.server.utils.BuilderUtils.newContainerToken(BuilderUtils.java:247) at org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager.createContainerToken(RMContainerTokenSecretManager.java:195) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainer(FifoScheduler.java:590) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignOffSwitchContainers(FifoScheduler.java:554) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainersOnNode(FifoScheduler.java:482) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.assignContainers(FifoScheduler.java:411) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.nodeUpdate(FifoScheduler.java:650) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:679) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler.handle(FifoScheduler.java:95) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator$MyResourceManager$1.handle(TestRMContainerAllocator.java:451) at org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator$MyResourceManager$1.handle(TestRMContainerAllocator.java:1) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134) at 
org.apache.hadoop.yarn.event.DrainDispatcher$1.run(DrainDispatcher.java:65) at java.lang.Thread.run(Thread.java:680) Caused by: java.net.UnknownHostException: amNM at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:419) ... 14 more {code} > TestRMContainerAllocator fails on trunk > --- > > Key: MAPREDUCE-5514 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5514 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: > org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator-output.txt > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated MAPREDUCE-5505: -- Status: Open (was: Patch Available) > Clients should be notified job finished only after job successfully > unregistered > - > > Key: MAPREDUCE-5505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He >Assignee: Zhijie Shen > Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch > > > This is to make sure user is notified job finished after job is really done. > This does increase client latency but can reduce some races during unregister > like YARN-540 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772301#comment-13772301 ] Bikas Saha commented on MAPREDUCE-5505: --- Are we sure that the previous state is always RUNNING before FAILED? {code} + case FAILED: +if (isUnregistered) { + return JobState.FAILED; +} else { + return JobState.RUNNING; {code} Instead of isUnregistered, let us create an AtomicBoolean called safeToReportTerminationToUser. Rather than living in JobImpl, this boolean can be made visible via the AppContext object so that everyone has access to it. When should the boolean be set to true? We could do it in RMCommunicator after unregister succeeds (like in this patch). Or we can do it in MRClientService.serviceStop(). Since MRClientService is the last service to stop(), we can be sure that everything finished nicely. MRClientService.serviceStop() can set the boolean. Then we can move the sleep(5sec) from MRAppMaster to MRClientService.serviceStop(), after setting the boolean. We should leave a comment explaining this in MRAppMaster.shutdown() before the call to clientService.stop() so that it's easy for someone else to track this logic. Please do run single node tests to verify the behavior for real along with RM restart. > Clients should be notified job finished only after job successfully > unregistered > - > > Key: MAPREDUCE-5505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He >Assignee: Zhijie Shen > Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch > > > This is to make sure user is notified job finished after job is really done. > This does increase client latency but can reduce some races during unregister > like YARN-540 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
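The shared-flag idea in the comment above might look roughly like the following. This is only a sketch under stated assumptions: `SketchJobState`, `SketchAppContext` and `SketchJobReporter` are hypothetical stand-ins, not the real JobState, AppContext or JobImpl, and masking every terminal state as RUNNING is a simplification that sidesteps the open question of what the previous state actually was.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the job state enum; not the real Hadoop type.
enum SketchJobState { RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

// Stand-in for the AppContext idea: one shared flag that any component can read.
final class SketchAppContext {
    private final AtomicBoolean safeToReportTerminationToUser = new AtomicBoolean(false);

    // Would be called once shutdown has progressed far enough,
    // for example after the unregister call has succeeded.
    void markSafeToReportTermination() {
        safeToReportTerminationToUser.set(true);
    }

    boolean isSafeToReportTermination() {
        return safeToReportTerminationToUser.get();
    }
}

final class SketchJobReporter {
    private final SketchAppContext context;

    SketchJobReporter(SketchAppContext context) {
        this.context = context;
    }

    // Terminal states are reported as RUNNING until the flag has been set.
    SketchJobState externalState(SketchJobState internal) {
        boolean terminal = internal != SketchJobState.RUNNING;
        if (terminal && !context.isSafeToReportTermination()) {
            return SketchJobState.RUNNING;
        }
        return internal;
    }
}
```

An AtomicBoolean suits this case because the flag is written by one thread during shutdown and read by client-facing threads, and no compound update is needed.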
[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types
[ https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-5504: - Resolution: Fixed Fix Version/s: 0.23.10 2.3.0 3.0.0 Status: Resolved (was: Patch Available) > mapred queue -info inconsistent with types > -- > > Key: MAPREDUCE-5504 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5504 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.23.9 >Reporter: Thomas Graves >Assignee: Kousuke Saruta > Fix For: 3.0.0, 2.3.0, 0.23.10 > > Attachments: MAPREDUCE-5504.patch > > > $ mapred queue -info default > == > Queue Name : default > Queue State : running > Scheduling Info : Capacity: 4.0, MaximumCapacity: 0.67, CurrentCapacity: > 0.9309831 > The capacity is displayed in % as 4, however maximum capacity is displayed as > an absolute number 0.67 instead of 67%. > We should make these consistent with the type we are displaying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
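To make the inconsistency described above concrete, here is a small sketch that prints both values in the same unit. It is not the committed patch: the class and method names are hypothetical, and the choice to convert the fraction to a percentage (rather than the other way round) is an assumption taken from the "67%" wording in the report.

```java
import java.util.Locale;

// Hypothetical sketch; not the real queue-info code.
final class QueueInfoFormatSketch {

    // Converts a fraction such as 0.67 into a percentage such as 67.0.
    static double fractionToPercent(double fraction) {
        return fraction * 100.0;
    }

    // Formats capacity (already a percentage) and maximum capacity
    // (given as a fraction) so that both are displayed as percentages.
    static String schedulingInfo(double capacityPercent, double maxCapacityFraction) {
        return String.format(Locale.ROOT,
                "Capacity: %.1f, MaximumCapacity: %.1f",
                capacityPercent, fractionToPercent(maxCapacityFraction));
    }
}
```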
[jira] [Updated] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5488: --- Status: Open (was: Patch Available) > Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch > > > Here is the client stack trace > {code} > RUNNING: /usr/lib/hadoop/bin/hadoop jar > /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.1.0.2.0.5.0-66.jar > wordcount "-Dmapreduce.reduce.input.limit=-1" > /user/user/test_yarn_ha/medium_wordcount_input > /user/hrt_qa/test_yarn_ha/test_mapred_ha_single_job_applicationmaster-1-time > 13/08/30 08:45:39 INFO client.RMProxy: Connecting to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 19 > for user on ha-hdfs:ha-2-secure > 13/08/30 08:45:40 INFO security.TokenCache: Got dt for hdfs://ha-2-secure; > Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: > (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:40 INFO input.FileInputFormat: Total input paths to process : > 20 > 13/08/30 08:45:40 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library > 13/08/30 08:45:40 INFO lzo.LzoCodec: Successfully loaded & initialized > native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3] > 13/08/30 08:45:40 INFO mapreduce.JobSubmitter: number of splits:180 > 13/08/30 08:45:40 WARN conf.Configuration: user.name is deprecated. Instead, > use mapreduce.job.user.name > 13/08/30 08:45:40 WARN conf.Configuration: mapred.jar is deprecated. 
Instead, > use mapreduce.job.jar > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.value.class is > deprecated. Instead, use mapreduce.job.output.value.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.combine.class is > deprecated. Instead, use mapreduce.job.combine.class > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.map.class is deprecated. > Instead, use mapreduce.job.map.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.job.name is deprecated. > Instead, use mapreduce.job.name > 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.reduce.class is > deprecated. Instead, use mapreduce.job.reduce.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.input.dir is deprecated. > Instead, use mapreduce.input.fileinputformat.inputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.dir is deprecated. > Instead, use mapreduce.output.fileoutputformat.outputdir > 13/08/30 08:45:40 WARN conf.Configuration: mapred.map.tasks is deprecated. > Instead, use mapreduce.job.maps > 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.key.class is > deprecated. Instead, use mapreduce.job.output.key.class > 13/08/30 08:45:40 WARN conf.Configuration: mapred.working.dir is deprecated. 
> Instead, use mapreduce.job.working.dir > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: > job_1377851032086_0003 > 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, > Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user) > 13/08/30 08:45:42 INFO impl.YarnClientImpl: Submitted application > application_1377851032086_0003 to ResourceManager at > hostname/68.142.247.148:8032 > 13/08/30 08:45:42 INFO mapreduce.Job: The url to track the job: > http://hostname:8088/proxy/application_1377851032086_0003/ > 13/08/30 08:45:42 INFO mapreduce.Job: Running job: job_1377851032086_0003 > 13/08/30 08:45:48 INFO mapreduce.Job: Job job_1377851032086_0003 running in > uber mode : false > 13/08/30 08:45:48 INFO mapreduce.Job: map 0% reduce 0% > stop applicationmaster > beaver.component.hadoop|INFO|Kill container > container_1377851032086_0003_01_01 on host hostname > RUNNING: ssh -o StrictHostKeyChecking=no hostname "sudo su - -c \"ps aux | > grep container_1377851032086_0003_01_01 | awk '{print \\\$2}' | xargs > kill -9\" root" > Warning: Permanently added 'hostname,68.142.247.155' (RSA) to the list of > known hosts. > kill 8978: No such process > waiting for down time 10 seconds for service applicationmaster > 13/08/30 08:45:55 INFO ipc.Client: Retrying connect to server: > hostname/68.142.247.155:52713. Already tried 0 time(s); retry policy is > RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS) > 13/08/30 08:45:56 INFO ipc.Client: Retrying connect to server:
[jira] [Updated] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5488: --- Attachment: MAPREDUCE-5488.3.patch updated findbug-exclude file > Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch
[jira] [Updated] (MAPREDUCE-5481) TestUberAM timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5481: --- Priority: Blocker (was: Major) > TestUberAM timeout > -- > > Key: MAPREDUCE-5481 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, test >Affects Versions: 3.0.0 >Reporter: Jason Lowe >Priority: Blocker > > TestUberAM has been timing out on trunk for some time now and surefire then > fails the build. I'm not able to reproduce it locally, but the Jenkins > builds have been seeing it fairly consistently. See > https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772291#comment-13772291 ] Jian He commented on MAPREDUCE-5505: {code} case REBOOT: if (isUnregistered && appContext.isLastAMRetry()) { return JobState.ERROR; {code} We probably don't need the "appContext.isLastAMRetry()" check here: if this is the last retry, the app will fail on the RM side. After this AM exits, JobClient can query the RM for the final status, and it will be told FAILED. It would be good for all transitions to follow the same logic. If so, we can create a common function that, for every final state, returns the final state if the AM has unregistered and the previous state otherwise. > Clients should be notified job finished only after job successfully > unregistered > - > > Key: MAPREDUCE-5505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He >Assignee: Zhijie Shen > Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch > > > This is to make sure user is notified job finished after job is really done. > This does increase client latency but can reduce some races during unregister > like YARN-540 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
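The common function proposed in the comment above could be sketched as follows, matching the isUnregistered check in the quoted snippet: a final state is only reported once the AM has unregistered, and the previous state is reported until then. The enum and class names are hypothetical stand-ins rather than the real JobState or JobImpl, and the set of final states is an assumption.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the job state enum; not the real Hadoop type.
enum ReportedState { NEW, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR }

final class ExternalStateSketch {

    // Assumption: these are the states treated as final.
    private static final Set<ReportedState> FINAL_STATES = EnumSet.of(
            ReportedState.SUCCEEDED, ReportedState.FAILED,
            ReportedState.KILLED, ReportedState.ERROR);

    // One rule for every transition: hide a final state behind the
    // previous state until the AM has unregistered.
    static ReportedState externalState(ReportedState current,
                                       ReportedState previous,
                                       boolean unregistered) {
        if (FINAL_STATES.contains(current) && !unregistered) {
            return previous;
        }
        return current;
    }
}
```

Passing the previous state explicitly avoids hard-coding RUNNING as the fallback, which is the concern raised elsewhere in this thread.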
[jira] [Updated] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5488: --- Status: Patch Available (was: Open) > Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch
[jira] [Updated] (MAPREDUCE-5503) TestMRJobClient.testJobClient is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5503: --- Priority: Blocker (was: Major) > TestMRJobClient.testJobClient is failing > > > Key: MAPREDUCE-5503 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5503 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.0.0 >Reporter: Jason Lowe >Priority: Blocker > > TestMRJobClient.testJobClient is failing on trunk and causing precommit > builds to complain: > {noformat} > testJobClient(org.apache.hadoop.mapreduce.TestMRJobClient) Time elapsed: > 26.361 sec <<< FAILURE! > junit.framework.AssertionFailedError: expected:<1> but was:<0> > at junit.framework.Assert.fail(Assert.java:50) > at junit.framework.Assert.failNotEquals(Assert.java:287) > at junit.framework.Assert.assertEquals(Assert.java:67) > at junit.framework.Assert.assertEquals(Assert.java:199) > at junit.framework.Assert.assertEquals(Assert.java:205) > at > org.apache.hadoop.mapreduce.TestMRJobClient.testJobList(TestMRJobClient.java:474) > at > org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:112) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772242#comment-13772242 ] Hadoop QA commented on MAPREDUCE-5517: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604100/MAPREDUCE_5517_v2.patch.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4017//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4017//console This message is automatically generated. 
> enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt, MAPREDUCE_5517_v2.patch.txt > > > Since there is no reducer, the memory allocated to reducer is irrelevant to > enable uber mode of a job -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered
[ https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772238#comment-13772238 ] Jian He commented on MAPREDUCE-5505: bq. markUnregistered will not be called, and JobClient will still see RUNNING. Correct, JobClient will see RUNNING until the AM exits, at which point JobClient will keep waiting until the next AM comes up (MAPREDUCE-5488 made this change). The decision here is that if the unregister call fails, the MR job is deemed failed and will be restarted by the RM. Should isUnregistered be an AtomicBoolean? Test case: also assert that the job state is RUNNING before markUnregistered is called. > Clients should be notified job finished only after job successfully > unregistered > - > > Key: MAPREDUCE-5505 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He >Assignee: Zhijie Shen > Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch > > > This is to make sure user is notified job finished after job is really done. > This does increase client latency but can reduce some races during unregister > like YARN-540 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5488) Job recovery fails after killing all the running containers for the app
[ https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772205#comment-13772205 ] Hadoop QA commented on MAPREDUCE-5488: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604075/MAPREDUCE-5488.3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.TestMRJobClient The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient: org.apache.hadoop.mapreduce.v2.TestUberAM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4015//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4015//console This message is automatically generated. 
> Job recovery fails after killing all the running containers for the app > --- > > Key: MAPREDUCE-5488 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.1.0-beta >Reporter: Arpit Gupta >Assignee: Jian He > Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, > MAPREDUCE-5488.3.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, > MAPREDUCE-5488.patch
[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob
[ https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772173#comment-13772173 ] Chris Nauroth commented on MAPREDUCE-5508: -- Thanks for the new patch, Xi. This mostly looks good to me, and I'm glad to hear that it still fixes the memory leak. Here are a few comments: # Can we remove the unused {{PathDeletionContext}} constructor? It would require a small change in {{TestCleanupQueue}}. # Swallowing the {{InterruptedException}} is problematic if any upstream code depends on seeing the thread's interrupted status, so let's restore the interrupted status in the catch block by calling {{Thread.currentThread().interrupt()}}. # If there is an {{InterruptedException}}, then we currently would pass a null {{tempDirFs}} to the {{CleanupQueue}}, where we'd once again risk leaking memory. I suggest that if there is an {{InterruptedException}}, then we skip adding to the {{CleanupQueue}} and log a warning. This is consistent with the error-handling strategy in the rest of the method. (It logs warnings.) > JobTracker memory leak caused by unreleased FileSystem objects in > JobInProgress#cleanupJob > -- > > Key: MAPREDUCE-5508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 1-win, 1.2.1 >Reporter: Xi Fang >Assignee: Xi Fang >Priority: Critical > Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.patch > > > MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem > object (see "tempDirFs") that is not properly released. > {code} JobInProgress#cleanupJob() > void cleanupJob() { > ... > tempDirFs = jobTempDirPath.getFileSystem(conf); > CleanupQueue.getInstance().addToQueue( > new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId)); > ... > if (tempDirFs != fs) { > try { > fs.close(); > } catch (IOException ie) { > ... 
> } > {code}
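The two review points about {{InterruptedException}} (restore the interrupted status, then skip the enqueue and log a warning) can be sketched as below. This is a minimal, self-contained illustration and not the actual patch: {{CleanupSketch}}, {{FsHandle}}, {{FsLookup}}, {{QUEUE}} and {{enqueueCleanup}} are hypothetical stand-ins for the JobTracker's FileSystem lookup and {{CleanupQueue}}, and the warning goes to stderr instead of a real logger.

```java
import java.util.ArrayList;
import java.util.List;

final class CleanupSketch {
    /** Hypothetical stand-in for a FileSystem handle. */
    static final class FsHandle {
        final String name;
        FsHandle(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for a lookup that may be interrupted. */
    interface FsLookup {
        FsHandle get() throws InterruptedException;
    }

    /** Hypothetical stand-in for the cleanup queue. */
    static final List<FsHandle> QUEUE = new ArrayList<>();

    /** Returns true if a cleanup entry was queued, false if it was skipped. */
    static boolean enqueueCleanup(FsLookup lookup) {
        FsHandle tempDirFs;
        try {
            tempDirFs = lookup.get();
        } catch (InterruptedException ie) {
            // Restore the flag so upstream code can still observe the interrupt.
            Thread.currentThread().interrupt();
            // Skip the enqueue instead of passing a null handle along.
            System.err.println("WARN: interrupted while getting temp dir FS; skipping cleanup");
            return false;
        }
        QUEUE.add(tempDirFs);
        return true;
    }

    public static void main(String[] args) {
        System.out.println(enqueueCleanup(() -> new FsHandle("tmp")));
        System.out.println(enqueueCleanup(() -> { throw new InterruptedException("simulated"); }));
        // Thread.interrupted() reports and clears the restored flag.
        System.out.println("interrupt flag restored: " + Thread.interrupted());
    }
}
```

Returning early from the catch block keeps the null handle out of the queue, which is the leak risk the comment calls out.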
[jira] [Updated] (MAPREDUCE-5507) MapReduce reducer preemption gets hanged
[ https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Omkar Vinit Joshi updated MAPREDUCE-5507: - Description: Today, if we set "yarn.app.mapreduce.am.job.reduce.rampup.limit" and "mapreduce.job.reduce.slowstart.completedmaps", reducers are launched more aggressively. However, the calculation that decides whether to ramp reducers up or down is not done in the most optimal way. * If the MR AM at any point sees a situation like ** scheduledMaps : 30 ** scheduledReducers : 10 ** assignedMaps : 0 ** assignedReducers : 11 ** finishedMaps : 120 ** headroom : 756 (when your map/reduce task needs only 512 MB) * then today it simply hangs, because it thinks that there is sufficient room to launch one more mapper and therefore there is no need to ramp down. However, if this continues forever, this is not the correct or optimal behavior. * Ideally, when the MR AM sees that assignedMaps has dropped to 0 while reducers are still running, it should wait for a certain time (bounded above by the average map task completion time, as a heuristic). If it still doesn't get a new container for a map task after that, it should preempt reducers one by one at some interval and ramp up slowly. ** Preemption of reducers can be done in a slightly smarter way: *** preempt a reducer on a node manager for which there is a pending map request. *** otherwise preempt any other reducer. The MR AM will contribute to getting a new mapper by releasing such a reducer / container, because doing so reduces its cluster consumption and it may thereby become a candidate for an allocation. was: Today if we are setting "yarn.app.mapreduce.am.job.reduce.rampup.limit" and "mapreduce.job.reduce.slowstart.completedmaps" then reducer are launched more aggressively. However the calculation to either Ramp up or Ramp down reducer is not down in most optimal way. 
* If MR AM at any point sees situation something like ** scheduledMaps : 30 ** scheduledReducers : 10 ** assignedMaps : 0 ** assignedReducers : 11 ** finishedMaps : 120 ** headroom : 756 ( when your map /reduce task needs only 512mb) * then today it simply hangs because it thinks that there is sufficient room to launch one more mapper and therefore there is no need to ramp down. However, if this continues forever then this is not the correct way / optimal way. * Ideally for MR AM when it sees that assignedMaps drops have dropped to 0 and there are running reducers around should wait for certain time ( upper limited by average map task completion time ... for heuristic sake)..but after that if still it doesn't get new container for map task then should preempt the reducer one by one with some interval and should ramp up slowly... ** Preemption of reducer can be done in little smarter way *** preempt reducer on a node manager for which there is any pending map request. *** otherwise preempt any other reducer. MR AM will contribute to getting new mapper by releasing such a reducer / container because it will reduce its cluster consumption and thereby may become candidate for an allocation. > MapReduce reducer preemption gets hanged > > > Key: MAPREDUCE-5507 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Omkar Vinit Joshi >Assignee: Omkar Vinit Joshi > > Today, if we set "yarn.app.mapreduce.am.job.reduce.rampup.limit" and > "mapreduce.job.reduce.slowstart.completedmaps", reducers are launched > more aggressively. However, the calculation that decides whether to ramp > reducers up or down is not done in the most optimal way. 
> * If the MR AM at any point sees a situation like > ** scheduledMaps : 30 > ** scheduledReducers : 10 > ** assignedMaps : 0 > ** assignedReducers : 11 > ** finishedMaps : 120 > ** headroom : 756 (when your map/reduce task needs only 512 MB) > * then today it simply hangs, because it thinks that there is sufficient room > to launch one more mapper and therefore there is no need to ramp down. > However, if this continues forever, this is not the correct or optimal > behavior. > * Ideally, when the MR AM sees that assignedMaps has dropped to 0 while > reducers are still running, it should wait for a certain time (bounded > above by the average map task completion time, as a heuristic). If it still > doesn't get a new container for a map task after that, it should preempt > reducers one by one at some interval and ramp up slowly. > ** Preemption of reducers can be done in a slightly smarter way: > *** preempt a reducer on a node manager for which there is a pending map > request. > *** otherwise preempt any other reducer. The MR AM will contribute to getting a new > mapper by r
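The heuristic proposed in the description can be sketched roughly as follows. This is only an illustration of the stated idea, not the MR AM's allocator code: the class, both method names, the reducer/node identifiers, and the use of "time waited without a new map container" as the trigger are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class ReducerPreemptionSketch {
    /**
     * Preempt only when maps are pending, none are assigned, reducers are
     * running, and the wait has reached the average map completion time.
     */
    static boolean shouldPreemptReducer(int scheduledMaps, int assignedMaps,
                                        int assignedReducers, long waitedMillis,
                                        long avgMapCompletionMillis) {
        boolean mapsStarved = scheduledMaps > 0 && assignedMaps == 0;
        boolean reducersRunning = assignedReducers > 0;
        boolean waitedLongEnough = waitedMillis >= avgMapCompletionMillis;
        return mapsStarved && reducersRunning && waitedLongEnough;
    }

    /**
     * Prefer a reducer on a node that has a pending map request; otherwise
     * fall back to any reducer. Returns null when no reducer is running.
     */
    static String pickReducerToPreempt(Map<String, String> reducerToNode,
                                       Set<String> nodesWithPendingMaps) {
        String fallback = null;
        for (Map.Entry<String, String> e : reducerToNode.entrySet()) {
            if (nodesWithPendingMaps.contains(e.getValue())) {
                return e.getKey();
            }
            if (fallback == null) {
                fallback = e.getKey();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Counts taken from the description; the timings are made up.
        System.out.println(shouldPreemptReducer(30, 0, 11, 60_000L, 45_000L));
        Map<String, String> reducers = new LinkedHashMap<>();
        reducers.put("r1", "nodeA");
        reducers.put("r2", "nodeB");
        System.out.println(pickReducerToPreempt(reducers, Collections.singleton("nodeB")));
    }
}
```

Preempting one reducer per interval, as the description suggests, would mean re-running this check after each release rather than releasing several reducers at once.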
[jira] [Assigned] (MAPREDUCE-5481) TestUberAM timeout
[ https://issues.apache.org/jira/browse/MAPREDUCE-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuan Gong reassigned MAPREDUCE-5481: Assignee: Xuan Gong > TestUberAM timeout > -- > > Key: MAPREDUCE-5481 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5481 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, test >Affects Versions: 3.0.0 >Reporter: Jason Lowe >Assignee: Xuan Gong >Priority: Blocker > > TestUberAM has been timing out on trunk for some time now and surefire then > fails the build. I'm not able to reproduce it locally, but the Jenkins > builds have been seeing it fairly consistently. See > https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1529/console
[jira] [Assigned] (MAPREDUCE-5514) TestRMContainerAllocator fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen reassigned MAPREDUCE-5514: -- Assignee: Zhijie Shen > TestRMContainerAllocator fails on trunk > --- > > Key: MAPREDUCE-5514 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5514 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Assignee: Zhijie Shen >Priority: Blocker > Attachments: > org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator-output.txt
[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-5517: --- Description: Since there is no reducer, the memory allocated to reducers is irrelevant to enabling uber mode for a job > enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt > > > Since there is no reducer, the memory allocated to reducers is irrelevant to > enabling uber mode for a job
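The reasoning in the description can be illustrated with a small sketch of the memory check. This is an assumption-laden example and not the attached patch: the class and method names are hypothetical, the parameters simply mirror mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and yarn.app.mapreduce.am.resource.mb, and the use of {{<=}} for the comparison is a guess, since the issue text only says "less than".

```java
final class UberModeSketch {
    /**
     * Returns true if the job's task memory fits inside the AM container.
     * With zero reducers, the reducer memory setting is ignored.
     */
    static boolean fitsInAmMemory(int numReduces, long mapMemoryMb,
                                  long reduceMemoryMb, long amMemoryMb) {
        long requiredMb = (numReduces == 0)
                ? mapMemoryMb
                : Math.max(mapMemoryMb, reduceMemoryMb);
        return requiredMb <= amMemoryMb;
    }

    public static void main(String[] args) {
        // Map-only job: a large reducer setting no longer blocks uber mode.
        System.out.println(fitsInAmMemory(0, 1024, 4096, 1536));
        // Same settings with one reducer: the reducer memory now matters.
        System.out.println(fitsInAmMemory(1, 1024, 4096, 1536));
    }
}
```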
[jira] [Created] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
Siqi Li created MAPREDUCE-5517: -- Summary: enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb Key: MAPREDUCE-5517 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.0.5-alpha Reporter: Siqi Li Priority: Minor Attachments: MAPREDUCE_5517_v1.patch.txt
[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-5517: --- Status: Patch Available (was: Open) > enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt > > > Since there is no reducer, the memory allocated to reducers is irrelevant to > enabling uber mode for a job
[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated MAPREDUCE-5517: --- Attachment: MAPREDUCE_5517_v1.patch.txt Patch available > enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt > > > Since there is no reducer, the memory allocated to reducers is irrelevant to > enabling uber mode for a job
[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb
[ https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772156#comment-13772156 ] Hadoop QA commented on MAPREDUCE-5517: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12604080/MAPREDUCE_5517_v1.patch.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The following test timeouts occurred in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app: org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4016//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4016//console This message is automatically generated. 
> enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb > to be less than yarn.app.mapreduce.am.resource.mb > - > > Key: MAPREDUCE-5517 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.0.5-alpha >Reporter: Siqi Li >Priority: Minor > Attachments: MAPREDUCE_5517_v1.patch.txt > > > Since there is no reducer, the memory allocated to reducers is irrelevant to > enabling uber mode for a job
[jira] [Commented] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772117#comment-13772117 ] Vinod Kumar Vavilapalli commented on MAPREDUCE-5502: bq. I think a better fix would be to change YARNRunner.killJob to avoid sending a kill to the RM if the reported job state is terminal rather than just checking for KILLED. +1 for this. That is what I was pushing for before YARN was Apache YARN. We can definitely print on the CLI that apps may get stuck after this, so that we can suggest that users run "yarn application -kill" in those corner cases. > History link in resource manager is broken for KILLED jobs > -- > > Key: MAPREDUCE-5502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha >Reporter: Vrushali C >Assignee: Vrushali C > Labels: ui > > History link in resource manager is broken for KILLED jobs. > Seems to happen with jobs with State 'KILLED' and FinalStatus 'KILLED'. If > the State is 'FINISHED' and FinalStatus is 'KILLED', then the "History" link > is fine. > It isn't easy to reproduce the problem since the time at which the app is > killed determines the state it ends up in, which is hard to guess. These > particular jobs seem to get a Diagnostics message of "Application killed by > user." whereas the other killed jobs get " Kill Job received from client > job_1378766187901_0002 > Job received Kill while in RUNNING state. "
[jira] [Commented] (MAPREDUCE-5516) TestMRJobClient fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772084#comment-13772084 ] Jian He commented on MAPREDUCE-5516: right.. thanks, close as a dup > TestMRJobClient fails on trunk > -- > > Key: MAPREDUCE-5516 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5516 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He
[jira] [Resolved] (MAPREDUCE-5516) TestMRJobClient fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He resolved MAPREDUCE-5516. Resolution: Duplicate > TestMRJobClient fails on trunk > -- > > Key: MAPREDUCE-5516 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5516 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He
[jira] [Updated] (MAPREDUCE-5514) TestRMContainerAllocator fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod Kumar Vavilapalli updated MAPREDUCE-5514: --- Priority: Blocker (was: Major) > TestRMContainerAllocator fails on trunk > --- > > Key: MAPREDUCE-5514 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5514 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Zhijie Shen >Priority: Blocker > Attachments: > org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator-output.txt
[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob
[ https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772073#comment-13772073 ] Xi Fang commented on MAPREDUCE-5508: I set both staging and system dirs to hdfs on my test cluster. I ran 35,000 job submissions and manually checked the number of DistributedFileSystem objects. No memory leak related to DistributedFileSystem was found. > JobTracker memory leak caused by unreleased FileSystem objects in > JobInProgress#cleanupJob > -- > > Key: MAPREDUCE-5508 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 1-win, 1.2.1 >Reporter: Xi Fang >Assignee: Xi Fang >Priority: Critical > Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.patch > > > MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem > object (see "tempDirFs") that is not properly released. > {code} JobInProgress#cleanupJob() > void cleanupJob() { > ... > tempDirFs = jobTempDirPath.getFileSystem(conf); > CleanupQueue.getInstance().addToQueue( > new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId)); > ... > if (tempDirFs != fs) { > try { > fs.close(); > } catch (IOException ie) { > ... > } > {code}
[jira] [Commented] (MAPREDUCE-5516) TestMRJobClient fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-5516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771943#comment-13771943 ] Jason Lowe commented on MAPREDUCE-5516: --- Dup of MAPREDUCE-5503? > TestMRJobClient fails on trunk > -- > > Key: MAPREDUCE-5516 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5516 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Jian He
[jira] [Commented] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771891#comment-13771891 ] Jason Lowe commented on MAPREDUCE-5502: --- bq. From our investigation, it appears that the client's kill sends a KILL to the App Master as well as to the RM for this App. Were you actually seeing the AM shutdown due to the SIGTERM it would receive as part of the YARN kill, or did you see "Kill job received from client " in the AM logs as well? I see in YARNRunner.killJob that it can send a kill to the AM and later to the RM if for 10 seconds the AM doesn't end up in the KILLED state. That, too, seems to be a bug, since it really should be checking not for state != KILLED but rather for state not in a terminal state, i.e.: FAILED, KILLED, SUCCEEDED. Otherwise there's a race where the AM can enter a terminal state on its own but the code later tries to kill it via YARN anyway. bq. Similar to the patch in MAPREDUCE-5497, in YarnRunner's killJob function, we added a sleep for a few seconds before the (2nd) call to resMgrDelegate.killApplication where status.getState() != JobStatus.State.KILLED In general I'm not a fan of sleeps as a "fix" since they're just masking a race window rather than resolving the underlying condition. Sleeps also slow down the process in general, and it would be better to solve it without them if possible. Also MAPREDUCE-5497 didn't add a sleep, rather it moved an existing sleep to later in the AM shutdown process. That sleep is simply there for the AM to linger around for clients to fetch the final job status rather than redirect to the history server. I'm not sure it's necessary anymore, actually. I think a better fix would be to change YARNRunner.killJob to avoid sending a kill to the RM if the reported job state is terminal rather than just checking for KILLED. 
> History link in resource manager is broken for KILLED jobs > -- > > Key: MAPREDUCE-5502 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: resourcemanager >Affects Versions: 2.0.5-alpha >Reporter: Vrushali C >Assignee: Vrushali C > Labels: ui > > History link in resource manager is broken for KILLED jobs. > Seems to happen with jobs with State 'KILLED' and FinalStatus 'KILLED'. If > the State is 'FINISHED' and FinalStatus is 'KILLED', then the "History" link > is fine. > It isn't easy to reproduce the problem since the time at which the app is > killed determines the state it ends up in, which is hard to guess. These > particular jobs seem to get a Diagnostics message of "Application killed by > user." whereas the other killed jobs get " Kill Job received from client > job_1378766187901_0002 > Job received Kill while in RUNNING state. "
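The suggested fix, checking for any terminal state rather than only KILLED before escalating the kill to the RM, can be sketched as below. This is an illustration under assumptions and not the real YARNRunner code: {{KillJobSketch}}, its {{State}} enum and both method names are hypothetical stand-ins, with the terminal states taken from the comment (FAILED, KILLED, SUCCEEDED).

```java
import java.util.EnumSet;
import java.util.Set;

final class KillJobSketch {
    /** Hypothetical stand-in for the reported job state. */
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    /** States from which the job will not move on by itself. */
    static final Set<State> TERMINAL =
            EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);

    /** The check criticized in the comment: only KILLED stops the escalation. */
    static boolean escalatesWithOldCheck(State reported) {
        return reported != State.KILLED;
    }

    /** The proposed check: any terminal state stops the escalation. */
    static boolean escalatesWithTerminalCheck(State reported) {
        return !TERMINAL.contains(reported);
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": old=" + escalatesWithOldCheck(s)
                    + " proposed=" + escalatesWithTerminalCheck(s));
        }
    }
}
```

The two checks differ only for SUCCEEDED and FAILED, which is exactly the race window described: the AM reaches a terminal state on its own, yet the client still sends a kill through YARN.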
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771878#comment-13771878 ] Hudson commented on MAPREDUCE-5487: --- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1527 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1527/]) MAPREDUCE-5487. In task processes, JobConf is unnecessarily loaded again in Limits (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524408) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Counters.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/Limits.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestCounters.java > In task processes, JobConf is unnecessarily loaded again in Limits > -- > > Key: MAPREDUCE-5487 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: performance, task >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.3.0 > > Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch > > > Limits statically loads a JobConf, which incurs costs of reading files from > disk and parsing XML. The contents of this JobConf are identical to the one > loaded by YarnChild (before adding job.xml as a resource). Allowing Limits > to initialize with the JobConf loaded in YarnChild would reduce task startup > time. -- This message is automatically generated by JIRA. 
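The idea behind MAPREDUCE-5487, letting Limits reuse a configuration that the task process has already loaded instead of loading it again, can be sketched as follows. This is a simplified illustration and not the committed change: the class, the map-based configuration, the "counters.max" key, its default of 120 and the load counter are all hypothetical, chosen only to make the saved load visible.

```java
import java.util.HashMap;
import java.util.Map;

final class LimitsInitSketch {
    /** Counts simulated expensive loads (disk read plus XML parsing). */
    static int expensiveLoads = 0;
    private static Map<String, String> conf;

    /** Hypothetical stand-in for constructing a fresh JobConf. */
    static Map<String, String> loadFromDisk() {
        expensiveLoads++;
        Map<String, String> loaded = new HashMap<>();
        loaded.put("counters.max", "120");
        return loaded;
    }

    /** Uses the caller's configuration if given; loads one only as a fallback. */
    static synchronized void init(Map<String, String> alreadyLoaded) {
        if (conf == null) {
            conf = (alreadyLoaded != null) ? alreadyLoaded : loadFromDisk();
        }
    }

    static synchronized int maxCounters() {
        init(null); // no-op when init(...) was already called
        return Integer.parseInt(conf.getOrDefault("counters.max", "120"));
    }

    /** Test helper: forget the cached configuration and the load count. */
    static synchronized void reset() {
        conf = null;
        expensiveLoads = 0;
    }

    public static void main(String[] args) {
        Map<String, String> childConf = loadFromDisk(); // the task's own load
        init(childConf);                                // reuse it
        System.out.println(maxCounters() + " after " + expensiveLoads + " load(s)");
    }
}
```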
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771868#comment-13771868 ] Hudson commented on MAPREDUCE-5487: --- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1553 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1553/]) MAPREDUCE-5487. In task processes, JobConf is unnecessarily loaded again in Limits (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524408) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Counters.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/Limits.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestCounters.java > In task processes, JobConf is unnecessarily loaded again in Limits > -- > > Key: MAPREDUCE-5487 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: performance, task >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.3.0 > > Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch > > > Limits statically loads a JobConf, which incurs costs of reading files from > disk and parsing XML. The contents of this JobConf are identical to the one > loaded by YarnChild (before adding job.xml as a resource). Allowing Limits > to initialize with the JobConf loaded in YarnChild would reduce task startup > time. -- This message is automatically generated by JIRA. 
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771791#comment-13771791 ] Hudson commented on MAPREDUCE-5487: --- SUCCESS: Integrated in Hadoop-Yarn-trunk #337 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/337/]) MAPREDUCE-5487. In task processes, JobConf is unnecessarily loaded again in Limits (Sandy Ryza) (sandy: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1524408) * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/YarnChild.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Counters.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/Limits.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestCounters.java > In task processes, JobConf is unnecessarily loaded again in Limits > -- > > Key: MAPREDUCE-5487 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: performance, task >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.3.0 > > Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch > > > Limits statically loads a JobConf, which incurs costs of reading files from > disk and parsing XML. The contents of this JobConf are identical to the one > loaded by YarnChild (before adding job.xml as a resource). Allowing Limits > to initialize with the JobConf loaded in YarnChild would reduce task startup > time. -- This message is automatically generated by JIRA. 