[jira] [Updated] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-3787: -- Attachment: MAPREDUCE-3787-v1.12.patch Attaching a patch with more enhancements and documentation. Also adding extra DEBUG logs. test-patch and the JUnit tests passed. [Gridmix] Improve STRESS mode - Key: MAPREDUCE-3787 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3787 Project: Hadoop Map/Reduce Issue Type: Improvement Components: contrib/gridmix Affects Versions: 0.24.0 Reporter: Amar Kamat Assignee: Amar Kamat Labels: gridmix, stress Fix For: 0.23.1 Attachments: MAPREDUCE-3787-v1.12.patch, MAPREDUCE-3787-v1.9.patch Gridmix STRESS mode can be improved as follows:
1. The sleep time in JobMonitor can be reduced and/or made configurable.
2. The map and reduce load calculations in StressJobFactory can be done in one loop.
3. The overload status can be updated from the job submitter thread (inline).
4. Optimizations to avoid unnecessary progress checks (which in turn result in delays).
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
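A minimal sketch of item 2, assuming the overload test compares the summed pending map and reduce tasks of the running jobs against a ratio times the cluster's slot counts. The class, field names and the ratio are hypothetical, not StressJobFactory's actual code; the point is only that one pass over the running jobs can accumulate both loads.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: compute map and reduce load in a single pass over the
 * running jobs. Names and the overload ratio are illustrative only.
 */
public class StressLoadSketch {

    /** Remaining (not yet completed) tasks of one running job. */
    static final class JobLoad {
        final int pendingMaps;
        final int pendingReduces;

        JobLoad(int pendingMaps, int pendingReduces) {
            this.pendingMaps = pendingMaps;
            this.pendingReduces = pendingReduces;
        }
    }

    /** Returns true if either map or reduce demand exceeds ratio * slots. */
    static boolean isOverloaded(List<JobLoad> running, int mapSlots, int reduceSlots,
                                double overloadRatio) {
        long maps = 0;
        long reduces = 0;
        // One loop accumulates both the map load and the reduce load.
        for (JobLoad job : running) {
            maps += job.pendingMaps;
            reduces += job.pendingReduces;
        }
        return maps > overloadRatio * mapSlots || reduces > overloadRatio * reduceSlots;
    }

    public static void main(String[] args) {
        List<JobLoad> running = Arrays.asList(new JobLoad(30, 5), new JobLoad(25, 2));
        // 55 pending maps exceed 2.0 * 20 map slots, so this prints true.
        System.out.println(isOverloaded(running, 20, 10, 2.0));
    }
}
```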
[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214524#comment-13214524 ] Ravi Gummadi commented on MAPREDUCE-3787: - Patch looks good to me. +1
[jira] [Commented] (MAPREDUCE-2722) Gridmix simulated job's map's hdfsBytesRead counter is wrong when compressed input is used
[ https://issues.apache.org/jira/browse/MAPREDUCE-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214532#comment-13214532 ] Amar Kamat commented on MAPREDUCE-2722: --- Changes look good to me. +1. Is it possible to add a JUnit test? Gridmix simulated job's map's hdfsBytesRead counter is wrong when compressed input is used -- Key: MAPREDUCE-2722 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2722 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix Reporter: Ravi Gummadi Assignee: Ravi Gummadi Attachments: MR2722.patch When the original job's map task used compressed input and compression emulation is enabled, the simulated job's map task reports a wrong hdfsBytesRead counter. This happens because Gridmix treats the original map task's hdfsBytesRead as the uncompressed map input size.
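A hypothetical Java sketch of the accounting problem described above, assuming the compression ratio is defined as compressed bytes divided by uncompressed bytes. It contrasts treating the original hdfsBytesRead as uncompressed input (which shrinks the simulated read once compression is emulated) with first recovering the uncompressed size. Method names are illustrative, not Gridmix's API.

```java
/**
 * Hypothetical sketch of the MAPREDUCE-2722 accounting issue.
 * Assumes ratio = compressedBytes / uncompressedBytes.
 */
public class CompressedInputSketch {

    /** Buggy view: the original map's HDFS bytes read is taken as uncompressed input. */
    static long simulatedHdfsBytesReadBuggy(long originalHdfsBytesRead, double ratio) {
        long assumedUncompressed = originalHdfsBytesRead;
        // Emulating compression shrinks it again, so the simulated read is too small.
        return Math.round(assumedUncompressed * ratio);
    }

    /** Corrected view: recover the uncompressed size before emulating compression. */
    static long simulatedHdfsBytesReadFixed(long originalHdfsBytesRead, double ratio) {
        long uncompressed = Math.round(originalHdfsBytesRead / ratio);
        // Compressing the recovered size brings the counter back to the original value.
        return Math.round(uncompressed * ratio);
    }

    public static void main(String[] args) {
        long original = 100L * 1024 * 1024; // 100 MiB of compressed data read
        double ratio = 0.25;                // compressed size is 25% of uncompressed
        System.out.println(simulatedHdfsBytesReadBuggy(original, ratio)); // 26214400
        System.out.println(simulatedHdfsBytesReadFixed(original, ratio)); // 104857600
    }
}
```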
[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214542#comment-13214542 ] Hudson commented on MAPREDUCE-3787: --- Integrated in Hadoop-Hdfs-trunk-Commit #1840 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1840/]) MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for faster job submission. (amarrk) (Revision 1292736) Result = SUCCESS amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292736 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/ExecutionSummarizer.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobMonitor.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobSubmitter.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Statistics.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/StressJobFactory.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixStatistics.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSubmission.java * 
/hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSummary.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestSleepJob.java * /hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/gridmix.xml
[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214545#comment-13214545 ] Hudson commented on MAPREDUCE-3787: --- Integrated in Hadoop-Common-trunk-Commit #1766 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1766/]) MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for faster job submission. (amarrk) (Revision 1292736) Result = SUCCESS amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292736
[jira] [Resolved] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat resolved MAPREDUCE-3787. --- Resolution: Fixed Release Note: JobMonitor can now deploy multiple threads for faster job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of threads. Stress mode now relies on updates from the job monitor instead of polling for job status. Failures in job submission are now reported to the statistics module and, ultimately, to the user via the summary. Hadoop Flags: Reviewed I just committed this to trunk! Thanks Ravi for the review.
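A minimal sketch of the multi-threaded polling idea from the release note: a fixed thread pool whose size stands in for 'gridmix.job-monitor.thread-count'. The class, the string job IDs and the status predicate are assumptions for illustration; this is not JobMonitor's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: poll job status from a fixed pool whose size plays the
 * role of 'gridmix.job-monitor.thread-count'. Not the actual JobMonitor code.
 */
public class ParallelJobPoller {
    private final ExecutorService pool;

    public ParallelJobPoller(int threadCount) {
        this.pool = Executors.newFixedThreadPool(threadCount);
    }

    /** Checks every job once, in parallel, and returns how many are complete. */
    public int pollOnce(List<String> jobIds, Predicate<String> isComplete) {
        List<Future<Boolean>> results = new ArrayList<>();
        for (String id : jobIds) {
            Callable<Boolean> check = () -> isComplete.test(id);
            results.add(pool.submit(check));
        }
        int done = 0;
        try {
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    done++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while polling", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("status check failed", e.getCause());
        }
        return done;
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        ParallelJobPoller poller = new ParallelJobPoller(4);
        int done = poller.pollOnce(Arrays.asList("job_1", "job_2", "job_3"),
                                   id -> !id.equals("job_2"));
        poller.shutdown();
        System.out.println(done + " of 3 jobs complete"); // 2 of 3 jobs complete
    }
}
```

Since Gridmix is launched as a Hadoop tool, the property can presumably be supplied as a generic option such as -Dgridmix.job-monitor.thread-count=4; the Gridmix documentation is the authority on the exact invocation.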
[jira] [Commented] (MAPREDUCE-3829) [Gridmix] Gridmix should give better error message when input-data directory already exists and -generate option is given
[ https://issues.apache.org/jira/browse/MAPREDUCE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214551#comment-13214551 ] Amar Kamat commented on MAPREDUCE-3829: --- Ravi, should we reuse the 'STARTUP_FAILED_ERROR' in DistributedCacheEmulator? LOG statements should point to the real cause of the error. Let's try to keep all the error codes in one place, i.e. Gridmix.java. The other changes look good to me. [Gridmix] Gridmix should give better error message when input-data directory already exists and -generate option is given - Key: MAPREDUCE-3829 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3829 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix Reporter: Ravi Gummadi Assignee: Ravi Gummadi Attachments: 3829.v0.patch Instead of throwing exception messages onto the console, Gridmix should give a better error message when the input-data directory already exists and the -generate option is given.
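A hypothetical sketch of the behaviour requested in MAPREDUCE-3829: check up front and return a centrally defined error code with a readable message, instead of letting an exception reach the console. It uses java.nio on the local file system as a stand-in for Hadoop's FileSystem API, and the numeric value given to STARTUP_FAILED_ERROR here is an assumption.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical sketch: fail fast with a clear message when -generate is
 * requested but the input-data directory already exists. Names and code
 * values are illustrative only.
 */
public class GenerateCheckSketch {
    // Exit codes kept in one place, as suggested in the review (values assumed).
    static final int SUCCESS = 0;
    static final int STARTUP_FAILED_ERROR = 1;

    static int checkInputDir(Path inputDir, boolean generate) {
        if (generate && Files.exists(inputDir)) {
            System.err.println("Input-data directory " + inputDir
                + " already exists; remove it or run without -generate.");
            return STARTUP_FAILED_ERROR;
        }
        return SUCCESS;
    }

    public static void main(String[] args) {
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(checkInputDir(existing, true));  // 1 (directory exists)
        System.out.println(checkInputDir(existing, false)); // 0 (nothing to generate)
    }
}
```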
[jira] [Commented] (MAPREDUCE-3728) ShuffleHandler can't access results when configured in a secure mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214553#comment-13214553 ] Hadoop QA commented on MAPREDUCE-3728: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515738/MAPREDUCE-3728.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1914//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1914//console This message is automatically generated. 
ShuffleHandler can't access results when configured in a secure mode Key: MAPREDUCE-3728 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3728 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.0 Reporter: Roman Shaposhnik Assignee: Ahmed Radwan Priority: Critical Fix For: 0.23.1 Attachments: MAPREDUCE-3728.patch While running the simplest of jobs (Pi) on MR2 in a fully secure configuration, I noticed that the job was failing on the reduce side with the following messages littering the nodemanager logs:
{noformat}
2012-01-19 08:35:32,544 ERROR org.apache.hadoop.mapred.ShuffleHandler: Shuffle error org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find usercache/rvs/appcache/application_1326928483038_0001/output/attempt_1326928483038_0001_m_03_0/file.out.index in any of the configured local directories
{noformat}
Digging further, I found that the permissions on the files/dirs prevented the nodemanager (running as user yarn) from accessing these files:
{noformat}
$ ls -l /data/3/yarn/usercache/testuser/appcache/application_1327102703969_0001/output/attempt_1327102703969_0001_m_01_0
-rw-r- 1 testuser testuser 28 Jan 20 15:41 file.out
-rw-r- 1 testuser testuser 32 Jan 20 15:41 file.out.index
{noformat}
Digging even further revealed that the group-sticky bit that was faithfully set on all the subdirectories between testuser and application_1327102703969_0001 was gone from output and attempt_1327102703969_0001_m_01_0.
Looking into how these subdirectories are created (org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs()):
{noformat}
// $x/usercache/$user/appcache/$appId/filecache
Path appFileCacheDir = new Path(appBase, FILECACHE);
appsFileCacheDirs[i] = appFileCacheDir.toString();
lfs.mkdir(appFileCacheDir, null, false);
// $x/usercache/$user/appcache/$appId/output
lfs.mkdir(new Path(appBase, OUTPUTDIR), null, false);
{noformat}
reveals that lfs.mkdir ends up manipulating permissions and thus clears the sticky bit from output and filecache. At this point I'm at a loss about how this is supposed to work. My understanding was that the whole sequence of events here was predicated on the sticky bit being set, so that daemons running as user yarn (default group yarn) can access the resulting files and subdirectories at output and below. Please let me know if I'm missing something or whether this is just a bug that needs to be fixed. On a related note, when the shuffle side of the Pi job failed, the job itself didn't. It went into an endless loop and only exited when it exhausted all the local storage for the log files (at which point the nodemanager died and thus the job ended). Perhaps this is an even more serious side effect of this issue that needs to be investigated separately.
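The mechanism the reporter describes can be modelled with plain mode-bit arithmetic. A minimal sketch, assuming Linux semantics (a new subdirectory inherits the setgid, or "group-sticky", bit from its parent, and new files take the directory's group only while that bit is set) and assuming the permission is applied with a chmod(2)-style call that replaces all mode bits. The modes and group names mirror the report; the class is hypothetical, not Hadoop code.

```java
/**
 * Hypothetical model (not Hadoop code) of how the setgid bit on a directory
 * decides the group of new files, and how an explicit chmod to a plain mode
 * clears it.
 */
public class SetgidSketch {
    static final int S_ISGID = 02000; // set-group-ID bit

    /** mkdir under a setgid parent: the new directory inherits setgid. */
    static int modeAfterMkdir(int parentMode, int umask) {
        int mode = 0777 & ~umask;
        return (parentMode & S_ISGID) != 0 ? (mode | S_ISGID) : mode;
    }

    /** A follow-up chmod to an explicit permission overwrites every mode bit. */
    static int modeAfterChmod(int explicitPermission) {
        return explicitPermission & 07777;
    }

    /** Group given to a file created inside a directory with the given mode. */
    static String groupOfNewFile(int dirMode, String dirGroup, String creatorPrimaryGroup) {
        return (dirMode & S_ISGID) != 0 ? dirGroup : creatorPrimaryGroup;
    }

    public static void main(String[] args) {
        int appDir = 02750;                                 // setgid, group "yarn"
        int outputDir = modeAfterMkdir(appDir, 022);        // setgid inherited
        System.out.println(Integer.toOctalString(outputDir));              // 2755
        System.out.println(groupOfNewFile(outputDir, "yarn", "testuser")); // yarn
        int chmodded = modeAfterChmod(0755);                // no setgid requested
        System.out.println(Integer.toOctalString(chmodded));               // 755
        System.out.println(groupOfNewFile(chmodded, "yarn", "testuser"));  // testuser
    }
}
```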
[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214559#comment-13214559 ] Hudson commented on MAPREDUCE-3787: --- Integrated in Hadoop-Mapreduce-trunk-Commit #1777 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1777/]) MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for faster job submission. (amarrk) (Revision 1292736) Result = ABORTED amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292736
[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214584#comment-13214584 ] Hudson commented on MAPREDUCE-3787: --- Integrated in Hadoop-Hdfs-trunk #964 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/964/]) MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for faster job submission. (amarrk) (Revision 1292736) Result = SUCCESS amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292736
[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214582#comment-13214582 ] Hudson commented on MAPREDUCE-3884: --- Integrated in Hadoop-Hdfs-trunk #964 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/964/]) MAPREDUCE-3884. PWD should be first in the classpath of MR tasks (tucu) (Revision 1292424) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292424 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java PWD should be first in the classpath of MR tasks Key: MAPREDUCE-3884 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3884 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.2 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-3884.patch, MAPREDUCE-3884.patch Currently, the task's current working directory (PWD) is not part of the classpath. This is a regression from MR1, and existing applications that rely on it fail to work properly.
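A minimal sketch of what "PWD first" means for a task classpath, assuming entries are joined with the platform path separator and that "$PWD" is expanded by the container's launch shell. The method and the framework entries are hypothetical; this is not the MRApps code from the commit.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: build a task classpath with the working directory
 * first, so that entries localized into it take precedence over later ones.
 */
public class TaskClasspathSketch {
    static String buildClasspath(List<String> frameworkEntries, String separator) {
        List<String> entries = new ArrayList<>();
        entries.add("$PWD");      // the task's working directory itself
        entries.add("$PWD/*");    // jars localized into the working directory
        entries.addAll(frameworkEntries);
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        System.out.println(buildClasspath(
            Arrays.asList("$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/share/hadoop/common/*"),
            File.pathSeparator));
    }
}
```

Classpath order decides which copy of a class or resource the JVM finds first, which is why the position of PWD, and not just its presence, matters here.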
[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214589#comment-13214589 ] Hudson commented on MAPREDUCE-3884: --- Integrated in Hadoop-Hdfs-0.23-Build #177 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/177/]) Merge -r 1292423:1292424 from trunk to branch. FIXES: MAPREDUCE-3884 (Revision 1292427) Result = UNSTABLE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292427 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java
[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214620#comment-13214620 ] Hudson commented on MAPREDUCE-3884: --- Integrated in Hadoop-Mapreduce-0.23-Build #205 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/205/]) Merge -r 1292423:1292424 from trunk to branch. FIXES: MAPREDUCE-3884 (Revision 1292427) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292427
[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214645#comment-13214645 ] Hudson commented on MAPREDUCE-3884: --- Integrated in Hadoop-Mapreduce-trunk #999 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/999/]) MAPREDUCE-3884. PWD should be first in the classpath of MR tasks (tucu) (Revision 1292424) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292424
[jira] [Commented] (MAPREDUCE-3034) NM should act on a REBOOT command from RM
[ https://issues.apache.org/jira/browse/MAPREDUCE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214790#comment-13214790 ] Eric Payne commented on MAPREDUCE-3034: --- @Devaraj, can you please upmerge the patch to the latest code in http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23 Thanks! NM should act on a REBOOT command from RM - Key: MAPREDUCE-3034 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3034 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.0, 0.24.0 Reporter: Vinod Kumar Vavilapalli Assignee: Devaraj K Priority: Critical Attachments: MAPREDUCE-3034-1.patch, MAPREDUCE-3034-2.patch, MAPREDUCE-3034-3.patch, MAPREDUCE-3034-4.patch, MAPREDUCE-3034.patch, MR-3034.txt The RM sends a reboot command to the NM in some cases, for example when the NM is lost and then rejoins. In such a case, the NM should act on the command and reboot/reinitialize itself. This is akin to a TT reinitializing on order from the JT. We will need to shut down all the services properly and reinitialize; this should automatically take care of killing containers, cleaning up local temporary files, etc.
[jira] [Created] (MAPREDUCE-3902) MR AM should reuse containers for map tasks
MR AM should reuse containers for map tasks --- Key: MAPREDUCE-3902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Arun C Murthy Assignee: Arun C Murthy The MR AM is now in a great position to reuse containers across (map) tasks. This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner: # Consider data-locality when re-using containers # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps)
[jira] [Assigned] (MAPREDUCE-3897) capacity scheduler - maxActiveApplicationsPerUser calculation can be wrong
[ https://issues.apache.org/jira/browse/MAPREDUCE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne reassigned MAPREDUCE-3897: - Assignee: Eric Payne capacity scheduler - maxActiveApplicationsPerUser calculation can be wrong -- Key: MAPREDUCE-3897 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3897 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Thomas Graves Assignee: Eric Payne Priority: Critical The capacity scheduler calculates maxActiveApplications and maxActiveApplicationsPerUser based on the config yarn.scheduler.capacity.maximum-applications or default 1: MaxActiveApplications = max(ceil(clusterMemory / minAllocation * maxAMResource% * absoluteMaxCapacity), 1) MaxActiveAppsPerUser = max(ceil(maxActiveApplicationsComputedAbove * (userLimit% / 100) * userLimitFactor), 1) maxActiveApplications is already multiplied by the queue's absolute MAXIMUM capacity. Consider the case where max capacity > capacity, the user limit factor is 1 (the default), and only 1 user is running. That user will not be allowed to use more than the queue capacity, so making the per-user limit relative to MAX capacity doesn't make sense. The user could easily end up deadlocked, with all of its space used by application masters.
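The two formulas quoted above can be evaluated side by side to make the report concrete. This minimal sketch assumes that percentages are passed as fractions and memory is given in MB. The numbers are illustrative and the method names are hypothetical. The variant that uses absolute capacity is one reading of the report, not a committed fix.

```java
public class MaxActiveAppsSketch {
    // MaxActiveApplications = max(ceil(clusterMemory / minAllocation
    //     * maxAMResource% * absoluteCapacityUsed), 1)
    // Percentages are passed as fractions (0.25 means 25%).
    static int maxActiveApplications(long clusterMemoryMb, long minAllocationMb,
                                     double maxAMResourceFraction,
                                     double absoluteCapacityUsed) {
        double apps = Math.ceil((double) clusterMemoryMb / minAllocationMb
                * maxAMResourceFraction * absoluteCapacityUsed);
        return Math.max((int) apps, 1);
    }

    // MaxActiveAppsPerUser = max(ceil(maxActiveApplications
    //     * (userLimit% / 100) * userLimitFactor), 1)
    static int maxActiveAppsPerUser(int maxActiveApps, int userLimitPercent,
                                    double userLimitFactor) {
        double apps = Math.ceil(maxActiveApps * (userLimitPercent / 100.0)
                * userLimitFactor);
        return Math.max((int) apps, 1);
    }

    public static void main(String[] args) {
        long clusterMb = 64 * 1024;  // 64 GB cluster (illustrative)
        long minAllocMb = 1024;      // 1 GB minimum allocation
        double amFraction = 0.25;    // illustrative AM resource share
        double absCapacity = 0.25;   // queue capacity: 16 GB, i.e. 16 containers
        double absMaxCapacity = 1.0; // queue may grow to the whole cluster

        int fromMax = maxActiveAppsPerUser(
                maxActiveApplications(clusterMb, minAllocMb, amFraction, absMaxCapacity), 100, 1.0);
        int fromCap = maxActiveAppsPerUser(
                maxActiveApplications(clusterMb, minAllocMb, amFraction, absCapacity), 100, 1.0);
        System.out.println("per-user AM limit using max capacity: " + fromMax); // 16
        System.out.println("per-user AM limit using capacity:     " + fromCap); // 4
    }
}
```

With these numbers, a single user with userLimitFactor 1 is capped at the queue capacity of 16 GB, which is 16 minimum-size containers. That user may still have 16 active applications, so their AMs alone can fill all of that space. This is the deadlock the report describes.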
[jira] [Updated] (MAPREDUCE-3902) MR AM should reuse containers for map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-3902: - Attachment: MAPREDUCE-3902.patch Ok, I spent a long (isolated) flight on this; it clearly needs more work, but it's a start. *smile* This patch improves on the classic JVM reuse in both dimensions described in the jira. We need to pay more attention to the user interface. Some options: # Allow the user to specify the actual number of map slots to be used (supported now, in the patch) # Allow the user to specify a target block size for maps that is larger than the real HDFS block size, i.e. to get around the small-files problem. Thoughts? MR AM should reuse containers for map tasks --- Key: MAPREDUCE-3902 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902 Project: Hadoop Map/Reduce Issue Type: Improvement Components: applicationmaster, mrv2 Reporter: Arun C Murthy Assignee: Arun C Murthy Attachments: MAPREDUCE-3902.patch The MR AM is now in a great position to reuse containers across (map) tasks. This is something similar to JVM re-use we had in 0.20.x, but in a significantly better manner: # Consider data-locality when re-using containers # Consider the new shuffle - ensure that reduces fetch output of the whole container at once (i.e. all maps)
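The second option, a target block size larger than the HDFS block size, amounts to packing several small inputs into one map task. This minimal greedy sketch assumes that only file lengths matter. It ignores data locality, which the issue also calls out. The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SmallFileGroupingSketch {
    // Packs file lengths, in input order, into groups whose total does not
    // exceed targetSize; a single file larger than targetSize gets its own group.
    static List<List<Long>> group(List<Long> fileLengths, long targetSize) {
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long len : fileLengths) {
            if (!current.isEmpty() && currentSize + len > targetSize) {
                groups.add(current); // close the group before it overflows
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(len);
            currentSize += len;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Long> files = Arrays.asList(10 * mb, 20 * mb, 200 * mb, 30 * mb, 40 * mb);
        // Illustrative target of 256 MB, larger than a typical HDFS block.
        System.out.println(group(files, 256 * mb).size() + " map tasks instead of " + files.size());
    }
}
```

For the example input, the sketch prints "2 map tasks instead of 5". A real implementation would also weigh where each file's blocks live before grouping.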
[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214807#comment-13214807 ] Thomas Graves commented on MAPREDUCE-3878: -- +1 looks good. Thanks Jon. Null user on filtered jobhistory job page - Key: MAPREDUCE-3878 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3878 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Critical Attachments: MAPREDUCE-3878.patch If jobhistory/job.* is filtered to bypass ACLs, the resulting page will always show Null user. This differs from 0.20, where filtering this page bypasses security and allows all access to it. Filtering essentially passes a null user to AppController, where an exception is thrown. If a null user is detected, we should assume ACL checking is disabled for this page.
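The following sketch shows the behavior proposed in the last sentence of the description. It assumes that access is decided from a remote user name and a set of allowed users. Both are simplifications of the real ACL model, and the class and method names are hypothetical. This is not the AppController code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class NullUserAclSketch {
    // Hypothetical check, not the AppController code. When a servlet filter
    // lets a request through without authenticating it, the remote user is
    // null; instead of passing that on (and failing later), treat it as
    // "ACL checking is disabled for this page".
    static boolean hasAccess(String remoteUser, Set<String> allowedUsers) {
        if (remoteUser == null) {
            return true;
        }
        return allowedUsers.contains(remoteUser);
    }

    public static void main(String[] args) {
        Set<String> allowed = new HashSet<>(Arrays.asList("alice"));
        System.out.println(hasAccess(null, allowed));    // true
        System.out.println(hasAccess("alice", allowed)); // true
        System.out.println(hasAccess("bob", allowed));   // false
    }
}
```

The null branch only makes sense because an administrator deliberately configured the filter, which is the premise of the report. An authenticated but unlisted user is still refused.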
[jira] [Updated] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated MAPREDUCE-3878: - Resolution: Fixed Fix Version/s: 0.23.2 Status: Resolved (was: Patch Available) I committed this to trunk and branch-0.23. Thanks Jon! Null user on filtered jobhistory job page - Key: MAPREDUCE-3878 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3878 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-3878.patch If jobhistory/job.* is filtered to bypass ACLs, the resulting page will always show Null user. This differs from 0.20, where filtering this page bypasses security and allows all access to it. Filtering essentially passes a null user to AppController, where an exception is thrown. If a null user is detected, we should assume ACL checking is disabled for this page.
[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214819#comment-13214819 ] Jay Finger commented on MAPREDUCE-3902: --- I haven't read the patch, so forgive me if the answer is already there. Is there a cap on the amount of reuse? For example, a container that has been in use for more than 1 minute would not be reused. To rephrase: what prevents a cluster with a few large jobs from having its containers hogged?
[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214823#comment-13214823 ] Hudson commented on MAPREDUCE-3878: --- Integrated in Hadoop-Common-0.23-Commit #586 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/586/]) merge -r 1292830:1292831 from trunk to branch-0.23. FIXES: MAPREDUCE-3878 (Revision 1292834) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292834 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214824#comment-13214824 ] Hudson commented on MAPREDUCE-3878: --- Integrated in Hadoop-Common-trunk-Commit #1767 (See [https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1767/]) MAPREDUCE-3878. Null user on filtered jobhistory job page (Jonathon Eagles via tgraves) (Revision 1292831) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292831 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214821#comment-13214821 ] Hudson commented on MAPREDUCE-3878: --- Integrated in Hadoop-Hdfs-trunk-Commit #1841 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1841/]) MAPREDUCE-3878. Null user on filtered jobhistory job page (Jonathon Eagles via tgraves) (Revision 1292831) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292831 Files : * /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214826#comment-13214826 ] Hudson commented on MAPREDUCE-3878: --- Integrated in Hadoop-Hdfs-0.23-Commit #573 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/573/]) merge -r 1292830:1292831 from trunk to branch-0.23. FIXES: MAPREDUCE-3878 (Revision 1292834) Result = SUCCESS tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292834 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page
[ https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214844#comment-13214844 ] Hudson commented on MAPREDUCE-3878: --- Integrated in Hadoop-Mapreduce-0.23-Commit #588 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/588/]) merge -r 1292830:1292831 from trunk to branch-0.23. FIXES: MAPREDUCE-3878 (Revision 1292834) Result = ABORTED tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1292834 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java
[jira] [Created] (MAPREDUCE-3904) Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true - Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.
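The following sketch shows one null-safe way for a page to consume the ACL map. It assumes that the ACLs arrive as a map that may be null for history files written with ACLs disabled. A simplified Map<String, String> stands in for the real ACL map type, and the helper and its output strings are hypothetical. This is not the HsJobBlock code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class NullAclsSketch {
    // Hypothetical rendering helper: history files written with
    // mapreduce.cluster.acls.enabled=false may carry no ACLs, so a null map
    // is normalized to an empty one before the page iterates over it.
    static String renderAcls(Map<String, String> acls) {
        Map<String, String> safe = (acls == null) ? Collections.<String, String>emptyMap() : acls;
        if (safe.isEmpty()) {
            return "(no ACLs recorded)";
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : safe.entrySet()) {
            if (sb.length() > 0) {
                sb.append("; ");
            }
            sb.append(e.getKey()).append(": ").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderAcls(null)); // (no ACLs recorded)
        Map<String, String> acls = new LinkedHashMap<>();
        acls.put("VIEW_JOB", "alice");
        System.out.println(renderAcls(acls)); // VIEW_JOB: alice
    }
}
```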
[jira] [Updated] (MAPREDUCE-3872) event handling races in ContainerLauncherImpl and TestContainerLauncher
[ https://issues.apache.org/jira/browse/MAPREDUCE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated MAPREDUCE-3872: Attachment: MAPREDUCE-3872.patch Refreshing the patch. It looks like MAPREDUCE-3634 fixed a number of the issues I had originally seen and fixed in this patch. The latest version of this patch fixes the obvious concurrency bug in updating allNodes. The patch is covered by the existing unit tests; I don't see a way to trigger the bad case, given that it's non-deterministic. However, the concurrency bug in the current code is evident by inspection. event handling races in ContainerLauncherImpl and TestContainerLauncher --- Key: MAPREDUCE-3872 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3872 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mrv2 Affects Versions: 0.23.1 Reporter: Patrick Hunt Attachments: MAPREDUCE-3872.patch, MAPREDUCE-3872.patch TestContainerLauncher is failing intermittently for me. {noformat} junit.framework.AssertionFailedError: Expected: null but was: Expected 22 but found 21 at junit.framework.Assert.fail(Assert.java:47) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertNull(Assert.java:233) at junit.framework.Assert.assertNull(Assert.java:226) at org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher.testPoolSize(TestContainerLauncher.java:117) {noformat} Patch momentarily.
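The comment names only "updating allNodes", so the following is a generic sketch of the usual shape of such a race and its fix, not the ContainerLauncherImpl code. It assumes that allNodes is a set of node identifiers updated from several event-handling threads. All other class, method, and field names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AllNodesSketch {
    // Shared across event-handling threads. ConcurrentHashMap.newKeySet()
    // gives a thread-safe set whose add() is atomic.
    private final Set<String> allNodes = ConcurrentHashMap.newKeySet();
    private final AtomicInteger newNodeEvents = new AtomicInteger();

    // Racy alternative (not used here): with a plain HashSet and a separate
    // contains()/add() pair, two threads could both treat the same node as
    // new, and HashSet itself is not safe for concurrent updates.
    void onContainerEvent(String nodeId) {
        if (allNodes.add(nodeId)) { // true only for the first thread adding nodeId
            newNodeEvents.incrementAndGet();
        }
    }

    // Fires events for distinctNodes node ids from several threads and
    // returns how many events were counted as "first sighting of a node".
    static int simulate(int threads, int eventsPerThread, int distinctNodes)
            throws InterruptedException {
        AllNodesSketch s = new AllNodesSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < eventsPerThread; i++) {
                    s.onContainerEvent("node-" + (i % distinctNodes));
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return s.newNodeEvents.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate(8, 10_000, 50)); // always 50 with the atomic add
    }
}
```

Because the membership test and the insertion happen in one atomic add call, each node is counted exactly once no matter how the threads interleave.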
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214957#comment-13214957 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583: --- I got the following when running ant test-patch: Sorry that I was not clear. The full command looks like {code} ant -Dforrest.home=${FORREST_HOME} -Dfindbugs.home=${FINDBUGS_HOME} -Dpatch.file=a.patch test-patch {code} and it requires findbugs and forrest. ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException - Key: MAPREDUCE-3583 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Environment: 64-bit Linux: asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Reporter: Zhihong Yu Assignee: Zhihong Yu Priority: Critical Fix For: 0.24.0, 0.23.2 Attachments: mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, mapreduce-3583-v7.txt, mapreduce-3583.txt HBase PreCommit builds frequently gave us NumberFormatException. From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/: {code} 2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String). 
java.lang.NumberFormatException: For input string: 18446743988060683582 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Long.parseLong(Long.java:422) at java.lang.Long.parseLong(Long.java:468) at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413) at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148) at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401) at org.apache.hadoop.mapred.Task.initialize(Task.java:536) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249) {code} From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, causing NFE: {code} // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss) pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)), {code} You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console: {code} asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 20 file size (blocks, -f) unlimited pending signals (-i) 16382 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 6 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 2048 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited 6 Running 
in Jenkins mode {code} From Nicholas Sze: {noformat} It looks like the ppid is a 64-bit positive integer, but Java long is signed and so only works with 63-bit positive integers. In your case, 2^64 > 18446743988060683582 > 2^63. Therefore, there is an NFE. {noformat} I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't encounter this problem, by avoiding parsing large integers.
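A short Java sketch can check the arithmetic in the quoted explanation. Long.parseUnsignedLong and Long.toUnsignedString require Java 8 or later, which is newer than the JDKs that Hadoop 0.20.205 ran on. They are shown only to confirm that the value fits in 64 unsigned bits. The last lines show the String-keyed map that the issue proposes. The class name and the placeholder map value are hypothetical.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

public class PpidParseSketch {
    static final String PPID = "18446743988060683582"; // value from the stack trace

    // Long.parseLong is signed: anything above Long.MAX_VALUE (2^63 - 1) is rejected.
    static boolean signedParseFails(String s) {
        try {
            Long.parseLong(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    // BigInteger is unbounded, so it can check 2^63 < value < 2^64 exactly.
    static boolean between2To63And2To64(String s) {
        BigInteger v = new BigInteger(s);
        return v.compareTo(BigInteger.ONE.shiftLeft(63)) > 0
                && v.compareTo(BigInteger.ONE.shiftLeft(64)) < 0;
    }

    public static void main(String[] args) {
        System.out.println(signedParseFails(PPID));     // true
        System.out.println(between2To63And2To64(PPID)); // true
        // Java 8+ only: keeps all 64 bits and prints them back as unsigned.
        System.out.println(Long.toUnsignedString(Long.parseUnsignedLong(PPID)));
        // The approach proposed in the issue: key process entries by the pid
        // string, so no numeric parsing is needed for the lookup at all.
        Map<String, String> allProcessInfo = new HashMap<>();
        allProcessInfo.put(PPID, "placeholder for a ProcessInfo");
        System.out.println(allProcessInfo.containsKey(PPID)); // true
    }
}
```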
[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3901: -- Status: Open (was: Patch Available) lazy load JobHistory Task and TaskAttempt details - Key: MAPREDUCE-3901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: MR3901.txt, MR3901_v2.txt The job history UI and MRClientProtocol calls routed via JobHistory are very slow for large jobs. Some of this time is spent parsing the history file, but a good chunk is spent pre-creating lots of objects that may never be used. Those objects can be created when required. This would bring the load times of job history pages and of calls such as getJobReport down to approximately the history file parse time.
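The lazy-loading idea can be sketched as "parse once, build task objects on first request". This minimal sketch assumes that the parse yields one lightweight record per task keyed by task id. TaskRecord, Task, and the other names are hypothetical stand-ins, not the JobHistory classes.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyTaskSketch {
    // Hypothetical stand-in for the per-task data produced by parsing the history file.
    static final class TaskRecord {
        final String id;
        TaskRecord(String id) { this.id = id; }
    }

    // Hypothetical stand-in for the heavier object served to the UI and RPC calls.
    static final class Task {
        final String id;
        Task(TaskRecord record) { this.id = record.id; }
    }

    private final Map<String, TaskRecord> parsed;            // filled once, by the parse
    private final Map<String, Task> built = new HashMap<>(); // filled on demand
    private int constructed = 0;

    LazyTaskSketch(Map<String, TaskRecord> parsed) {
        this.parsed = parsed;
    }

    // Builds a Task the first time it is requested and reuses it afterwards;
    // synchronized because web UI and RPC requests may arrive concurrently.
    synchronized Task getTask(String id) {
        TaskRecord record = parsed.get(id);
        if (record == null) {
            return null;
        }
        return built.computeIfAbsent(id, k -> {
            constructed++;
            return new Task(record);
        });
    }

    synchronized int constructedCount() {
        return constructed;
    }

    public static void main(String[] args) {
        Map<String, TaskRecord> parsed = new HashMap<>();
        for (int i = 0; i < 10_000; i++) {
            parsed.put("task_" + i, new TaskRecord("task_" + i));
        }
        LazyTaskSketch job = new LazyTaskSketch(parsed);
        job.getTask("task_42");
        job.getTask("task_42");
        System.out.println(job.constructedCount()); // 1, not 10000
    }
}
```

With this approach, the cost of a page that touches a single task stays close to the parse time, as the description suggests.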
[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3901: -- Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3901: -- Attachment: MR3901_v2.txt Updated to fix the very valid findbugs warnings.
[jira] [Created] (MAPREDUCE-3905) Allow per job log aggregation configuration
Allow per job log aggregation configuration --- Key: MAPREDUCE-3905 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3905 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Currently, if log aggregation is enabled for a cluster, logs for all jobs are aggregated. This leads to a large number of files on HDFS that users may not want. Users should be able to control this per job, along with the aggregation policy (failed only, all, etc.).
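One way to express the requested control is a per-job policy that is checked alongside the cluster-wide switch. This minimal sketch assumes that a non-zero container exit status marks a failure. The enum values and the method are hypothetical and do not correspond to an existing YARN configuration or API.

```java
public class LogAggregationPolicySketch {
    // Hypothetical per-job policies; the issue lists "failed only, all, etc."
    enum Policy { NONE, FAILED_ONLY, ALL }

    // Decides whether a finished container's logs should be uploaded,
    // given the cluster-wide switch and the job's own policy.
    static boolean shouldAggregate(boolean clusterEnabled, Policy jobPolicy, int exitStatus) {
        if (!clusterEnabled) {
            return false; // the cluster setting still acts as the master switch
        }
        switch (jobPolicy) {
            case NONE:
                return false;
            case FAILED_ONLY:
                return exitStatus != 0;
            case ALL:
                return true;
            default:
                throw new IllegalArgumentException("unknown policy: " + jobPolicy);
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldAggregate(true, Policy.FAILED_ONLY, 0)); // false
        System.out.println(shouldAggregate(true, Policy.FAILED_ONLY, 1)); // true
        System.out.println(shouldAggregate(true, Policy.NONE, 1));        // false
    }
}
```

Keeping the cluster flag as an outer gate lets administrators disable aggregation globally, while each job chooses among the narrower policies.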
[jira] [Commented] (MAPREDUCE-3872) event handling races in ContainerLauncherImpl and TestContainerLauncher
[ https://issues.apache.org/jira/browse/MAPREDUCE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13214987#comment-13214987 ] Hadoop QA commented on MAPREDUCE-3872: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515773/MAPREDUCE-3872.patch against trunk revision . +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1915//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1915//console This message is automatically generated.
[jira] [Updated] (MAPREDUCE-3614) 55
[ https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated MAPREDUCE-3614: Summary: 55 (was: finalState UNDEFINED if AM is killed by hand) 55 -- Key: MAPREDUCE-3614 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: MAPREDUCE-3614.branch-0.23.patch Courtesy [~dcapwell] {quote} If the AM is running and you kill the process (sudo kill #pid), the State in Yarn would be FINISHED and FinalStatus is UNDEFINED. The Tracking UI would say History and point to the proxy url (which will redirect to the history server). The state should be more descriptive that the job failed and the tracker url shouldn't point to the history server. {quote}
[jira] [Updated] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-3738: -- Attachment: MAPREDUCE-3738.patch Patch to ensure we always set the finished boolean in the log aggregation thread. On a side note, we haven't seen a recurrence of the OOM condition on the nodemanager, so we haven't been able to track down what caused it. NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly Key: MAPREDUCE-3738 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1, 0.24.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: MAPREDUCE-3738.patch, livehistdump.txt If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception such as the OutOfMemoryError in the case I saw), this will lead to a hang during nodemanager shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean to be set by the log aggregation thread to indicate it has finished. Since the thread was killed off earlier by an uncaught exception, the boolean is never set, and the NM hangs during shutdown, repeating something like this every second in the log file:
2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
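Always setting the finished flag is the usual try/finally pattern. The Java sketch below is a minimal illustration using a simplified stand-in class (`AggregatorSketch` is hypothetical, not the real `AppLogAggregatorImpl`): the flag is set in `finally`, so `join()` returns even when the worker thread dies from an uncaught exception or `Error`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a log aggregation worker; not the real AppLogAggregatorImpl.
final class AggregatorSketch implements Runnable {
  private final AtomicBoolean finished = new AtomicBoolean(false);
  private final Runnable work;

  AggregatorSketch(Runnable work) {
    this.work = work;
  }

  @Override
  public void run() {
    try {
      work.run();
    } finally {
      // Runs even if work.run() throws an unchecked exception or an Error
      // (e.g. OutOfMemoryError), so waiters are never left hanging.
      synchronized (this) {
        finished.set(true);
        notifyAll();
      }
    }
  }

  /** Blocks until run() has exited, logging on each one-second wait. */
  synchronized void join() throws InterruptedException {
    while (!finished.get()) {
      System.out.println("Waiting for aggregation to complete");
      wait(1000);
    }
  }

  boolean isFinished() {
    return finished.get();
  }

  public static void main(String[] args) throws InterruptedException {
    AggregatorSketch agg = new AggregatorSketch(() -> {
      throw new IllegalStateException("simulated failure");
    });
    Thread t = new Thread(agg, "log-aggregator");
    t.setUncaughtExceptionHandler((th, e) -> System.out.println("thread died: " + e.getMessage()));
    t.start();
    agg.join(); // returns even though the worker thread died
    System.out.println("finished=" + agg.isFinished());
  }
}
```

Without the `finally`, the exception would skip `finished.set(true)` and `join()` would loop forever, which matches the repeated "Waiting for aggregation to complete" log lines described above.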
[jira] [Updated] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated MAPREDUCE-3738: -- Target Version/s: 0.24.0, 0.23.2 Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215033#comment-13215033 ] Hadoop QA commented on MAPREDUCE-3901: --
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515794/MR3901_v2.txt against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in .
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1916//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1916//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-hs.html
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1916//console
This message is automatically generated.
lazy load JobHistory Task and TaskAttempt details - Key: MAPREDUCE-3901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: MR3901.txt, MR3901_v2.txt The job history UI and MRClientProtocol calls routed via JobHistory are very slow for large jobs. Some of this time is spent parsing the history file, but a good chunk is spent pre-creating lots of objects that may never be used. Those objects can be created when required, bringing down the load times of job history pages and calls such as getJobReport to approximately the history file parse time.
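Creating objects only when they are first requested can be sketched as follows. This is a hypothetical illustration, not the MR3901 patch: `LazyJobHistory`, `TaskView`, and the raw attempt map are stand-ins for the parsed history data, and the `viewsBuilt` counter exists only to make the deferred work observable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep the parsed raw records and build per-task view
// objects only on first access. Not the real JobHistory/CompletedJob classes.
final class LazyJobHistory {

  static final class TaskView {
    final String taskId;
    final List<String> attempts;

    TaskView(String taskId, List<String> attempts) {
      this.taskId = taskId;
      this.attempts = Collections.unmodifiableList(new ArrayList<>(attempts));
    }
  }

  private final Map<String, List<String>> rawAttemptsByTask; // result of the history-file parse
  private Map<String, TaskView> tasks;                       // built on demand
  int viewsBuilt = 0;                                        // illustration only

  LazyJobHistory(Map<String, List<String>> rawAttemptsByTask) {
    this.rawAttemptsByTask = rawAttemptsByTask;
  }

  /** Cheap summary (e.g. for a job report) that never creates TaskView objects. */
  int getTaskCount() {
    return rawAttemptsByTask.size();
  }

  /** Creates all TaskView objects the first time they are requested, then reuses them. */
  synchronized Map<String, TaskView> getTasks() {
    if (tasks == null) {
      Map<String, TaskView> built = new LinkedHashMap<>();
      for (Map.Entry<String, List<String>> e : rawAttemptsByTask.entrySet()) {
        built.put(e.getKey(), new TaskView(e.getKey(), e.getValue()));
        viewsBuilt++;
      }
      tasks = Collections.unmodifiableMap(built);
    }
    return tasks;
  }
}
```

A summary call such as `getTaskCount()` then costs roughly the parse time, and the per-task objects are paid for only by pages or RPCs that actually need them.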
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215047#comment-13215047 ] Zhihong Yu commented on MAPREDUCE-3583: --- I installed forrest and findbugs onto my MacBook.
{code}
/Users/zhihyu/205-hadoop/build.xml:1310: 'java5.home' is not defined. Forrest requires Java 5. Please pass -Djava5.home=<base of Java 5 distribution> to Ant on the command-line.
{code}
I still need to install Java 5.
ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException - Key: MAPREDUCE-3583 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Environment: 64-bit Linux: asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Reporter: Zhihong Yu Assignee: Zhihong Yu Priority: Critical Fix For: 0.24.0, 0.23.2 Attachments: mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, mapreduce-3583-v7.txt, mapreduce-3583.txt HBase PreCommit builds frequently gave us NumberFormatException. From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
{code}
2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.NumberFormatException: For input string: "18446743988060683582"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:422)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
    at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
    at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
    at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{code}
From the hadoop 0.20.205 source code, it looks like the ppid was 18446743988060683582, causing the NFE:
{code}
// Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
{code}
You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
{code}
asf011.sp2.ygridcore.net
Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 6
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
6
Running in Jenkins mode
{code}
From Nicholas Sze:
{noformat}
It looks like that the ppid is a 64-bit positive integer but Java long is signed and so only works with 63-bit positive integers. In your case, 2^64 > 18446743988060683582 > 2^63. Therefore, there is a NFE.
{noformat}
I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't encounter this problem, since it avoids parsing large integers.
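The arithmetic in the quoted explanation can be checked directly. The Java sketch below (it needs Java 8+ for `Long.parseUnsignedLong`, which the Java 6 era code base did not have) shows that `Long.parseLong` rejects 18446743988060683582, that the value lies strictly between 2^63 and 2^64, and what keying a map by the raw pid string looks like. The `String` map values are a stand-in for `ProcessInfo`; this is not the actual patch.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Illustration of the signed-long overflow; not code from ProcfsBasedProcessTree.
final class ProcStatParsing {

  static final String VALUE = "18446743988060683582";

  /** Returns true if Long.parseLong throws NumberFormatException for s. */
  static boolean signedParseFails(String s) {
    try {
      Long.parseLong(s);
      return false;
    } catch (NumberFormatException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("Long.parseLong fails: " + signedParseFails(VALUE));

    // Same 64 bits, but negative when interpreted as a signed long.
    long bits = Long.parseUnsignedLong(VALUE);
    System.out.println("unsigned round trip: " + Long.toUnsignedString(bits));

    BigInteger big = new BigInteger(VALUE);
    BigInteger twoTo63 = BigInteger.ONE.shiftLeft(63);
    BigInteger twoTo64 = BigInteger.ONE.shiftLeft(64);
    System.out.println("2^63 < value < 2^64: "
        + (big.compareTo(twoTo63) > 0 && big.compareTo(twoTo64) < 0));

    // Keying by the pid text avoids numeric parsing altogether, in the spirit
    // of Map<String, ProcessInfo>; a String stands in for ProcessInfo here.
    Map<String, String> allProcessInfo = new HashMap<>();
    allProcessInfo.put(VALUE, "info for this pid");
    System.out.println("lookup: " + allProcessInfo.get(VALUE));
  }
}
```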
[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand
[ https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hitesh Shah updated MAPREDUCE-3614: --- Summary: finalState UNDEFINED if AM is killed by hand (was: 55)
[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3901: -- Status: Open (was: Patch Available)
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215052#comment-13215052 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583: --- But I have to manually remove cn-doc dependency for using Java 6.
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215051#comment-13215051 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583: --- I put java 6 for the java5.home and it works. So you don't really have to install Java 5.
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215055#comment-13215055 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583: --- Okay, I have just run ant test-patch on mapreduce-3583-v7.txt.
{noformat}
     [exec] -1 overall.
     [exec]
     [exec] +1 @author. The patch does not contain any @author tags.
     [exec]
     [exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
     [exec] Please justify why no tests are needed for this patch.
     [exec]
     [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
     [exec]
     [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] -1 findbugs. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.
     [exec]
{noformat}
The findbugs warnings are unrelated: running test-patch with an empty patch gives the same result.
[jira] [Assigned] (MAPREDUCE-3903) no admin override to view jobs on mr app master and job history server
[ https://issues.apache.org/jira/browse/MAPREDUCE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned MAPREDUCE-3903: Assignee: Thomas Graves no admin override to view jobs on mr app master and job history server -- Key: MAPREDUCE-3903 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3903 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Thomas Graves Assignee: Thomas Graves Priority: Critical Fix For: 0.23.0 In 1.0 there was a config, mapreduce.cluster.administrators, that allowed administrators to view anyone's job. That no longer works on YARN. YARN has the new config yarn.admin.acl, but it appears the MR app master and job history server don't use it.
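The missing override amounts to one extra clause in the view check. The Java sketch below is hypothetical: `JobViewAccess` and its plain `Set<String>` ACLs only illustrate the idea behind mapreduce.cluster.administrators / yarn.admin.acl. Hadoop's own checks are built on classes such as `AccessControlList` and also handle groups, which this sketch ignores.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an admin override for job view checks.
// Users only (no groups); not the MR app master or history server code.
final class JobViewAccess {
  private final Set<String> adminUsers;

  JobViewAccess(Set<String> adminUsers) {
    this.adminUsers = Collections.unmodifiableSet(new HashSet<>(adminUsers));
  }

  /** A user may view a job if they own it, are in the job's view ACL, or are an admin. */
  boolean canView(String user, String jobOwner, Set<String> jobViewAcl) {
    return user.equals(jobOwner)
        || jobViewAcl.contains(user)
        || adminUsers.contains(user); // the override that MR on YARN lacks per this issue
  }

  public static void main(String[] args) {
    JobViewAccess access = new JobViewAccess(Collections.singleton("clusteradmin"));
    Set<String> noAcl = Collections.emptySet();
    System.out.println("owner:    " + access.canView("alice", "alice", noAcl));
    System.out.println("stranger: " + access.canView("bob", "alice", noAcl));
    System.out.println("admin:    " + access.canView("clusteradmin", "alice", noAcl));
  }
}
```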
[jira] [Created] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
Fix inconsistency in documentation regarding mapreduce.jobhistory.principal --- Key: MAPREDUCE-3906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Trivial Currently the documentation at http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode is inconsistent about the recommended or default value of {{mapreduce.jobhistory.principal}}. In the section with the header "MapReduce JobHistory Server", the principal jhs/... is used, but later, in the section with the header "Configurations for MapReduce JobHistory Server:", the principal mapred/... is used. The fix is to replace mapred/... with jhs/...
[jira] [Updated] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
[ https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated MAPREDUCE-3906: - Component/s: security
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215058#comment-13215058 ] Zhihong Yu commented on MAPREDUCE-3583: --- It turns out Java 5 was installed. Here is the command I used:
{code}
ant -Dforrest.home=${FORREST_HOME} -Dfindbugs.home=${FINDBUGS_HOME} -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home -Dpatch.file=../mapreduce-3583-v7.txt test-patch
{code}
I got:
{code}
[get] Error opening connection java.io.IOException: Server returned HTTP response code: 503 for URL: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
[get] Can't get http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to /Users/zhihyu/205-hadoop/ivy/ivy-2.1.0.jar

BUILD FAILED
/Users/zhihyu/205-hadoop/build.xml:2393: Can't get http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to /Users/zhihyu/205-hadoop/ivy/ivy-2.1.0.jar
{code}
Not sure if the above was caused by a firewall.
ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException - Key: MAPREDUCE-3583 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Environment: 64-bit Linux: asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Reporter: Zhihong Yu Assignee: Zhihong Yu Priority: Critical Fix For: 0.24.0, 0.23.2 Attachments: mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, mapreduce-3583-v7.txt, mapreduce-3583.txt HBase PreCommit builds frequently gave us NumberFormatException.
From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
{code}
2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.NumberFormatException: For input string: 18446743988060683582
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:422)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
    at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
    at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
    at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
{code}
From the Hadoop 0.20.205 source code, it looks like ppid was 18446743988060683582, causing the NFE:
{code}
// Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
{code}
You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
{code}
asf011.sp2.ygridcore.net
Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 6
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 2048
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
6
Running in Jenkins mode
{code}
From
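The value in the trace, 18446743988060683582, is larger than Long.MAX_VALUE (9223372036854775807), which is why Long.parseLong rejects it. A minimal sketch of a parser that tolerates such a field, assuming the value should be kept at full precision: it uses java.math.BigInteger instead of long. ProcStatFieldParser and its methods are hypothetical names for illustration; this is not the patch attached to this issue.

```java
import java.math.BigInteger;

// Hypothetical helper (not Hadoop's ProcfsBasedProcessTree): parses a numeric
// field from /proc/<pid>/stat without overflowing on very large values.
class ProcStatFieldParser {

  private static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

  /** Parses a decimal field at full precision; throws NFE only for non-numeric input. */
  static BigInteger parseField(String field) {
    return new BigInteger(field.trim());
  }

  /** True when the value cannot be stored in a signed 64-bit long. */
  static boolean exceedsLong(BigInteger value) {
    return value.compareTo(LONG_MAX) > 0;
  }

  public static void main(String[] args) {
    String raw = "18446743988060683582"; // the value from the stack trace above
    try {
      Long.parseLong(raw);
    } catch (NumberFormatException e) {
      System.out.println("Long.parseLong fails: " + e.getMessage());
    }
    BigInteger v = parseField(raw);
    System.out.println(v + " exceeds Long.MAX_VALUE: " + exceedsLong(v));
  }
}
```

On Java 8 and later, Long.parseUnsignedLong also accepts this string, but it stores the value as a negative long (-85648868034 here), so every caller would then have to remember to treat the field as unsigned.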
[jira] [Updated] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
[ https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated MAPREDUCE-3906: - Attachment: MAPREDUCE-3906.patch Fix inconsistency in documentation regarding mapreduce.jobhistory.principal --- Key: MAPREDUCE-3906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Trivial Attachments: MAPREDUCE-3906.patch Currently the documentation on http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode is inconsistent on the recommended or default value of {{{mapreduce.jobhistory.principal}}}. In the section with the header "MapReduce JobHistory Server", the principal jhs/... is used, but later, in the section with the header "Configurations for MapReduce JobHistory Server:", the principal mapred/... is used. The fix is to replace mapred/... with jhs/
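The property holds a Kerberos principal of the form primary/instance@REALM, and the two doc sections disagree only on the primary (jhs versus mapred). A minimal sketch, assuming a value such as jhs/_HOST@EXAMPLE.COM (realm and hostnames are placeholders), that splits out the primary and substitutes the _HOST token with a concrete hostname. Hadoop's SecurityUtil.getServerPrincipal performs a comparable _HOST substitution; PrincipalSketch is a hypothetical standalone helper, not that method.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Hadoop.
class PrincipalSketch {

  /** Returns the primary component, e.g. "jhs" for "jhs/_HOST@EXAMPLE.COM". */
  static String primary(String principal) {
    int end = principal.indexOf('/');
    if (end < 0) {
      end = principal.indexOf('@'); // principal without an instance part
    }
    return end < 0 ? principal : principal.substring(0, end);
  }

  /** Replaces a _HOST instance component with the lower-cased hostname. */
  static String resolveHost(String principal, String hostname) {
    String[] parts = principal.split("[/@]");
    if (parts.length != 3 || !"_HOST".equals(parts[1])) {
      return principal; // nothing to substitute
    }
    return parts[0] + "/" + hostname.toLowerCase(Locale.ROOT) + "@" + parts[2];
  }

  public static void main(String[] args) {
    String configured = "jhs/_HOST@EXAMPLE.COM"; // placeholder realm
    System.out.println(primary(configured));
    System.out.println(resolveHost(configured, "JHS-Node.example.com"));
  }
}
```

Checking primary() against "jhs" is one way a test could catch the doc inconsistency described above if the documented examples were ever loaded into a configuration.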
[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3901: -- Status: Patch Available (was: Open) lazy load JobHistory Task and TaskAttempt details - Key: MAPREDUCE-3901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: MR3901.txt, MR3901_v2.txt, MR3901_v3.txt The job history UI and MRClientProtocol calls routed via JobHistory are very slow for large jobs. Some of this time is spent parsing the history file. A good chunk is spent pre-creating lots of objects which may never be used. Those can be created when required, bringing the load times of job history pages and getJobReport etc. calls down to approximately the history file parse time.
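Creating the per-task objects only when they are requested can be sketched with a memoizing getter. This is a minimal sketch under assumptions: LazyJob and TaskDetail are hypothetical stand-ins, not the JobHistory classes touched by the attached MR3901 patches. Summary-style calls read only the cheap parsed data; the expensive objects are built once, on first access, and then cached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of lazy loading; not the JobHistory classes from the patch.
class LazyJob {

  /** Stand-in for an expensive per-task object. */
  static final class TaskDetail {
    final String taskId;

    TaskDetail(String taskId) {
      this.taskId = taskId;
    }
  }

  private final List<String> rawTaskIds; // cheap data already produced by the parser
  private List<TaskDetail> tasks;        // built on first request only
  private int builds = 0;                // how many times the details were built

  LazyJob(List<String> rawTaskIds) {
    this.rawTaskIds = new ArrayList<>(rawTaskIds);
  }

  /** Summary-style call: never touches the per-task objects. */
  int getTotalTasks() {
    return rawTaskIds.size();
  }

  /** Builds the TaskDetail objects on the first call, then returns the cached list. */
  synchronized List<TaskDetail> getTasks() {
    if (tasks == null) {
      List<TaskDetail> built = new ArrayList<>(rawTaskIds.size());
      for (String id : rawTaskIds) {
        built.add(new TaskDetail(id));
      }
      tasks = Collections.unmodifiableList(built);
      builds++;
    }
    return tasks;
  }

  synchronized int buildCount() {
    return builds;
  }

  public static void main(String[] args) {
    LazyJob job = new LazyJob(Arrays.asList("task_m_0", "task_m_1"));
    System.out.println(job.getTotalTasks() + " tasks, built " + job.buildCount() + " times");
    job.getTasks();
    job.getTasks();
    System.out.println("after two getTasks() calls, built " + job.buildCount() + " time(s)");
  }
}
```

The synchronized getter is the simplest safe choice when several web or RPC threads may hit the same cached job; a history server with heavier contention might prefer a different idiom.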
[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated MAPREDUCE-3901: -- Attachment: MR3901_v3.txt Trying again. The previous patch should've been OK. lazy load JobHistory Task and TaskAttempt details - Key: MAPREDUCE-3901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: MR3901.txt, MR3901_v2.txt, MR3901_v3.txt The job history UI and MRClientProtocol calls routed via JobHistory are very slow for large jobs. Some of this time is spent parsing the history file. A good chunk is spent pre-creating lots of objects which may never be used. Those can be created when required, bringing the load times of job history pages and getJobReport etc. calls down to approximately the history file parse time.
[jira] [Updated] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
[ https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mahadev konar updated MAPREDUCE-3906: - Component/s: mrv2 Fix inconsistency in documentation regarding mapreduce.jobhistory.principal --- Key: MAPREDUCE-3906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, mrv2, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Trivial Attachments: MAPREDUCE-3906.patch Currently the documentation on http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode is inconsistent on the recommended or default value of {{{mapreduce.jobhistory.principal}}}. In the section with the header "MapReduce JobHistory Server", the principal jhs/... is used, but later, in the section with the header "Configurations for MapReduce JobHistory Server:", the principal mapred/... is used. The fix is to replace mapred/... with jhs/
[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215077#comment-13215077 ] Hadoop QA commented on MAPREDUCE-3738: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515804/MAPREDUCE-3738.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//console This message is automatically generated. NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly Key: MAPREDUCE-3738 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1, 0.24.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Attachments: MAPREDUCE-3738.patch, livehistdump.txt If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception such as OutOfMemoryError, in the case I saw), this will lead to a hang during nodemanager shutdown.
The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean to be set by the log aggregation thread to indicate it has finished. Since the thread was killed off earlier by an uncaught exception, the boolean is never set, and the NM hangs during shutdown, logging something like this every second:
2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
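The hang follows from the completion flag being set only on the normal exit path of the aggregation thread. A minimal sketch of one way to make such a wait terminate, assuming nothing about the real AppLogAggregatorImpl beyond what the description says: SafeAggregator, its flag and its join(long) method are hypothetical, and this is not the change in MAPREDUCE-3738.patch. The flag is set in a finally block, so it is set even when an Error escapes run(), and the waiting side also stops polling once the worker thread is no longer alive.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical aggregator: the completion flag is set in finally, so join()
// cannot wait forever even if the aggregation body throws an Error.
class SafeAggregator implements Runnable {
  private final AtomicBoolean aggregationFinished = new AtomicBoolean(false);
  private final Runnable work;      // the actual aggregation step (may throw)
  private volatile Thread thread;

  SafeAggregator(Runnable work) {
    this.work = work;
  }

  @Override
  public void run() {
    try {
      work.run();
    } finally {
      // Runs on normal return and on any Throwable, including OutOfMemoryError.
      aggregationFinished.set(true);
    }
  }

  void start() {
    Thread t = new Thread(this, "log-aggregator-sketch");
    t.setUncaughtExceptionHandler((th, e) -> System.err.println("aggregator died: " + e));
    thread = t;
    t.start();
  }

  /** Call after start(). Returns true once finished, false on timeout or interrupt. */
  boolean join(long timeoutMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    try {
      while (!aggregationFinished.get()) {
        long remaining = deadline - System.currentTimeMillis();
        if (remaining <= 0) {
          return false;
        }
        if (!thread.isAlive()) {
          // The worker is gone, so the flag can no longer change.
          return aggregationFinished.get();
        }
        thread.join(Math.min(remaining, 1000L));
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  boolean isFinished() {
    return aggregationFinished.get();
  }

  public static void main(String[] args) {
    SafeAggregator a = new SafeAggregator(() -> { throw new OutOfMemoryError("simulated"); });
    a.start();
    System.out.println("finished: " + a.join(5000));
  }
}
```

The OutOfMemoryError in main() is thrown deliberately to simulate the failure from the report; a real shutdown path would typically also bound the total wait, as the timeout parameter does here.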
[jira] [Created] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.
Create a mapred-default.xml for the jobhistory server. -- Key: MAPREDUCE-3907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Minor The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml that documents these and provides default values.
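A *-default.xml file acts as the bottom configuration layer: its value for a key applies only when the site configuration does not set that key. A minimal sketch of that lookup order, using java.util.Properties with a defaults table rather than Hadoop's Configuration class. The property names come from the issue; every value is a placeholder, not a default proposed here, and LayeredConfigSketch is a hypothetical name.

```java
import java.util.Properties;

// Sketch of "defaults file, then site overrides" lookup with java.util.Properties.
// Every value below is a placeholder, not a real Hadoop default.
class LayeredConfigSketch {

  /** Bottom layer: what a *-default.xml would supply. */
  static Properties defaults() {
    Properties d = new Properties();
    d.setProperty("mapreduce.jobhistory.address", "placeholder-host:placeholder-port");
    d.setProperty("mapreduce.jobhistory.keytab", "/placeholder/jhs.keytab");
    d.setProperty("mapreduce.jobhistory.principal", "placeholder/_HOST@PLACEHOLDER");
    return d;
  }

  /** Keys set by the site layer win; unset keys fall back to the defaults table. */
  static Properties effective(Properties site) {
    Properties result = new Properties(defaults());
    for (String key : site.stringPropertyNames()) {
      result.setProperty(key, site.getProperty(key));
    }
    return result;
  }

  public static void main(String[] args) {
    Properties site = new Properties();
    site.setProperty("mapreduce.jobhistory.address", "site-host:1234");
    Properties conf = effective(site);
    System.out.println(conf.getProperty("mapreduce.jobhistory.address")); // from site
    System.out.println(conf.getProperty("mapreduce.jobhistory.keytab"));  // from defaults
  }
}
```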
[jira] [Commented] (MAPREDUCE-3889) job client tries to use /tasklog interface, but that doesn't exist anymore
[ https://issues.apache.org/jira/browse/MAPREDUCE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215097#comment-13215097 ] Thomas Graves commented on MAPREDUCE-3889: -- {quote} What is the impact of this? Is it crashing the client? Seems like it from the code, in which case we'll need to fix it. {quote} This is not crashing the client. It just prints the 400 message on the client if the job had a failed task (the default) or a task whose status matches what the user set -Dmapreduce.client.output.filter to. The 400 message looks like: 12/02/18 21:32:12 WARN mapreduce.Job: Error reading task output Server returned HTTP response code: 400 for URL: http://nodemanager:8080/tasklog?plaintext=true&attemptid=attempt_1329857083014_0003_r_00_0&filter=stdout So as far as I can tell it's benign: just possibly confusing to the user, and it's not actually giving them any of the log information for failed tasks. job client tries to use /tasklog interface, but that doesn't exist anymore -- Key: MAPREDUCE-3889 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3889 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Thomas Graves Priority: Critical If you specify the -Dmapreduce.client.output.filter=SUCCEEDED option when running a job, it tries to fetch task logs to print on the client side from a URL like: http://nodemanager:8080/tasklog?plaintext=true&attemptid=attempt_1329857083014_0003_r_00_0&filter=stdout This request always fails with: Required param job, map and reduce We saw this error when using distcp and the distcp failed. I'm not sure if it is mandatory for distcp or just for informational purposes. I'm guessing the latter.
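The request the client sends carries three query parameters, plaintext, attemptid and filter, joined with &. A small sketch of assembling such a URL with java.net.URLEncoder, to make the parameter layout explicit. buildTaskLogUrl is a hypothetical helper, not the job client's actual code, and the /tasklog endpoint it targets is the interface this issue reports no longer exists on the NodeManager.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a /tasklog-style URL with encoded query parameters.
class TaskLogUrlSketch {

  static String buildTaskLogUrl(String baseUrl, String attemptId, String filter) {
    // LinkedHashMap keeps the parameters in insertion order.
    Map<String, String> params = new LinkedHashMap<>();
    params.put("plaintext", "true");
    params.put("attemptid", attemptId);
    params.put("filter", filter);

    StringBuilder sb = new StringBuilder(baseUrl).append("/tasklog?");
    boolean first = true;
    for (Map.Entry<String, String> e : params.entrySet()) {
      if (!first) {
        sb.append('&'); // separator between key=value pairs
      }
      first = false;
      sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
    }
    return sb.toString();
  }

  private static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(buildTaskLogUrl("http://nodemanager:8080",
        "attempt_1329857083014_0003_r_00_0", "stdout"));
  }
}
```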
[jira] [Updated] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated MAPREDUCE-3907: - Description: The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a {{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}} that documents these properties and provides default values. was: The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a {{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}} that documents these properties and provides default values. Create a mapred-default.xml for the jobhistory server. -- Key: MAPREDUCE-3907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Minor Attachments: MAPREDUCE-3907.patch The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a {{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}} that documents these properties and provides default values.
[jira] [Updated] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated MAPREDUCE-3907: - Description: The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a {{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}} that documents these properties and provides default values. was: The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml that documents these and provides default values. Create a mapred-default.xml for the jobhistory server. -- Key: MAPREDUCE-3907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Minor Attachments: MAPREDUCE-3907.patch The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a {{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}} that documents these properties and provides default values.
[jira] [Updated] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koontz updated MAPREDUCE-3907: - Attachment: MAPREDUCE-3907.patch Create a mapred-default.xml for the jobhistory server. -- Key: MAPREDUCE-3907 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Minor Attachments: MAPREDUCE-3907.patch The following configuration properties are documented in http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode * mapreduce.jobhistory.address * mapreduce.jobhistory.keytab * mapreduce.jobhistory.principal Create a hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml that documents these and provides default values.
[jira] [Created] (MAPREDUCE-3908) jobhistory server trying to load job conf file from wrong location
jobhistory server trying to load job conf file from wrong location -- Key: MAPREDUCE-3908 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3908 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 0.23.0 Reporter: Thomas Graves I have seen a few instances where I click on the job configuration link in the job history server web UI and it returns a 500 error. The job history server log file shows an exception like:
2012-02-23 22:16:32,519 ERROR org.apache.hadoop.yarn.webapp.View: Error while reading hdfs://host.com:9000/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
java.io.FileNotFoundException: File does not exist: /home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:746)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:681)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:302)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
If I look in HDFS, the file no longer exists in the done_intermediate directory; it exists in the done directory structure:
hdfs://host.com:9000/home/hadoop/mapred/history/done/2012/02/23/00/job_1330033607650_0001_conf.xml
I'm not exactly sure how to reproduce this, but I definitely see it every once in a while.
[jira] [Commented] (MAPREDUCE-3908) jobhistory server trying to load job conf file from wrong location
[ https://issues.apache.org/jira/browse/MAPREDUCE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215126#comment-13215126 ] Thomas Graves commented on MAPREDUCE-3908: -- I should also note that restarting the job history server makes the issue go away; it then loads the conf file from the right location in the done directory. jobhistory server trying to load job conf file from wrong location -- Key: MAPREDUCE-3908 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3908 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 0.23.0 Reporter: Thomas Graves I have seen a few instances where I click on the job configuration link in the job history server web UI and it returns a 500 error. The job history server log file shows an exception like:
2012-02-23 22:16:32,519 ERROR org.apache.hadoop.yarn.webapp.View: Error while reading hdfs://host.com:9000/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
java.io.FileNotFoundException: File does not exist: /home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:746)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:681)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:302)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
If I look in HDFS, the file no longer exists in the done_intermediate directory; it exists in the done directory structure:
hdfs://host.com:9000/home/hadoop/mapred/history/done/2012/02/23/00/job_1330033607650_0001_conf.xml
I'm not exactly sure how to reproduce this, but I definitely see it every once in a while.
[jira] [Commented] (MAPREDUCE-3908) jobhistory server trying to load job conf file from wrong location
[ https://issues.apache.org/jira/browse/MAPREDUCE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215128#comment-13215128 ] Siddharth Seth commented on MAPREDUCE-3908: --- This happens when the job history file is initially read from the done_intermediate directory and later moved over to the done directory. The cached CompletedJob object continues to hold a reference to the conf file in the intermediate directory. jobhistory server trying to load job conf file from wrong location -- Key: MAPREDUCE-3908 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3908 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1 Affects Versions: 0.23.0 Reporter: Thomas Graves I have seen a few instances where I click on the job configuration link in the job history server web UI and it returns a 500 error. The job history server log file shows an exception like:
2012-02-23 22:16:32,519 ERROR org.apache.hadoop.yarn.webapp.View: Error while reading hdfs://host.com:9000/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
java.io.FileNotFoundException: File does not exist: /home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:746)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:681)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:302)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
If I look in HDFS, the file no longer exists in the done_intermediate directory; it exists in the done directory structure:
hdfs://host.com:9000/home/hadoop/mapred/history/done/2012/02/23/00/job_1330033607650_0001_conf.xml
I'm not exactly sure how to reproduce this, but I definitely see it every once in a while.
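Siddharth's explanation points at a cached object pinning the path it was first loaded from. A minimal sketch of one way to avoid that, under assumptions: an in-memory map stands in for the history server's file bookkeeping, and HistoryFileIndex, CachedJob and moveToDone are hypothetical names, not the JobHistory server's classes. The cached job keeps only its job ID and asks the index for the conf file's current location on each access.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the cached job keeps its ID, not a fixed conf path,
// so a later move from done_intermediate to done is picked up automatically.
class StaleConfPathSketch {

  /** Stand-in for the history server's record of where each job's files are. */
  static final class HistoryFileIndex {
    private final Map<String, String> confPathByJob = new ConcurrentHashMap<>();

    void register(String jobId, String confPath) {
      confPathByJob.put(jobId, confPath);
    }

    /** Called by whatever moves files out of done_intermediate. */
    void moveToDone(String jobId, String newConfPath) {
      confPathByJob.put(jobId, newConfPath);
    }

    String currentConfPath(String jobId) {
      return confPathByJob.get(jobId);
    }
  }

  /** Stand-in for a cached completed job. */
  static final class CachedJob {
    private final String jobId;
    private final HistoryFileIndex index;

    CachedJob(String jobId, HistoryFileIndex index) {
      this.jobId = jobId;
      this.index = index;
    }

    /** Resolved on each call instead of being captured when the job was loaded. */
    String getConfPath() {
      return index.currentConfPath(jobId);
    }
  }

  public static void main(String[] args) {
    HistoryFileIndex index = new HistoryFileIndex();
    index.register("job_1", "/history/done_intermediate/user/job_1_conf.xml");
    CachedJob cached = new CachedJob("job_1", index);
    index.moveToDone("job_1", "/history/done/2012/02/23/00/job_1_conf.xml");
    System.out.println(cached.getConfPath()); // the done/ path, not the stale one
  }
}
```

An alternative with the same effect would be to evict the cached job whenever its files are moved, at the cost of re-parsing the history file on the next request.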
[jira] [Commented] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
[ https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215133#comment-13215133 ] Hadoop QA commented on MAPREDUCE-3906: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515819/MAPREDUCE-3906.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1918//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1918//console This message is automatically generated. Fix inconsistency in documentation regarding mapreduce.jobhistory.principal --- Key: MAPREDUCE-3906 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906 Project: Hadoop Map/Reduce Issue Type: Improvement Components: documentation, mrv2, security Reporter: Eugene Koontz Assignee: Eugene Koontz Priority: Trivial Attachments: MAPREDUCE-3906.patch Currently the documentation on http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode is inconsistent on the recommended or default value of {{{mapreduce.jobhistory.principal}}}. In the section with the header "MapReduce JobHistory Server", the principal jhs/... is used, but later, in the section with the header "Configurations for MapReduce JobHistory Server:", the principal mapred/... is used. The fix is to replace mapred/... with jhs/
[jira] [Commented] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details
[ https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215134#comment-13215134 ] Hadoop QA commented on MAPREDUCE-3901: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515820/MR3901_v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1919//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1919//console This message is automatically generated. lazy load JobHistory Task and TaskAttempt details - Key: MAPREDUCE-3901 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobhistoryserver, mrv2 Affects Versions: 0.23.0 Reporter: Siddharth Seth Assignee: Siddharth Seth Attachments: MR3901.txt, MR3901_v2.txt, MR3901_v3.txt The job history UI and MRClientProtocol calls routed via JobHistory are very slow for large jobs. Some of this time is spent parsing the history file. A good chunk is spent pre-creating lots of objects which may never be used. Those can be created when required, bringing the load times of job history pages and getJobReport etc. calls down to approximately the history file parse time.
[jira] [Assigned] (MAPREDUCE-3792) job -list displays only the jobs submitted by a particular user
[ https://issues.apache.org/jira/browse/MAPREDUCE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe reassigned MAPREDUCE-3792: - Assignee: Jason Lowe (was: Vinod Kumar Vavilapalli) job -list displays only the jobs submitted by a particular user --- Key: MAPREDUCE-3792 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3792 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Ramya Sunil Assignee: Jason Lowe Priority: Critical mapred job -list lists only the jobs submitted by the user who ran the command. This behavior is different from 1.x.
[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215156#comment-13215156 ] Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583: --- I got two test failures ... Both tests passed on my machine and I don't think the failures you got are related to the patch. {noformat} [junit] Running org.apache.hadoop.hdfs.security.TestDelegationToken [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 15.793 sec [junit] Running org.apache.hadoop.metrics2.impl.TestSinkQueue [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.321 sec {noformat} ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException - Key: MAPREDUCE-3583 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Environment: 64-bit Linux: asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Reporter: Zhihong Yu Assignee: Zhihong Yu Priority: Critical Fix For: 0.24.0, 0.23.2 Attachments: mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, mapreduce-3583-v7.txt, mapreduce-3583.txt HBase PreCommit builds frequently gave us a NumberFormatException. From https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/: {code} 2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.NumberFormatException: For input string: 18446743988060683582 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Long.parseLong(Long.java:422) at java.lang.Long.parseLong(Long.java:468) at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413) at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148) at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401) at org.apache.hadoop.mapred.Task.initialize(Task.java:536) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) at org.apache.hadoop.mapred.Child.main(Child.java:249) {code} From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, causing NFE: {code} // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss) pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)), {code} You can find information on the OS at the beginning of https://builds.apache.org/job/PreCommit-HBASE-Build/553/console: {code} asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 20 file size (blocks, -f) unlimited pending signals (-i) 16382 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 6 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 8192 cpu time (seconds, -t) unlimited max user processes (-u) 2048 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited 6 Running 
in Jenkins mode {code} From Nicholas Sze: {noformat} It looks like the ppid is a 64-bit positive integer, but Java long is signed and so only works with 63-bit positive integers. In your case, 2^64 > 18446743988060683582 > 2^63. Therefore, there is an NFE. {noformat} I propose changing allProcessInfo to Map<String, ProcessInfo> so that we don't encounter this problem, by avoiding parsing large integers.
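The overflow described above can be reproduced in isolation. The sketch below is illustrative only: it shows that `Long.parseLong` rejects the reported ppid, that the value fits in an unsigned 64-bit range, and the proposed idea of keying process info by the pid string so no numeric parse is needed. `ProcPidParsing` and `indexByPidString` are hypothetical names, and the string map stands in for the real `Map<String, ProcessInfo>`. Note that `Long.parseUnsignedLong` and `Long.toUnsignedString` need Java 8, which postdates the branches discussed here; `BigInteger` works on older JDKs.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers illustrating the ppid overflow; not ProcfsBasedProcessTree code. */
final class ProcPidParsing {
    /** The ppid from the reported stack trace; larger than Long.MAX_VALUE (2^63 - 1). */
    static final String BIG_PPID = "18446743988060683582";

    /** Returns true if Long.parseLong rejects the value, as in the reported NFE. */
    static boolean signedParseFails(String s) {
        try {
            Long.parseLong(s);
            return false;
        } catch (NumberFormatException e) {
            return true;
        }
    }

    /** BigInteger has no 64-bit limit, so the exact value survives. */
    static BigInteger parseArbitrary(String s) {
        return new BigInteger(s);
    }

    /** Unsigned 64-bit parse (Java 8+): keeps the bit pattern, so the signed view is negative. */
    static long parseUnsigned(String s) {
        return Long.parseUnsignedLong(s);
    }

    /** The proposed approach: keep pid -> ppid as strings, avoiding numeric parsing entirely. */
    static Map<String, String> indexByPidString(String[][] pidAndPpid) {
        Map<String, String> parentOf = new HashMap<>();
        for (String[] row : pidAndPpid) {
            parentOf.put(row[0], row[1]);
        }
        return parentOf;
    }
}
```

Keying by string, as proposed, sidesteps the signed/unsigned question altogether, since the process tree only needs to match pids to ppids and never does arithmetic on them.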
[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated MAPREDUCE-3583: -- Resolution: Fixed Fix Version/s: 1.0.2 1.1.0 Status: Resolved (was: Patch Available) I also have committed to branch-1 and branch-1.0. Thanks Ted again! ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException - Key: MAPREDUCE-3583 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.205.0 Environment: 64-bit Linux: asf011.sp2.ygridcore.net Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 17:42:25 UTC 2011 x86_64 GNU/Linux Reporter: Zhihong Yu Assignee: Zhihong Yu Priority: Critical Fix For: 0.24.0, 1.1.0, 0.23.2, 1.0.2 Attachments: mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, mapreduce-3583-v7.txt, mapreduce-3583.txt
[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand
[ https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated MAPREDUCE-3614: Attachment: MAPREDUCE-3614.patch Oops! I'm sorry! It seems my random comment generator malfunctioned :D Apologies. Thanks Hitesh! I'm uploading this patch, which addresses our issues. I'll be adding unit tests to it, but in the meantime could some committer please bless it? finalState UNDEFINED if AM is killed by hand - Key: MAPREDUCE-3614 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: MAPREDUCE-3614.branch-0.23.patch, MAPREDUCE-3614.patch Courtesy [~dcapwell] {quote} If the AM is running and you kill the process (sudo kill #pid), the State in Yarn would be FINISHED and FinalStatus is UNDEFINED. The Tracking UI would say History and point to the proxy url (which will redirect to the history server). The state should indicate more clearly that the job failed, and the tracker url shouldn't point to the history server. {quote}
[jira] [Commented] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215202#comment-13215202 ] Bikas Saha commented on MAPREDUCE-2793: --- The hashCode difference was because the ApplicationId internal to JobId was different. The test creates 3 jobs with the same app id. However, having jobid == appid is currently baked into a lot of code, including the code used to fix the inconsistency in names. The test would create a list of 3 jobs with ids 0, 1, 2 and app id = 0. Then it would fetch all the jobs from the webserver, pick the first job and verify that it exists in its list. When the new code in the webserver was used to generate the jobid from the jobid string, it returned a job id whose app id was equal to the job id. This job id would have a different app id than the one in the test list, except when the job id was 0. So when the first job in the list was job id 0 the test would pass, and otherwise it would fail. The order of the list changed with each run because the jobs were held in a hash map. [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs -- Key: MAPREDUCE-2793 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch appIDs, jobIDs and attempt/container ids are not consistently named in the logs, console and UI. For consistency, they should all follow a common naming convention.
Currently: For appID: RM UI: app_1308259676864_5; JHS UI: no appID; console/logs: no appID; mapred-local dirs: application_1308259676864_0005. For jobID: RM UI: job_1308259676864_5_5; JHS UI: job_1308259676864_5_5; console/logs: job_1308259676864_0005; mapred-local dirs: no jobID. For attemptID: RM UI: attempt_1308259676864_5_5_m_24_0; JHS UI: attempt_1308259676864_5_5_m_24_0; console/logs: attempt_1308259676864_0005_m_24_0; mapred-local dirs: container_1308259676864_0005_24.
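One way to get a single naming convention is to render every ID from the same (cluster timestamp, application number) pair with fixed zero padding. The sketch below is a hypothetical helper, not the convention the patch adopted: the four-digit application/job padding matches the `application_1308259676864_0005` and `job_1308259676864_0005` forms listed above, while the six-digit task field is an assumption modelled on the classic MRv1 task ID format.

```java
import java.util.Locale;

/** Hypothetical helper that renders all IDs from one (clusterTimestamp, appNumber) pair. */
final class IdNames {
    static String appId(long clusterTs, int appNum) {
        // Locale.ROOT keeps ASCII digits regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTs, appNum);
    }

    static String jobId(long clusterTs, int appNum) {
        return String.format(Locale.ROOT, "job_%d_%04d", clusterTs, appNum);
    }

    /** taskType is 'm' or 'r'; the task number is zero-padded to six digits (assumed). */
    static String attemptId(long clusterTs, int appNum, char taskType, int taskNum, int attemptNum) {
        return String.format(Locale.ROOT, "attempt_%d_%04d_%c_%06d_%d",
                clusterTs, appNum, taskType, taskNum, attemptNum);
    }
}
```

With one formatter per ID kind, the RM UI, JHS UI, console and local directory names could all call the same code, so forms like `app_1308259676864_5` and `job_1308259676864_5_5` would no longer appear side by side.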
[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated MAPREDUCE-2793: -- Status: Patch Available (was: Open) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs -- Key: MAPREDUCE-2793 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch
[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated MAPREDUCE-2793: -- Attachment: MAPREDUCE-2793-branch-0.23.patch Changed the test to have jobid==appid. The AppContext methods that are supposed to return appId for the AppContext return null for this TestAppContext so that it crashes deterministically if it gets used in the future. [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs -- Key: MAPREDUCE-2793 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch
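Bikas's earlier comment traces the flaky test to a job ID embedding an application ID: two job IDs with the same job number but different app IDs are unequal and hash differently. The minimal sketch below reproduces that mechanism with hypothetical `AppId`/`JobId` value classes (not the real YARN records) and a hypothetical parser that assumes jobid == appid, as the webserver code did.

```java
import java.util.Objects;

final class IdEquality {
    /** Hypothetical application ID: a cluster timestamp plus a sequence number. */
    static final class AppId {
        final long clusterTs;
        final int id;
        AppId(long clusterTs, int id) { this.clusterTs = clusterTs; this.id = id; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof AppId)) return false;
            AppId a = (AppId) o;
            return clusterTs == a.clusterTs && id == a.id;
        }
        @Override public int hashCode() { return Objects.hash(clusterTs, id); }
    }

    /** Hypothetical job ID that embeds its AppId, so equality depends on both fields. */
    static final class JobId {
        final AppId appId;
        final int id;
        JobId(AppId appId, int id) { this.appId = appId; this.id = id; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof JobId)) return false;
            JobId j = (JobId) o;
            return id == j.id && appId.equals(j.appId);
        }
        @Override public int hashCode() { return Objects.hash(appId, id); }
    }

    /** Mimics rebuilding a JobId from its string form under the assumption jobid == appid. */
    static JobId parseAssumingJobIdEqualsAppId(long clusterTs, int n) {
        return new JobId(new AppId(clusterTs, n), n);
    }
}
```

With the old test setup (jobs 0, 1 and 2 all under app 0) only job 0 survives the round trip, which matches the pass-only-when-job-0-is-first behaviour described; giving each test job an app ID equal to its job ID, as the new patch does, makes every round trip compare equal.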
[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated MAPREDUCE-2793: -- Status: Open (was: Patch Available) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs -- Key: MAPREDUCE-2793 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch
[jira] [Commented] (MAPREDUCE-2942) TestNMAuditLogger.testNMAuditLoggerWithIP failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215218#comment-13215218 ] Hudson commented on MAPREDUCE-2942: --- Integrated in Hadoop-Hdfs-0.23-PB-Commit #2 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-PB-Commit/2/]) svn merge -c 1166842 from trunk for MAPREDUCE-2942. (Revision 1293033) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293033 Files : * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNMAuditLogger.java * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java TestNMAuditLogger.testNMAuditLoggerWithIP failing - Key: MAPREDUCE-2942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2942 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.24.0 Reporter: Vinod Kumar Vavilapalli Assignee: Thomas Graves Priority: Critical Fix For: 0.24.0 Attachments: audittest.patch, audittest2.patch This is failing right after the MAPREDUCE-2655 commit, but Jenkins did report a success when that patch was submitted.
{code} Standard Output 2011-09-07 07:12:52,785 INFO ipc.Server (Server.java:run(349)) - Starting Socket Reader #1 for port 33000 2011-09-07 07:12:52,787 INFO ipc.Server (WritableRpcEngine.java:registerProtocolAndImpl(399)) - ProtocolImpl=org.apache.hadoop.yarn.server.nodemanager.TestNMAuditLogger$MyTestRPCServer protocolClass=org.apache.hadoop.yarn.server.nodemanager.TestNMAuditLogger$MyTestRPCServer version=1 2011-09-07 07:12:52,788 INFO ipc.Server (Server.java:run(642)) - IPC Server Responder: starting 2011-09-07 07:12:52,788 INFO ipc.Server (Server.java:run(473)) - IPC Server listener on 33000: starting 2011-09-07 07:12:52,788 INFO ipc.Server (Server.java:run(1459)) - IPC Server handler 0 on 33000: starting 2011-09-07 07:12:52,798 INFO ipc.Server (Server.java:run(1497)) - IPC Server handler 0 on 33000, call: ping(), rpc version=2, client version=1, methodsFingerPrint=-1968962669 from 67.195.138.31:33806, error: java.io.IOException: java.io.IOException: Unknown protocol: org.apache.hadoop.ipc.TestRPC$TestProtocol at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:622) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1485) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1483) {code}
[jira] [Commented] (MAPREDUCE-2942) TestNMAuditLogger.testNMAuditLoggerWithIP failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215221#comment-13215221 ] Hudson commented on MAPREDUCE-2942: --- Integrated in Hadoop-Common-0.23-PB-Commit #2 (See [https://builds.apache.org/job/Hadoop-Common-0.23-PB-Commit/2/]) svn merge -c 1166842 from trunk for MAPREDUCE-2942. (Revision 1293033) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293033 Files : * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNMAuditLogger.java * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java TestNMAuditLogger.testNMAuditLoggerWithIP failing - Key: MAPREDUCE-2942 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2942 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.24.0 Reporter: Vinod Kumar Vavilapalli Assignee: Thomas Graves Priority: Critical Fix For: 0.24.0 Attachments: audittest.patch, audittest2.patch
[jira] [Commented] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs
[ https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215270#comment-13215270 ] Hadoop QA commented on MAPREDUCE-2793: -- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515848/MAPREDUCE-2793-branch-0.23.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 27 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1920//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1920//console This message is automatically generated. [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs -- Key: MAPREDUCE-2793 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch
[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand
[ https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated MAPREDUCE-3614: Attachment: MAPREDUCE-3614.patch Discussed with Vinod, and he told me that we should not drain the event queue in stop() in case of a SIGTERM. So I created a new shutdown hook that notifies the JHEH (JobHistoryEventHandler) that a SIGTERM has been received. I forgot to mention it, but thanks go to [~daryn] for helping me figure out a way to keep FileSystem objects open. :) finalState UNDEFINED if AM is killed by hand - Key: MAPREDUCE-3614 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Ravi Prakash Assignee: Ravi Prakash Attachments: MAPREDUCE-3614.branch-0.23.patch, MAPREDUCE-3614.patch, MAPREDUCE-3614.patch
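The shutdown-hook approach described in this comment can be sketched as follows. This is a hypothetical `HistoryHandlerSketch`, not the real JobHistoryEventHandler or the patch: a JVM shutdown hook only flips a flag, and `stop()` skips draining the event queue when that flag is set. The "flushed" counter stands in for writing events to the history file.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical event handler; illustrates the idea only, not JobHistoryEventHandler. */
final class HistoryHandlerSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final AtomicBoolean sigtermReceived = new AtomicBoolean(false);
    private int flushed = 0;

    synchronized void enqueue(String event) {
        pending.add(event);
    }

    /** Called from the shutdown hook to record that the JVM is going down. */
    void markSigterm() {
        sigtermReceived.set(true);
    }

    /** Registers a JVM shutdown hook that only flips the flag and does no I/O itself. */
    void installShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
            @Override public void run() {
                markSigterm();
            }
        }, "sigterm-notifier"));
    }

    /** A normal stop drains the queue; after the hook has fired, draining is skipped. */
    synchronized void stop() {
        if (!sigtermReceived.get()) {
            while (!pending.isEmpty()) {
                pending.poll();
                flushed++; // stands in for writing the event to the history file
            }
        }
    }

    synchronized int flushedCount() { return flushed; }

    synchronized int pendingCount() { return pending.size(); }
}
```

Two caveats apply to this design. First, a JVM shutdown hook runs on SIGTERM and SIGINT but also on a normal `System.exit`, and never on SIGKILL, so the flag really means "the JVM is exiting" rather than specifically "SIGTERM". Second, the JVM starts all registered hooks concurrently in an unspecified order, so real code would need to coordinate ordering with any other hook that calls `stop()`.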
[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-3904: --- Summary: [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true (was: Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true --- Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.
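The NPE described above comes from treating a possibly-null ACL map as always present. A minimal defensive pattern is to treat a missing map as "no ACLs recorded" before rendering. The sketch below uses a hypothetical `AclRendering.renderAcls` over plain `Map<String, String>` (ACL name to access list) rather than the real HsJobBlock or JobACL types, and is not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical rendering helper; keys are ACL names, values are access lists. */
final class AclRendering {
    /** Treats a null map (history written with ACLs disabled) as "no ACLs recorded". */
    static List<String> renderAcls(Map<String, String> acls) {
        Map<String, String> safe = (acls == null) ? Collections.<String, String>emptyMap() : acls;
        Map<String, String> sorted = new TreeMap<>(safe); // stable row order for display
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            rows.add(e.getKey() + ": " + e.getValue());
        }
        return rows;
    }
}
```

Normalizing null to an empty map at the point of use keeps the page rendering path free of null checks; an alternative would be to normalize once when the history file is parsed.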
[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-3904: --- Status: Patch Available (was: Open) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true --- Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch
[jira] [Updated] (MAPREDUCE-3904) Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-3904: --- Attachment: MAPREDUCE-3904.patch Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true - Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215306#comment-13215306 ] Bikas Saha commented on MAPREDUCE-3353: --- A potential solution would be the following:
1) Have the scheduler interface return the set of bad nodes on which it has stopped scheduling. This keeps the decision of which node is bad in the scheduler. The scheduler is the ultimate authority on what runs on a node and should tell its clients about the nodes that it is not considering for scheduling.
2) 1) above could be done as another interface API or piggybacked on the scheduler.allocate() API.
3) The response could contain all the known bad nodes or deltas to the previous response. Deltas are cheaper to send but are susceptible to message loss and retransmission. Also, deltas would have to be divided into new bad nodes and new good nodes.
4) The AM might want to know the type of bad node, e.g. lost or unhealthy. The bad nodes information could be enhanced via querying the RMNode object for the actual reason/health.
As an enhancement, we could add a new RMNodeManager entity that manages all the RMNodes. The above functionality could move from the scheduler into RMNodeManager (though it would need to be in sync with the scheduler). After that, getting detailed information may not need direct access to the RMNode object. Potentially, other interactions with RMNode could be forwarded through the RMNodeManager. But this would be a fairly significant refactoring that's best left to a separate future work item.
Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes - Key: MAPREDUCE-3353 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
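Points 3 and 4 of the comment, together with the issue description, suggest how an AM might consume such a report. Below is a minimal sketch. It assumes the RM hands the AM a map from host name to a hypothetical `NodeProblem` reason. `RerunPlannerSketch`, `NodeProblem` and `mapsToRerun()` are illustrative names only, not a YARN API. The method selects the completed map tasks whose intermediate output sits on a reported node, optionally filtered by reason.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RerunPlannerSketch {
    // Hypothetical reasons the RM might attach to a bad-node report.
    enum NodeProblem { LOST, UNHEALTHY }

    /**
     * Picks completed map tasks that should be re-executed.
     *
     * @param mapOutputHost completed map task id -> host holding its intermediate output
     * @param badNodes      host -> reason the RM reported it as bad
     * @param rerunFor      which reasons should trigger re-execution
     * @return task ids to re-run, in the iteration order of mapOutputHost
     */
    static List<String> mapsToRerun(Map<String, String> mapOutputHost,
                                    Map<String, NodeProblem> badNodes,
                                    Set<NodeProblem> rerunFor) {
        List<String> rerun = new ArrayList<>();
        for (Map.Entry<String, String> entry : mapOutputHost.entrySet()) {
            NodeProblem problem = badNodes.get(entry.getValue());
            if (problem != null && rerunFor.contains(problem)) {
                rerun.add(entry.getKey());
            }
        }
        return rerun;
    }
}
```

Passing `EnumSet.of(NodeProblem.LOST)` would leave outputs on merely unhealthy nodes alone. Whether that is safe is a policy question the comment leaves open.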
[jira] [Commented] (MAPREDUCE-3368) compile-mapred-test fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215318#comment-13215318 ] Hudson commented on MAPREDUCE-3368: --- Integrated in Hadoop-Hdfs-0.23-PB-Commit #4 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-PB-Commit/4/]) Revert TestAuditLogger changes from MAPREDUCE-3368. (Revision 1293058) Result = SUCCESS szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293058 Files : * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestAuditLogger.java compile-mapred-test fails - Key: MAPREDUCE-3368 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3368 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Hitesh Shah Priority: Critical Fix For: 0.23.1 Attachments: MR-3368.1.patch compile-mapred-test target is failing once again. Details: https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Mapreduce-0.23-Build/83/consoleFull
[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215320#comment-13215320 ] Hudson commented on MAPREDUCE-3738: --- Integrated in Hadoop-Common-0.23-Commit #587 (See [https://builds.apache.org/job/Hadoop-Common-0.23-Commit/587/]) merge MAPREDUCE-3738 from trunk (Revision 1293061) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly Key: MAPREDUCE-3738 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1, 0.24.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-3738.patch, livehistdump.txt If an AppLogAggregator thread dies unexpectedly (e.g.: uncaught exception like OutOfMemoryError in the case I saw) then this will lead to a hang during nodemanager shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean to be set by the log aggregation thread to indicate it has finished.
Since the thread was killed off earlier due to an uncaught exception, the boolean will never be set and the NM hangs during shutdown repeating something like this every second in the log file: 2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
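The hang comes from a completion flag that only the aggregation thread sets. A thread killed by an uncaught Throwable therefore never sets it. Below is a minimal sketch of one way to avoid that: signal completion from a `finally` block and give `join()` a timeout. This is not the committed MAPREDUCE-3738 fix. It swaps the atomic boolean for a `java.util.concurrent.CountDownLatch`, and `AggregatorShutdownSketch` and its methods are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class AggregatorShutdownSketch {
    // Counted down once when the aggregation thread exits, normally or abnormally.
    private final CountDownLatch finished = new CountDownLatch(1);
    private final Thread worker;

    AggregatorShutdownSketch(Runnable aggregationWork) {
        worker = new Thread(() -> {
            try {
                aggregationWork.run();
            } finally {
                // Entered both on normal completion and when run() throws,
                // so a waiting join() is always released.
                finished.countDown();
            }
        }, "log-aggregator-sketch");
    }

    void start() {
        worker.start();
    }

    /** Returns true if aggregation finished within the timeout, false otherwise. */
    boolean join(long timeout, TimeUnit unit) {
        try {
            return finished.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return false;
        }
    }
}
```

The timeout bounds shutdown even if aggregation is merely slow. A caller that gets `false` back can log and move on instead of repeating the "Waiting for aggregation to complete" message forever.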
[jira] [Commented] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215321#comment-13215321 ] Hadoop QA commented on MAPREDUCE-3904: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12515862/MAPREDUCE-3904.patch against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests: org.apache.hadoop.mapred.TestIndexCache
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1921//testReport/
Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1921//console
This message is automatically generated.
[NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true --- Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.
[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215325#comment-13215325 ] Hudson commented on MAPREDUCE-3738: --- Integrated in Hadoop-Hdfs-0.23-Commit #574 (See [https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/574/]) merge MAPREDUCE-3738 from trunk (Revision 1293061) Result = SUCCESS sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly Key: MAPREDUCE-3738 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1, 0.24.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-3738.patch, livehistdump.txt If an AppLogAggregator thread dies unexpectedly (e.g.: uncaught exception like OutOfMemoryError in the case I saw) then this will lead to a hang during nodemanager shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean to be set by the log aggregation thread to indicate it has finished.
Since the thread was killed off earlier due to an uncaught exception, the boolean will never be set and the NM hangs during shutdown repeating something like this every second in the log file: 2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215328#comment-13215328 ] Bikas Saha commented on MAPREDUCE-3353: --- Not doing deltas on the RM-AM channel does not seem viable because of the high-frequency message traffic. Sending information about 100 bad nodes at 100 bytes per node for 1000 AMs every second is about 10MB/s of traffic. Sending deltas means tracking last and current states on the RM on a per-AM-attempt basis. That would not be good to do in the scheduler because it's not the responsibility of the scheduler. So this needs to be done on each RMAttempt object. The RMAttempt object gets the current list of bad nodes and compares it with its last known list of bad nodes. Additions and deletions are sent to the AM as new bad and good nodes. Alternatively, each RMNode could send an event to each RMAppAttempt for healthy-to-unhealthy transitions and vice versa. These events could be accumulated and copied to the AM via the allocate response.
Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes - Key: MAPREDUCE-3353 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster, mrv2, resourcemanager Affects Versions: 0.23.0 Reporter: Vinod Kumar Vavilapalli Assignee: Bikas Saha Priority: Critical Fix For: 0.23.2 When a node gets lost or turns faulty, the AM needs to know about that event so that it can take some action, such as re-executing map tasks whose intermediate output lives on that faulty node.
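For scale, the full-list figure in the comment works out to 100 nodes × 100 bytes × 1000 AMs = 10,000,000 bytes, roughly 10MB, per second. Below is a minimal sketch of the per-attempt comparison the comment describes, assuming node identities are plain strings. `BadNodeDeltaTracker` and `Delta` are hypothetical names, not RMAppAttempt code. Note that `update()` records the new state as if it had already been delivered. That is exactly the message-loss weakness of deltas raised in the earlier comment; a real channel would need retransmission or periodic full resynchronization.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class BadNodeDeltaTracker {
    // Changes since the previous update: nodes that turned bad and nodes that recovered.
    static final class Delta {
        final Set<String> newBadNodes;
        final Set<String> newGoodNodes;

        Delta(Set<String> newBadNodes, Set<String> newGoodNodes) {
            this.newBadNodes = Collections.unmodifiableSet(newBadNodes);
            this.newGoodNodes = Collections.unmodifiableSet(newGoodNodes);
        }
    }

    // Bad-node set as of the last update, i.e. what this attempt's AM was last told.
    private Set<String> lastKnownBad = new HashSet<>();

    // Compares the current bad-node set with the last known one and remembers the new state.
    Delta update(Set<String> currentBad) {
        Set<String> newBad = new HashSet<>(currentBad);
        newBad.removeAll(lastKnownBad);      // bad now, not bad before
        Set<String> newGood = new HashSet<>(lastKnownBad);
        newGood.removeAll(currentBad);       // bad before, not bad now
        lastKnownBad = new HashSet<>(currentBad);
        return new Delta(newBad, newGood);
    }
}
```

When nothing changes between two allocate calls, both sets in the returned `Delta` are empty, so the steady-state cost is close to zero rather than the full list.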
[jira] [Commented] (MAPREDUCE-3368) compile-mapred-test fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215330#comment-13215330 ] Hudson commented on MAPREDUCE-3368: --- Integrated in Hadoop-Mapreduce-0.23-PB-Commit #2 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-PB-Commit/2/]) Revert TestAuditLogger changes from MAPREDUCE-3368. (Revision 1293058) Result = ABORTED szetszwo : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293058 Files : * /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestAuditLogger.java compile-mapred-test fails - Key: MAPREDUCE-3368 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3368 Project: Hadoop Map/Reduce Issue Type: Bug Components: build, mrv2 Affects Versions: 0.23.0 Reporter: Ramya Sunil Assignee: Hitesh Shah Priority: Critical Fix For: 0.23.1 Attachments: MR-3368.1.patch compile-mapred-test target is failing once again. Details: https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Mapreduce-0.23-Build/83/consoleFull
[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
[ https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215335#comment-13215335 ] Hudson commented on MAPREDUCE-3738: --- Integrated in Hadoop-Mapreduce-0.23-Commit #589 (See [https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/589/]) merge MAPREDUCE-3738 from trunk (Revision 1293061) Result = ABORTED sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061 Files : * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java * /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly Key: MAPREDUCE-3738 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2, nodemanager Affects Versions: 0.23.1, 0.24.0 Reporter: Jason Lowe Assignee: Jason Lowe Priority: Critical Fix For: 0.23.2 Attachments: MAPREDUCE-3738.patch, livehistdump.txt If an AppLogAggregator thread dies unexpectedly (e.g.: uncaught exception like OutOfMemoryError in the case I saw) then this will lead to a hang during nodemanager shutdown. The NM calls AppLogAggregatorImpl.join() during shutdown to make sure log aggregation has completed, and that method internally waits for an atomic boolean to be set by the log aggregation thread to indicate it has finished.
Since the thread was killed off earlier due to an uncaught exception, the boolean will never be set and the NM hangs during shutdown repeating something like this every second in the log file: 2012-01-25 22:20:56,366 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: Waiting for aggregation to complete for application_1326848182580_2806
[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-3904: --- Status: Open (was: Patch Available) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true --- Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.
[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-3904: --- Attachment: MAPREDUCE-3904.patch [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true --- Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch, MAPREDUCE-3904.patch Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.
[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true
[ https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated MAPREDUCE-3904: --- Status: Patch Available (was: Open) Resubmitting for intermittent TestIndexCache test failure. [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true --- Key: MAPREDUCE-3904 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.1 Reporter: Jonathan Eagles Assignee: Jonathan Eagles Attachments: MAPREDUCE-3904.patch, MAPREDUCE-3904.patch Job history page displays 'null'. It looks like job history files only populate job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history files, getAcls can return null, throwing an exception on the HsJobBlock page.