[jira] Updated: (MAPREDUCE-732) node health check script should not log UNHEALTHY status for every heartbeat in INFO mode
[ https://issues.apache.org/jira/browse/MAPREDUCE-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sreekanth Ramakrishnan updated MAPREDUCE-732: - Release Note: Changed the log level of the "adding blacklisted reason" message in the JobTracker log from INFO to DEBUG. node health check script should not log UNHEALTHY status for every heartbeat in INFO mode --- Key: MAPREDUCE-732 URL: https://issues.apache.org/jira/browse/MAPREDUCE-732 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.21.0 Reporter: Ramya R Assignee: Sreekanth Ramakrishnan Priority: Minor Fix For: 0.21.0 Attachments: MAPRED-732-ydist.patch, mapreduce-732-1.patch, MAPREDUCE-732-2.patch, mapreduce-732.patch Currently, when a TT is blacklisted by the node health check script, a message such as the following is logged for every heartbeat. {noformat} date time INFO org.apache.hadoop.mapred.JobTracker: Adding blacklisted reason for tracker : blacklisted TT Reason for blacklisting is : NODE_UNHEALTHY {noformat} Due to this, the JT logs fill up rapidly, clogging the logdirs. Hence this message should be logged in DEBUG mode instead of INFO mode. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
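For readers following along, here is a minimal sketch of the kind of change the release note describes: the per-heartbeat message moves behind a debug-level check so it is skipped under the default INFO level. It uses java.util.logging (Level.FINE) as a stand-in for the logging library the JobTracker actually uses; the class and method names are hypothetical and are not taken from the attached patches.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class BlacklistLogSketch {
    // Hypothetical logger; stands in for the JobTracker's own logger.
    private static final Logger LOG =
        Logger.getLogger(BlacklistLogSketch.class.getName());

    /**
     * Logs the blacklist reason at debug level (FINE here).
     * Returns the message that was logged, or null when debug logging is off.
     */
    static String logBlacklistReason(String tracker, String reason) {
        if (!LOG.isLoggable(Level.FINE)) {
            // Under the default INFO level nothing is built or written,
            // so repeated heartbeats no longer fill the log.
            return null;
        }
        String msg = "Adding blacklisted reason for tracker : " + tracker
            + " Reason for blacklisting is : " + reason;
        LOG.fine(msg);
        return msg;
    }

    public static void main(String[] args) {
        // With the default configuration the message is suppressed.
        System.out.println(logBlacklistReason("tracker_host1", "NODE_UNHEALTHY"));
        // After lowering the level, the message is produced again.
        LOG.setLevel(Level.FINE);
        System.out.println(logBlacklistReason("tracker_host1", "NODE_UNHEALTHY"));
    }
}
```

The guard also avoids building the message string on every heartbeat, which matters when many trackers are blacklisted.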
[jira] Updated: (MAPREDUCE-947) OutputCommitter should have an abortJob method
[ https://issues.apache.org/jira/browse/MAPREDUCE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat updated MAPREDUCE-947: - Release Note: Introduced abortJob() method in OutputCommitter which will be invoked when the job fails or is killed. By default it invokes OutputCommitter.cleanupJob(). Deprecated OutputCommitter.cleanupJob() and introduced OutputCommitter.commitJob() method which will be invoked for successful jobs. Also a _SUCCESS file is created in the output folder for successful jobs. (was: Introduced abortJob() method in OutputCommitter which will be invoked when the job fails or is killed. Also a _done file is created in the output folder for successful jobs while _abort is created for failed/killed jobs.) OutputCommitter should have an abortJob method -- Key: MAPREDUCE-947 URL: https://issues.apache.org/jira/browse/MAPREDUCE-947 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: Owen O'Malley Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapred-948-v1.12-branch-0.20-internal.patch, mapred-948-v1.12.patch, mapred-948-v1.13-branch-0.20-internal.patch, mapred-948-v1.2.patch, mapred-948-v1.3.patch, mapred-948-v1.4.patch, mapred-948-v1.7.patch, mapred-948-v2.1-branch-0.20.patch, mapred-948-v2.3-branch-0.20.patch, mapred-948-v2.3.patch, mapred-948-v3.1.patch, mapred-948-v3.2.patch, mapred-948-v3.4.patch, mr-947-trunk-new.patch, mr-947-trunk-new.patch, mr-947-trunk.patch, mr-947-trunk.patch, mr-947-trunk.patch, mr-947-y20-new.patch, mr-947-y20.patch, mr-947-y20.patch The OutputCommitter needs an abortJob method to clean up from failed jobs. Currently there is no way to distinguish between failed or succeeded jobs, making it impossible to write output promotion code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
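The relationship between the three hooks in the release note can be sketched as below. This is only an illustration: SketchCommitter is a hypothetical stand-in with simplified, argument-free signatures, not the real OutputCommitter class, and the "create _SUCCESS" step is recorded as a string rather than performed.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitterSketch {
    /** Simplified stand-in for an output committer; signatures are hypothetical. */
    static class SketchCommitter {
        final List<String> calls = new ArrayList<>();

        /** Older hook, deprecated in favour of commitJob() and abortJob(). */
        @Deprecated
        void cleanupJob() {
            calls.add("cleanupJob");
        }

        /** Invoked for successful jobs; the success marker is written here. */
        void commitJob() {
            calls.add("commitJob");
            calls.add("create _SUCCESS");
        }

        /** Invoked for failed or killed jobs; by default falls back to cleanupJob(). */
        void abortJob() {
            calls.add("abortJob");
            cleanupJob();
        }
    }

    /** Framework-side dispatch: one of the two hooks runs, depending on the outcome. */
    static List<String> finishJob(SketchCommitter committer, boolean succeeded) {
        if (succeeded) {
            committer.commitJob();
        } else {
            committer.abortJob();
        }
        return committer.calls;
    }

    public static void main(String[] args) {
        System.out.println(finishJob(new SketchCommitter(), true));
        System.out.println(finishJob(new SketchCommitter(), false));
    }
}
```

Because abortJob() delegates to cleanupJob() by default, a committer written against the old hook keeps cleaning up after failed jobs without modification.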
[jira] Commented: (MAPREDUCE-217) Tasks to run on a different jvm version than the TaskTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769911#action_12769911 ] Amar Kamat commented on MAPREDUCE-217: -- Had a discussion with Sharad on this. As he rightly pointed out, giving preference to user-defined classpath entries over the TT's inherited classpath entries can lead to security issues, where a malicious user can define their own Task.java or ReduceTask.java. I think we should keep the classpath ordering as is.
bq. At least it also needs to set the new classpath for the native libraries and probably there's more that I'm missing.
Koji, as of today users can add their libraries, which are given preference over the inherited ones. Currently this is what is done:
child.jvm : tt.jvm
child.libraries : user-defined-libraries+tt.libraries
child.classpath : tt.classpath+job-jar.classpath+dist-cache-entries+current.work.dir+user-defined.classpath
Changes are:
child.jvm : user-defined.jvm else tt.jvm // since the user is specifying the jvm, the user is responsible for adding the required libs too
Tasks to run on a different jvm version than the TaskTracker Key: MAPREDUCE-217 URL: https://issues.apache.org/jira/browse/MAPREDUCE-217 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: linux Reporter: Koji Noguchi Assignee: Amar Kamat Attachments: mapreduce-217-v1.0.patch We use 32-bit jvm for TaskTrackers. Sometimes our users want to call 64-bit JNI libraries from their tasks. This requires tasks to be running on 64-bit jvm. On Solaris, you can simply use -d32/-d64 to choose, but on Linux, it's in a completely different package. So far, tasks run on the same jvm version as the TaskTracker.
{noformat}
// use same jvm as parent
File jvm = new File(new File(System.getProperty("java.home"), "bin"), "java");
{noformat}
Is it possible to let users provide a java home path or let them choose from a pre-selected list of paths?
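The proposed rule "child.jvm : user-defined.jvm else tt.jvm" could look roughly like the sketch below. The method name and the idea of passing the user's java home as a plain string are assumptions made for illustration; the comment does not say which configuration key or validation the patch uses.

```java
import java.io.File;

public class ChildJvmSketch {
    /**
     * Picks the java executable for a child task.
     * userJavaHome is a hypothetical per-job setting; when it is null or empty
     * the child falls back to the TaskTracker's own JVM (java.home).
     */
    static File resolveJvm(String userJavaHome) {
        String home = (userJavaHome == null || userJavaHome.isEmpty())
            ? System.getProperty("java.home")
            : userJavaHome;
        return new File(new File(home, "bin"), "java");
    }

    public static void main(String[] args) {
        // Falls back to the JVM running this process.
        System.out.println(resolveJvm(null));
        // Uses the user-supplied location, e.g. a 64-bit JDK install.
        System.out.println(resolveJvm("/opt/jdk64"));
    }
}
```

A real implementation would presumably also restrict the path to an administrator-approved list, as the issue description suggests, rather than trusting arbitrary user input.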
[jira] Updated: (MAPREDUCE-1098) Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1098: --- Release Note: Fixed the distributed cache's localizeCache to lock only the uri it is localizing. Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks. -- Key: MAPREDUCE-1098 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1098 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Reporter: Sreekanth Ramakrishnan Assignee: Amareshwari Sriramadasu Fix For: 0.21.0 Attachments: MAPREDUCE-1098.patch, MAPREDUCE-1098.patch, MAPREDUCE-1098.patch, patch-1098-0.20.txt, patch-1098-1.txt, patch-1098-2.txt, patch-1098-3.txt, patch-1098-4.txt, patch-1098-5.txt, patch-1098-6.txt, patch-1098-7.txt, patch-1098-7.txt, patch-1098-ydist.txt, patch-1098-ydist.txt, patch-1098-ydist.txt, patch-1098.txt Currently {{org.apache.hadoop.filecache.DistributedCache.getLocalCache(URI, Configuration, Path, FileStatus, boolean, long, Path, boolean)}} allows only one {{TaskRunner}} thread in TT to localize {{DistributedCache}} across jobs. Current way of synchronization is across baseDir this has to be changed to lock on the same baseDir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
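To illustrate the release note ("lock only the uri it is localizing"), here is a small sketch of per-URI locking. It is not the code from the patch: the class, the lock table and the counter that stands in for the actual copy are all hypothetical.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class PerUriLockSketch {
    // One lock object per cache URI instead of a single shared lock.
    private final ConcurrentMap<URI, Object> locks = new ConcurrentHashMap<>();
    // Counts localizations per URI; a placeholder for the real file copy.
    private final ConcurrentMap<URI, AtomicInteger> localizations = new ConcurrentHashMap<>();

    /** Returns the lock for one cache entry, creating it on first use. */
    Object lockFor(URI cacheUri) {
        return locks.computeIfAbsent(cacheUri, k -> new Object());
    }

    /**
     * Localizes one cache entry while holding only that entry's lock, so
     * threads working on other URIs are not blocked by a slow download.
     */
    int localize(URI cacheUri) {
        synchronized (lockFor(cacheUri)) {
            return localizations
                .computeIfAbsent(cacheUri, k -> new AtomicInteger())
                .incrementAndGet();
        }
    }

    public static void main(String[] args) {
        PerUriLockSketch cache = new PerUriLockSketch();
        URI a = URI.create("hdfs://namenode/cache/a.jar");
        URI b = URI.create("hdfs://namenode/cache/b.jar");
        System.out.println(cache.lockFor(a) == cache.lockFor(a)); // same entry, same lock
        System.out.println(cache.lockFor(a) == cache.lockFor(b)); // different entries, different locks
    }
}
```

In a long-running daemon the lock table would need pruning when cache entries are deleted; that bookkeeping is left out here.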
[jira] Updated: (MAPREDUCE-1048) Show total slot usage in cluster summary on jobtracker webui
[ https://issues.apache.org/jira/browse/MAPREDUCE-1048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1048: --- Release Note: Added occupied map/reduce slots and reserved map/reduce slots to the Cluster Summary table on jobtracker web ui. Show total slot usage in cluster summary on jobtracker webui Key: MAPREDUCE-1048 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1048 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amareshwari Sriramadasu Fix For: 0.22.0 Attachments: mapred-1048-v1.0.patch, mapred-1048-v1.1.patch, MAPREDUCE-1048-20.patch, MAPREDUCE-1048.patch, patch-1048-0.20.txt, patch-1048-1.txt, patch-1048-2.txt, patch-1048-3.txt, patch-1048-4.txt, patch-1048-5.txt, patch-1048-6.txt, patch-1048-ydist.txt, patch-1048.txt With High-Ram jobs coming into the picture, it's important to also show the slot usage in cluster summary since total-running-maps != total-slots-occupied. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1152) JobTrackerInstrumentation.killed{Map/Reduce} is never called
[ https://issues.apache.org/jira/browse/MAPREDUCE-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharad Agarwal updated MAPREDUCE-1152: -- Attachment: 1152.patch Patch fixing fail and kill task metrics. JobTrackerInstrumentation.killed{Map/Reduce} is never called Key: MAPREDUCE-1152 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1152 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Sharad Agarwal Fix For: 0.22.0 Attachments: 1152.patch, 1152.patch JobTrackerInstrumentation.killed{Map/Reduce} metrics added as part of MAPREDUCE-1103 is not captured -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharad Agarwal reassigned MAPREDUCE-1153: - Assignee: Sharad Agarwal Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned. Key: MAPREDUCE-1153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Hemanth Yamijala Assignee: Sharad Agarwal MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of actual, blacklisted and decommissioned tasktrackers. When a tracker is decommissioned, the tasktracker count or the blacklisted tracker count is not decremented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1152) JobTrackerInstrumentation.killed{Map/Reduce} is never called
[ https://issues.apache.org/jira/browse/MAPREDUCE-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1152: --- Status: Patch Available (was: Open) Patch looks fine to me. Submitting for hudson JobTrackerInstrumentation.killed{Map/Reduce} is never called Key: MAPREDUCE-1152 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1152 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Sharad Agarwal Fix For: 0.22.0 Attachments: 1152.patch, 1152.patch JobTrackerInstrumentation.killed{Map/Reduce} metrics added as part of MAPREDUCE-1103 is not captured -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharad Agarwal updated MAPREDUCE-1153: -- Attachment: 1153.patch Moved common code into a single method - removeTracker(TaskTracker) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned. Key: MAPREDUCE-1153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Hemanth Yamijala Assignee: Sharad Agarwal Attachments: 1153.patch MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of actual, blacklisted and decommissioned tasktrackers. When a tracker is decommissioned, the tasktracker count or the blacklisted tracker count is not decremented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (MAPREDUCE-1102) Job gets killed even when the cleanup completes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Kamat reassigned MAPREDUCE-1102: - Assignee: Amar Kamat Job gets killed even when the cleanup completes --- Key: MAPREDUCE-1102 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 When the cleanup completes at the tasktracker and the job is killed by the user, the cleanup runs to completion but the job fails. Ideally if the cleanup is completed then the job should not be killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1102) Job gets killed even when the cleanup completes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769979#action_12769979 ] Amar Kamat commented on MAPREDUCE-1102: --- One simple approach would be not to honour 'kill-job' once the cleanup is launched, in which case the job can move either to the FAILED or the SUCCESSFUL state. The job can fail (after cleanup is launched) if all the cleanup attempts fail. The only corner case we need to take care of is the case where FileOutputCommitter.commitJob() creates _SUCCESS and then fails. In such a case the job will fail with a _SUCCESS file. Job gets killed even when the cleanup completes --- Key: MAPREDUCE-1102 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 When the cleanup completes at the tasktracker and the job is killed by the user, the cleanup runs to completion but the job fails. Ideally if the cleanup is completed then the job should not be killed.
[jira] Commented: (MAPREDUCE-171) TestJobTrackerRestartWithLostTracker sometimes fails while validating history.
[ https://issues.apache.org/jira/browse/MAPREDUCE-171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769984#action_12769984 ] Suman Sehgal commented on MAPREDUCE-171: Yeah, it's failing on 0.20.1 also! TestJobTrackerRestartWithLostTracker sometimes fails while validating history. -- Key: MAPREDUCE-171 URL: https://issues.apache.org/jira/browse/MAPREDUCE-171 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Amareshwari Sriramadasu Attachments: TEST-org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.txt TestJobTrackerRestartWithLostTracker fails with following error Duplicate START_TIME seen for task task_200906151249_0001_m_01 in history file at line 54 junit.framework.AssertionFailedError: Duplicate START_TIME seen for task task_200906151249_0001_m_01 in history file at line 54 at org.apache.hadoop.mapred.TestJobHistory$TestListener.handle(TestJobHistory.java:161) at org.apache.hadoop.mapred.JobHistory.parseLine(JobHistory.java:335) at org.apache.hadoop.mapred.JobHistory.parseHistoryFromFS(JobHistory.java:299) at org.apache.hadoop.mapred.TestJobHistory.validateJobHistoryFileFormat(TestJobHistory.java:478) at org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRecoveryWithLostTracker(TestJobTrackerRestartWithLostTracker.java:116) at org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker.testRestartWithLostTracker(TestJobTrackerRestartWithLostTracker.java:162) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1152) JobTrackerInstrumentation.killed{Map/Reduce} is never called
[ https://issues.apache.org/jira/browse/MAPREDUCE-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769992#action_12769992 ] Hadoop QA commented on MAPREDUCE-1152: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12423170/1152.patch against trunk revision 829529. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/212/console This message is automatically generated. JobTrackerInstrumentation.killed{Map/Reduce} is never called Key: MAPREDUCE-1152 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1152 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.22.0 Reporter: Sharad Agarwal Fix For: 0.22.0 Attachments: 1152.patch, 1152.patch JobTrackerInstrumentation.killed{Map/Reduce} metrics added as part of MAPREDUCE-1103 is not captured -- This message is automatically generated by JIRA. 
[jira] Updated: (MAPREDUCE-947) OutputCommitter should have an abortJob method
[ https://issues.apache.org/jira/browse/MAPREDUCE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Gummadi updated MAPREDUCE-947: --- Attachment: yhadoop20-bug-fix-947.patch Y! 20 patch has a bug that made TestJobHistory to fail. Patch with the fix for the bug for Y! 20 distribution is attached now. Running unit tests with the fix now. OutputCommitter should have an abortJob method -- Key: MAPREDUCE-947 URL: https://issues.apache.org/jira/browse/MAPREDUCE-947 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.21.0 Reporter: Owen O'Malley Assignee: Amar Kamat Fix For: 0.22.0 Attachments: mapred-948-v1.12-branch-0.20-internal.patch, mapred-948-v1.12.patch, mapred-948-v1.13-branch-0.20-internal.patch, mapred-948-v1.2.patch, mapred-948-v1.3.patch, mapred-948-v1.4.patch, mapred-948-v1.7.patch, mapred-948-v2.1-branch-0.20.patch, mapred-948-v2.3-branch-0.20.patch, mapred-948-v2.3.patch, mapred-948-v3.1.patch, mapred-948-v3.2.patch, mapred-948-v3.4.patch, mr-947-trunk-new.patch, mr-947-trunk-new.patch, mr-947-trunk.patch, mr-947-trunk.patch, mr-947-trunk.patch, mr-947-y20-new.patch, mr-947-y20.patch, mr-947-y20.patch, yhadoop20-bug-fix-947.patch The OutputCommitter needs an abortJob method to clean up from failed jobs. Currently there is no way to distinguish between failed or succeeded jobs, making it impossible to write output promotion code. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1102) Job gets killed even when the cleanup completes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770071#action_12770071 ] Owen O'Malley commented on MAPREDUCE-1102: -- Rather than blocking kill-job, I think we are better off guaranteeing that if the job fails, we will always call abortJob. Even if commitJob has started (or finished). We should also make the FileOutputFormat abortJob delete _SUCCESS to handle this case. This would also handle the case where the job commit task fails part way through. I agree that all output committers may not be able to unroll their commit. However, I think that we need to give them the ability to do the right thing. Job gets killed even when the cleanup completes --- Key: MAPREDUCE-1102 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 When the cleanup completes at the tasktracker and the job is killed by the user, the cleanup runs to completion but the job fails. Ideally if the cleanup is completed then the job should not be killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
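The suggestion that abortJob should delete _SUCCESS, even when commitJob has already started or finished, can be sketched with plain java.nio.file calls. This is an illustration on a local temporary directory with hypothetical method signatures; the real committer works against a Hadoop FileSystem and takes job context arguments.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SuccessMarkerSketch {
    static final String SUCCESS_MARKER = "_SUCCESS";

    /** Marks the output directory as complete. */
    static void commitJob(Path outputDir) {
        try {
            Path marker = outputDir.resolve(SUCCESS_MARKER);
            if (!Files.exists(marker)) {
                Files.createFile(marker);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * May be invoked after commitJob has started or finished, so it removes
     * the marker if present; a failed job must not leave _SUCCESS behind.
     */
    static void abortJob(Path outputDir) {
        try {
            Files.deleteIfExists(outputDir.resolve(SUCCESS_MARKER));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs commit followed by abort and reports whether the marker is still there. */
    static boolean markerSurvivesAbort() {
        try {
            Path out = Files.createTempDirectory("job-output");
            commitJob(out);
            abortJob(out);
            boolean present = Files.exists(out.resolve(SUCCESS_MARKER));
            Files.deleteIfExists(out);
            return present;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("marker present after abort: " + markerSurvivesAbort());
    }
}
```

Using deleteIfExists keeps abortJob safe to call whether or not the commit got as far as writing the marker.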
[jira] Commented: (MAPREDUCE-1102) Job gets killed even when the cleanup completes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770079#action_12770079 ] Owen O'Malley commented on MAPREDUCE-1102: -- Naturally, the fact that it may be invoked after the commitJob method has been called should be called out in the JavaDoc for the abortJob method. Job gets killed even when the cleanup completes --- Key: MAPREDUCE-1102 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1102 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Amar Kamat Assignee: Amar Kamat Fix For: 0.22.0 When the cleanup completes at the tasktracker and the job is killed by the user, the cleanup runs to completion but the job fails. Ideally if the cleanup is completed then the job should not be killed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars
[ https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12770121#action_12770121 ] Todd Lipcon commented on MAPREDUCE-967: --- One note about this JIRA - it will need some fix for Streaming as well. The common way that people ship scripts for streaming is using the -file foo.py argument. This just includes foo.py in the job jar and assumes it will be unpacked on the other side. With this patch, it won't unpack those, which breaks the -file argument's primary use case. Two options to fix this issue:
# We could change -file to use DistributedCache instead. The fact that -file and -files do different things is confusing in the first place, but changing the behavior is potentially a breaking change, I think.
# We could change Streaming to add all of the -file paths to the new configuration parameter such that the existing behavior is preserved.
If no one else has a preference I'll go for option #2 above. TaskTracker does not need to fully unjar job jars - Key: MAPREDUCE-967 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.21.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-967-branch-0.20.txt In practice we have seen some users submitting job jars that consist of 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning up after them has a significant cost (both in wall clock and in unnecessary heavy disk utilization). This cost can be easily avoided
[jira] Created: (MAPREDUCE-1155) Streaming TestMultipleArchiveFiles swallows exceptions
Streaming TestMultipleArchiveFiles swallows exceptions -- Key: MAPREDUCE-1155 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1155 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/streaming Affects Versions: 0.20.1, 0.21.0, 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor TestMultipleArchiveFiles catches exceptions and prints their stack trace rather than failing the job. This means that tests do not fail even when the job fails. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
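The problem described above is a common test anti-pattern, sketched below in isolation. The names are hypothetical and the "job" is just a method that always throws; this is not the code of TestMultipleArchiveFiles.

```java
public class SwallowedExceptionSketch {
    /** Stand-in for running a streaming job that fails. */
    static void runJob() {
        throw new IllegalStateException("job failed");
    }

    /** Anti-pattern: the failure is only printed, so the test still "passes". */
    static boolean swallowingTest() {
        try {
            runJob();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return true; // reached even though the job failed
    }

    /** Fix: let the exception propagate so the test harness records a failure. */
    static boolean propagatingTest() {
        runJob();
        return true; // only reached when the job succeeds
    }

    public static void main(String[] args) {
        System.out.println("swallowing test passed: " + swallowingTest());
        try {
            propagatingTest();
        } catch (IllegalStateException e) {
            System.out.println("propagating test failed as expected: " + e.getMessage());
        }
    }
}
```

In a JUnit test the same effect is usually achieved by declaring the test method with "throws Exception" and removing the catch block.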
[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching
[ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1114: --- Status: Patch Available (was: Open) Speed up ivy resolution in builds with clever caching - Key: MAPREDUCE-1114 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-1114.txt, mapreduce-1114.txt An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly ant test) significantly using some ant macros to cache the resolved classpaths. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching
[ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1114: --- Attachment: mapreduce-1114.txt Attaching up to date patch. Speed up ivy resolution in builds with clever caching - Key: MAPREDUCE-1114 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-1114.txt, mapreduce-1114.txt An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly ant test) significantly using some ant macros to cache the resolved classpaths. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching
[ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1114: --- Attachment: mapreduce-1114.txt Forgot to include build-macros.xml in previous patch Speed up ivy resolution in builds with clever caching - Key: MAPREDUCE-1114 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-1114.txt, mapreduce-1114.txt, mapreduce-1114.txt An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly ant test) significantly using some ant macros to cache the resolved classpaths. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching
[ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1114: --- Status: Patch Available (was: Open) Speed up ivy resolution in builds with clever caching - Key: MAPREDUCE-1114 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-1114.txt, mapreduce-1114.txt, mapreduce-1114.txt An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly ant test) significantly using some ant macros to cache the resolved classpaths. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1114) Speed up ivy resolution in builds with clever caching
[ https://issues.apache.org/jira/browse/MAPREDUCE-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated MAPREDUCE-1114: --- Status: Open (was: Patch Available) Speed up ivy resolution in builds with clever caching - Key: MAPREDUCE-1114 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1114 Project: Hadoop Map/Reduce Issue Type: Improvement Components: build Affects Versions: 0.22.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Priority: Minor Attachments: mapreduce-1114.txt, mapreduce-1114.txt, mapreduce-1114.txt An awful lot of time is spent in the ivy:resolve parts of the build, even when all of the dependencies have been fetched and cached. Profiling showed this was in XML parsing. I have a sort-of-ugly hack which speeds up incremental compiles (and more importantly ant test) significantly using some ant macros to cache the resolved classpaths. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1156) Caching localized counter names in mapred.Counters
Caching localized counter names in mapred.Counters -- Key: MAPREDUCE-1156 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1156 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Hong Tang While profiling Mumak with YourKit, we found that MissingResourceException was thrown and caught 1.6 million times in Counters.Group.localize for several hundred jobs. The resource bundle lookup and the costly exception processing can be easily avoided if we have a global cache of localized counter names. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
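A global cache of this kind could look roughly like the sketch below. The class, the cache key format and the lookup counter are assumptions made for illustration and are not taken from mapred.Counters; only ResourceBundle and ConcurrentHashMap are real JDK APIs.

```java
import java.util.MissingResourceException;
import java.util.ResourceBundle;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CounterNameCacheSketch {
    // Global cache keyed by "bundleName#key"; values are localized display names.
    private static final ConcurrentMap<String, String> CACHE = new ConcurrentHashMap<>();
    // Counts real bundle lookups, to show how often the slow path runs.
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /**
     * Returns the localized name for a counter, consulting the resource bundle
     * at most once per (bundle, key) pair. defaultValue must be non-null; it is
     * cached too, so a missing bundle costs one exception instead of one per call.
     */
    static String localize(String bundleName, String key, String defaultValue) {
        return CACHE.computeIfAbsent(bundleName + "#" + key, k -> {
            LOOKUPS.incrementAndGet();
            try {
                return ResourceBundle.getBundle(bundleName).getString(key);
            } catch (MissingResourceException e) {
                return defaultValue;
            }
        });
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            localize("org.example.NoSuchCounters", "MAP_INPUT_RECORDS", "MAP_INPUT_RECORDS");
        }
        System.out.println("bundle lookups performed: " + LOOKUPS.get());
    }
}
```

Caching the fallback value is the important part: the reported cost came from the exception being thrown and caught repeatedly, not from successful lookups.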
[jira] Updated: (MAPREDUCE-1103) Additional JobTracker metrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sharad Agarwal updated MAPREDUCE-1103: -- Release Note: Add following additional job tracker metrics: Reserved{Map, Reduce}Slots Occupied{Map, Reduce}Slots Running{Map, Reduce}Tasks Killed{Map, Reduce}Tasks FailedJobs KilledJobs PrepJobs RunningJobs TotalTrackers BlacklistedTrackers DecommissionedTrackers Additional JobTracker metrics - Key: MAPREDUCE-1103 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1103 Project: Hadoop Map/Reduce Issue Type: Improvement Components: jobtracker Affects Versions: 0.21.0 Reporter: Arun C Murthy Assignee: Sharad Agarwal Fix For: 0.22.0 Attachments: 1103.patch, 1103.patch, 1103_v1.patch, 1103_v2.patch, 1103_v3.patch, 1103_v4.patch, 1103_v5.patch, 1103_v5_yahoo_1.patch It would be useful for tracking the following additional JobTracker metrics: running{map|reduce}tasks busy{map|reduce}slots reserved{map|reduce}slots -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1153) Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1153: --- Status: Patch Available (was: Open) Changes look fine. Submitting for Hudson. Metrics counting tasktrackers and blacklisted tasktrackers are not updated when trackers are decommissioned. Key: MAPREDUCE-1153 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1153 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.22.0 Reporter: Hemanth Yamijala Assignee: Sharad Agarwal Attachments: 1153.patch MAPREDUCE-1103 added instrumentation on the jobtracker to count the number of actual, blacklisted and decommissioned tasktrackers. When a tracker is decommissioned, the tasktracker count or the blacklisted tracker count is not decremented. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
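As a rough illustration of the bookkeeping MAPREDUCE-1153 asks for, the sketch below keeps the three tracker counts consistent when a tracker is decommissioned. TrackerCounts and its methods are hypothetical and do not mirror the JobTracker instrumentation classes; the sketch also assumes that blacklisted trackers are counted as a subset of the total.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical bookkeeping class, not the JobTracker's actual instrumentation. */
final class TrackerCounts {
  private final Set<String> active = new HashSet<>();      // all known, non-decommissioned trackers
  private final Set<String> blacklisted = new HashSet<>(); // assumed subset of active
  private int decommissioned;

  synchronized void addTracker(String name) {
    active.add(name);
  }

  synchronized void blacklist(String name) {
    if (active.contains(name)) {
      blacklisted.add(name);
    }
  }

  /** Decommissioning removes the tracker from every count it was part of. */
  synchronized void decommission(String name) {
    boolean wasActive = active.remove(name);
    blacklisted.remove(name);
    if (wasActive) {
      decommissioned++;
    }
  }

  synchronized int totalTrackers() { return active.size(); }
  synchronized int blacklistedTrackers() { return blacklisted.size(); }
  synchronized int decommissionedTrackers() { return decommissioned; }
}
```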
[jira] Commented: (MAPREDUCE-1144) JT should not hold lock while writing user history logs to DFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770383#action_12770383 ] Sharad Agarwal commented on MAPREDUCE-1144: --- Since MAPREDUCE-814 adds the capability to have job logs in HDFS, there is not much utility in enabling the user logs. Users can access those directly from the HDFS done folder location. In fact, in 0.21 the user log has been removed as part of the job history format/API refactoring (MAPREDUCE-157). JT should not hold lock while writing user history logs to DFS -- Key: MAPREDUCE-1144 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1144 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.1 Reporter: Todd Lipcon I've seen behavior a few times now where the DFS is being slow for one reason or another, and the JT essentially locks up waiting on it while one thread tries for a long time to write history files out. The stack trace blocking everything is:
Thread 210 (IPC Server handler 10 on 7277):
  State: WAITING
  Blocked count: 171424
  Waited count: 1209604
  Waiting on java.util.linkedl...@407dd154
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3122)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3202)
    org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3151)
    org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:67)
    org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:301)
    sun.nio.cs.StreamEncoder.close(StreamEncoder.java:130)
    java.io.OutputStreamWriter.close(OutputStreamWriter.java:216)
    java.io.BufferedWriter.close(BufferedWriter.java:248)
    java.io.PrintWriter.close(PrintWriter.java:295)
    org.apache.hadoop.mapred.JobHistory$JobInfo.logFinished(JobHistory.java:1349)
    org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2167)
    org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2111)
    org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:873)
    org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3598)
    org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2792)
    org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2581)
    sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
We should try not to do external IO while holding the JT lock, and instead write the data to an in-memory buffer, drop the lock, and then write. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
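The pattern suggested in MAPREDUCE-1144 (serialize to an in-memory buffer under the lock, release the lock, then perform the slow write) can be sketched as follows. This is a minimal illustration rather than JobTracker code: BufferedHistoryWriter, recordEvent, and flushTo are hypothetical names, and a plain OutputStream stands in for the DFS output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a component guarded by one coarse lock. */
final class BufferedHistoryWriter {
  private final Object lock = new Object();
  private final List<String> pendingEvents = new ArrayList<>();

  /** Cheap, in-memory only; safe to call from heartbeat-style handlers. */
  void recordEvent(String event) {
    synchronized (lock) {
      pendingEvents.add(event);
    }
  }

  /** Formats pending events under the lock, then writes them with the lock released. */
  void flushTo(OutputStream slowSink) throws IOException {
    byte[] snapshot;
    synchronized (lock) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      PrintWriter writer =
          new PrintWriter(new OutputStreamWriter(buffer, StandardCharsets.UTF_8));
      for (String event : pendingEvents) {
        writer.print(event);
        writer.print('\n');
      }
      writer.flush();
      pendingEvents.clear();
      snapshot = buffer.toByteArray();
    }
    // The lock is no longer held here, so a slow sink cannot stall other callers.
    slowSink.write(snapshot);
    slowSink.flush();
  }
}
```

The sketch assumes a single thread calls flushTo. With several concurrent flushers, the snapshots could reach the sink out of order, so a real implementation would probably hand the buffers to one dedicated writer thread.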