[jira] [Commented] (MAPREDUCE-2413) TaskTracker should handle disk failures at both startup and runtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13066837#comment-13066837 ] Ravi Gummadi commented on MAPREDUCE-2413: - Am working on porting this patch to trunk. TaskTracker should handle disk failures at both startup and runtime --- Key: MAPREDUCE-2413 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2413 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task-controller, tasktracker Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Ravi Gummadi Fix For: 0.20.204.0 Attachments: MR-2413.v0.1.patch, MR-2413.v0.2.patch, MR-2413.v0.3.patch, MR-2413.v0.patch At present, TaskTracker doesn't handle disk failures properly both at startup and runtime. (1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is on a bad disk. TaskTracker should ignore that particular mapred-local-dir and start up and use only the remaining good mapred-local-dirs. (2) If a disk goes bad while TaskTracker is running, currently TaskTracker doesn't do anything special. This results in either (a) TaskTracker continues to try to use that bad disk and this results in lots of task failures and possibly job failures(because of multiple TTs having bad disks) and eventually these TTs getting graylisted for all jobs. And this needs manual restart of TT with modified configuration of mapred-local-dirs avoiding the bad disk. OR (b) Health check script identifying the disk as bad and the TT gets blacklisted. And this also needs manual restart of TT with modified configuration of mapred-local-dirs avoiding the bad disk. This JIRA is to make TaskTracker more fault-tolerant to disk failures solving (1) and (2). i.e. TT should start even if at least one of the mapred-local-dirs is on a good disk and TT should adjust its in-memory list of mapred-local-dirs and avoid using bad mapred-local-dirs. -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
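The behaviour requested in (1) and (2) above can be sketched as follows. This is a minimal illustration, not code from the attached patches: the class `LocalDirFilterSketch`, its method names, and the definition of a "good" directory (exists, readable, writable) are all assumptions made for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the TaskTracker implementation.
class LocalDirFilterSketch {

    /** Returns the configured directories that currently look usable. */
    static List<String> goodLocalDirs(List<String> configuredDirs) {
        List<String> good = new ArrayList<>();
        for (String path : configuredDirs) {
            File dir = new File(path);
            // Assumption: "good" means an existing, readable, writable directory.
            if (dir.isDirectory() && dir.canRead() && dir.canWrite()) {
                good.add(path);
            }
        }
        return good;
    }

    /** Startup check: fail only when no configured directory is usable. */
    static List<String> checkAtStartup(List<String> configuredDirs) {
        List<String> good = goodLocalDirs(configuredDirs);
        if (good.isEmpty()) {
            throw new IllegalStateException("No usable mapred-local-dirs");
        }
        return good;
    }
}
```

Calling `goodLocalDirs` again at runtime and replacing the in-memory list with its result would correspond to case (2), where a disk goes bad while the TaskTracker is running.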
[jira] [Updated] (MAPREDUCE-2701) MR-279: app/Job.java needs UGI for the user that launched it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-2701: --- Status: Patch Available (was: Open) MR-279: app/Job.java needs UGI for the user that launched it Key: MAPREDUCE-2701 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2701 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2701-v1.patch ./mr-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/Job.java is missing some data that is needed by the Job History GUI. It needs the UGI for the user that launched it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2701) MR-279: app/Job.java needs UGI for the user that launched it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-2701: --- Fix Version/s: 0.23.0 MR-279: app/Job.java needs UGI for the user that launched it Key: MAPREDUCE-2701 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2701 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2701-v1.patch ./mr-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/Job.java is missing some data that is needed by the Job History GUI. It needs the UGI for the user that launched it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2701) MR-279: app/Job.java needs UGI for the user that launched it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-2701: --- Attachment: MR-2701-v1.patch This patch adds in UGI information to Job for the user that launched the job. This is in preparation for the GUI to display this information. MR-279: app/Job.java needs UGI for the user that launched it Key: MAPREDUCE-2701 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2701 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2701-v1.patch ./mr-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/Job.java is missing some data that is needed by the Job History GUI. It needs the UGI for the user that launched it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2701) MR-279: app/Job.java needs UGI for the user that launched it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067080#comment-13067080 ] Hadoop QA commented on MAPREDUCE-2701: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12486879/MR-2701-v1.patch against trunk revision 1146517. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/476//console This message is automatically generated. MR-279: app/Job.java needs UGI for the user that launched it Key: MAPREDUCE-2701 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2701 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2701-v1.patch ./mr-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/Job.java is missing some data that is needed by the Job History GUI. It needs the UGI for the user that launched it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Naisbitt updated MAPREDUCE-2489: Status: Patch Available (was: Open) Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
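The verification proposed above could look roughly like the sketch below. It is not taken from the attached patches: the class `SplitHostValidatorSketch`, the RFC 1123-style syntax rule, and the `maxInvalid` threshold are assumptions chosen for illustration. The idea is to reject syntactically invalid names before any DNS lookup is attempted, and to fail once too many names are invalid.

```java
import java.util.regex.Pattern;

// Hypothetical validator for illustration only; not the JobTracker code.
class SplitHostValidatorSketch {

    // One DNS label: letters, digits, hyphens; 1-63 chars; no leading or trailing hyphen.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    /** Purely syntactic check; performs no DNS lookup. */
    static boolean looksLikeHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // Limit -1 keeps empty labels, so "a..b" and a trailing dot are rejected.
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    /** Throws if more than maxInvalid split hostnames fail the syntax check. */
    static void checkSplitHosts(String[] hosts, int maxInvalid) {
        int invalid = 0;
        for (String host : hosts) {
            if (!looksLikeHostname(host)) {
                invalid++;
            }
        }
        if (invalid > maxInvalid) {
            throw new IllegalArgumentException(
                    invalid + " invalid split hostnames (limit " + maxInvalid + ")");
        }
    }
}
```

A syntactic check is cheap and keeps random strings away from the resolver; names that pass it can still fail to resolve, which is where the suggested per-job failure count would apply.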
[jira] [Updated] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable
[ https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Naisbitt updated MAPREDUCE-2489: Status: Open (was: Patch Available) Resubmitting patch to run through hudson Jobsplits with random hostnames can make the queue unusable --- Key: MAPREDUCE-2489 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489 Project: Hadoop Map/Reduce Issue Type: Bug Components: jobtracker Affects Versions: 0.20.205.0, 0.23.0 Reporter: Jeffrey Naisbitt Assignee: Jeffrey Naisbitt Attachments: MAPREDUCE-2489-0.20s-v2.patch, MAPREDUCE-2489-0.20s-v3.patch, MAPREDUCE-2489-0.20s.patch, MAPREDUCE-2489-mapred-v2.patch, MAPREDUCE-2489-mapred-v3.patch, MAPREDUCE-2489-mapred-v4.patch, MAPREDUCE-2489-mapred.patch We saw an issue where a custom InputSplit was returning invalid hostnames for the splits that were then causing the JobTracker to attempt to excessively resolve host names. This caused a major slowdown for the JobTracker. We should prevent invalid InputSplit hostnames from affecting everyone else. I propose we implement some verification for the hostnames to try to ensure that we only do DNS lookups on valid hostnames (and fail otherwise). We could also fail the job after a certain number of failures in the resolve. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2701) MR-279: app/Job.java needs UGI for the user that launched it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067103#comment-13067103 ] Robert Joseph Evans commented on MAPREDUCE-2701: This patch is intended for the MR-279 branch not trunk. MR-279: app/Job.java needs UGI for the user that launched it Key: MAPREDUCE-2701 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2701 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2701-v1.patch ./mr-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/Job.java is missing some data that is needed by the Job History GUI. It needs the UGI for the user that launched it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2413) TaskTracker should handle disk failures at both startup and runtime
[ https://issues.apache.org/jira/browse/MAPREDUCE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067119#comment-13067119 ] Eli Collins commented on MAPREDUCE-2413: @Ravi - trunk's task tracker or as a feature for MR2? TaskTracker should handle disk failures at both startup and runtime --- Key: MAPREDUCE-2413 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2413 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task-controller, tasktracker Affects Versions: 0.20.204.0 Reporter: Bharath Mundlapudi Assignee: Ravi Gummadi Fix For: 0.20.204.0 Attachments: MR-2413.v0.1.patch, MR-2413.v0.2.patch, MR-2413.v0.3.patch, MR-2413.v0.patch At present, TaskTracker doesn't handle disk failures properly both at startup and runtime. (1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is on a bad disk. TaskTracker should ignore that particular mapred-local-dir and start up and use only the remaining good mapred-local-dirs. (2) If a disk goes bad while TaskTracker is running, currently TaskTracker doesn't do anything special. This results in either (a) TaskTracker continues to try to use that bad disk and this results in lots of task failures and possibly job failures(because of multiple TTs having bad disks) and eventually these TTs getting graylisted for all jobs. And this needs manual restart of TT with modified configuration of mapred-local-dirs avoiding the bad disk. OR (b) Health check script identifying the disk as bad and the TT gets blacklisted. And this also needs manual restart of TT with modified configuration of mapred-local-dirs avoiding the bad disk. This JIRA is to make TaskTracker more fault-tolerant to disk failures solving (1) and (2). i.e. TT should start even if at least one of the mapred-local-dirs is on a good disk and TT should adjust its in-memory list of mapred-local-dirs and avoid using bad mapred-local-dirs. -- This message is automatically generated by JIRA. 
[jira] [Updated] (MAPREDUCE-2623) Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder
[ https://issues.apache.org/jira/browse/MAPREDUCE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated MAPREDUCE-2623: --- Issue Type: Improvement (was: Task) Hadoop Flags: [Reviewed] +1 looks good Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder - Key: MAPREDUCE-2623 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2623 Project: Hadoop Map/Reduce Issue Type: Improvement Components: test Affects Versions: 0.23.0 Reporter: Jim Plush Assignee: Harsh J Priority: Minor Fix For: 0.23.0 Attachments: MAPREDUCE-2623.r1.diff, MAPREDUCE-2623.r2.diff Looking at test class ClusterMapReduceTestCase it issues a warning that the dfsCluster = new MiniDFSCluster(conf, 2, reformatDFS, null); line of code is deprecated and MiniDFSCluster.Builder should be used instead. It notes that the current API will be phased out in version 24. I propose to update the test class to the most up to date code as it's referenced several places on the internet as an example of how to write a Hadoop Unit Test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2623) Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder
[ https://issues.apache.org/jira/browse/MAPREDUCE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Collins updated MAPREDUCE-2623: --- Resolution: Fixed Status: Resolved (was: Patch Available) I've committed this. Thanks Harsh! Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder - Key: MAPREDUCE-2623 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2623 Project: Hadoop Map/Reduce Issue Type: Improvement Components: test Affects Versions: 0.23.0 Reporter: Jim Plush Assignee: Harsh J Priority: Minor Fix For: 0.23.0 Attachments: MAPREDUCE-2623.r1.diff, MAPREDUCE-2623.r2.diff Looking at test class ClusterMapReduceTestCase it issues a warning that the dfsCluster = new MiniDFSCluster(conf, 2, reformatDFS, null); line of code is deprecated and MiniDFSCluster.Builder should be used instead. It notes that the current API will be phased out in version 24. I propose to update the test class to the most up to date code as it's referenced several places on the internet as an example of how to write a Hadoop Unit Test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2669) Some new examples and test cases for them.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067168#comment-13067168 ] Plamen Jeliazkov commented on MAPREDUCE-2669: - Thank you, Devaraj! Yes, I have been filing it on the review board; I have been uploading the .patch files here as well as on the review board. I will add your comments to the patch and upload again soon. Some new examples and test cases for them. -- Key: MAPREDUCE-2669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2669 Project: Hadoop Map/Reduce Issue Type: Test Components: examples Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Minor Attachments: MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, mapreduce-new-examples-0.22.patch Original Estimate: 48h Remaining Estimate: 48h Looking to add some more examples such as Mean, Median, and Standard Deviation to the examples. I have some generic JUnit testcases as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2623) Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder
[ https://issues.apache.org/jira/browse/MAPREDUCE-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067173#comment-13067173 ] Hudson commented on MAPREDUCE-2623: --- Integrated in Hadoop-Mapreduce-trunk-Commit #747 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/747/]) MAPREDUCE-2623. Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder. Contributed by Harsh J Chouraria eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1147981 Files : * /hadoop/common/trunk/mapreduce/CHANGES.txt * /hadoop/common/trunk/mapreduce/src/test/mapred/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java Update ClusterMapReduceTestCase to use MiniDFSCluster.Builder - Key: MAPREDUCE-2623 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2623 Project: Hadoop Map/Reduce Issue Type: Improvement Components: test Affects Versions: 0.23.0 Reporter: Jim Plush Assignee: Harsh J Priority: Minor Fix For: 0.23.0 Attachments: MAPREDUCE-2623.r1.diff, MAPREDUCE-2623.r2.diff Looking at test class ClusterMapReduceTestCase it issues a warning that the dfsCluster = new MiniDFSCluster(conf, 2, reformatDFS, null); line of code is deprecated and MiniDFSCluster.Builder should be used instead. It notes that the current API will be phased out in version 24. I propose to update the test class to the most up to date code as it's referenced several places on the internet as an example of how to write a Hadoop Unit Test. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2701) MR-279: app/Job.java needs UGI for the user that launched it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067182#comment-13067182 ] Robert Joseph Evans commented on MAPREDUCE-2701: I am requesting that someone please review this patch. MR-279: app/Job.java needs UGI for the user that launched it Key: MAPREDUCE-2701 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2701 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2701-v1.patch ./mr-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/Job.java is missing some data that is needed by the Job History GUI. It needs the UGI for the user that launched it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2652) MR-279: Cannot run multiple NMs on a single node
[ https://issues.apache.org/jira/browse/MAPREDUCE-2652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067183#comment-13067183 ] Robert Joseph Evans commented on MAPREDUCE-2652: I am requesting that someone please review this patch. Thanks. MR-279: Cannot run multiple NMs on a single node - Key: MAPREDUCE-2652 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2652 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MR-2652-v1.txt, MR-2652-v2.txt Currently in MR-279 the Auxiliary services, like ShuffleHandler, have no way to communicate information back to the applications. Because of this the Map Reduce Application Master has hardcoded a port of 8080 for shuffle. This prevents the configuration mapreduce.shuffle.port from ever being set to anything but 8080. The code should be updated to allow this information to be returned to the application master. Also the data needs to be persisted to the task log so that on restart the data is not lost. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority
[ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067184#comment-13067184 ] Robert Joseph Evans commented on MAPREDUCE-2494: I am requesting that someone please review the patch for the 0.20 security line. The changes are almost identical to what went into trunk. Thanks Make the distributed cache delete entires using LRU priority Key: MAPREDUCE-2494 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache Affects Versions: 0.20.205.0, 0.21.0 Reporter: Robert Joseph Evans Assignee: Robert Joseph Evans Fix For: 0.23.0 Attachments: MAPREDUCE-2494-20.20X-V1.patch, MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch Currently the distributed cache will wait until a cache directory is above a preconfigured threshold. At which point it will delete all entries that are not currently being used. It seems like we would get far fewer cache misses if we kept some of them around, even when they are not being used. We should add in a configurable percentage for a goal of how much of the cache should remain clear when not in use, and select objects to delete based off of how recently they were used, and possibly also how large they are/how difficult is it to download them again. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
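The cleanup policy described above (delete unused entries in least-recently-used order until a configurable share of the cache is free) can be sketched with an access-ordered `LinkedHashMap`. This is an illustration under stated assumptions, not the code in the attached patches: the class `LruCacheCleanupSketch`, the reference-count notion of "in use", and the `keepFreeFraction` parameter are hypothetical, and the size/download-cost weighting mentioned in the description is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; not the distributed cache implementation.
class LruCacheCleanupSketch {

    static final class Entry {
        final long sizeBytes;
        int refCount; // greater than 0 means a task is currently using the entry
        Entry(long sizeBytes) { this.sizeBytes = sizeBytes; }
    }

    // accessOrder = true: iteration runs from least to most recently accessed.
    private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final long capacityBytes;
    private final double keepFreeFraction; // assumed config: share of the cache to keep clear
    private long usedBytes;

    LruCacheCleanupSketch(long capacityBytes, double keepFreeFraction) {
        this.capacityBytes = capacityBytes;
        this.keepFreeFraction = keepFreeFraction;
    }

    void add(String name, long sizeBytes) {
        Entry old = entries.put(name, new Entry(sizeBytes));
        if (old != null) {
            usedBytes -= old.sizeBytes;
        }
        usedBytes += sizeBytes;
    }

    void acquire(String name) { entries.get(name).refCount++; } // get() refreshes recency
    void release(String name) { entries.get(name).refCount--; }

    long usedBytes() { return usedBytes; }

    /** Deletes unused entries, least recently used first, until the target is met. */
    List<String> cleanup() {
        long target = (long) (capacityBytes * (1.0 - keepFreeFraction));
        List<String> deleted = new ArrayList<>();
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (usedBytes > target && it.hasNext()) {
            Map.Entry<String, Entry> e = it.next();
            if (e.getValue().refCount == 0) {
                usedBytes -= e.getValue().sizeBytes;
                deleted.add(e.getKey());
                it.remove();
            }
        }
        return deleted;
    }
}
```

Compared with deleting every unused entry once the threshold is crossed, this keeps the most recently used entries around, which is the source of the expected reduction in cache misses.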
[jira] [Commented] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067193#comment-13067193 ] Robert Joseph Evans commented on MAPREDUCE-2324: I uploaded a patch a while ago and the conversation has kind of died off. Can someone please review the patch and give me some feedback on it? If it is something that you don't want to put into a sustaining release at this time, then please give me some feedback, possibly with a -1, depending on how adamant you are about it, so I can address those issues, perhaps by fixing it just in 0.23 instead. Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2705) tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed
tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed - Key: MAPREDUCE-2705 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2705 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.20.205.0 Reporter: Thomas Graves Assignee: Thomas Graves The current TaskLauncher serially launches new tasks one at a time. During the launch it does the localization and then starts the map/reduce task. This can cause any other tasks to be blocked waiting for the current task to be localized and started. In some instances we have seen a task that has a large file to localize (1.2MB) block another task for about 40 minutes. This particular task being blocked was a cleanup task which caused the job to be delayed finishing for the 40 minutes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2658) Problem running full map reduce jobs on mrv2
[ https://issues.apache.org/jira/browse/MAPREDUCE-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067236#comment-13067236 ] Ahmed Radwan commented on MAPREDUCE-2658: - Thanks Arun, I'll take a look. I think this will require considering the MAPREDUCE-2400 recent changes. Any other issues I should also consider? Problem running full map reduce jobs on mrv2 -- Key: MAPREDUCE-2658 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2658 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Reporter: Ahmed Radwan Assignee: Ahmed Radwan Attachments: MAPREDUCE-2658.patch Following the installation instructions at: https://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/mapreduce/INSTALL the randomwriter example runs successfully. However, other full map reduce jobs (e.g. wordcount) fail with the error:
java.lang.UnsupportedOperationException: Incompatible with LocalRunner
    at org.apache.hadoop.mapred.YarnOutputFiles.getInputFile(YarnOutputFiles.java:200)
    at org.apache.hadoop.mapred.ReduceTask.getMapFiles(ReduceTask.java:223)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1094)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
The ReduceTask evaluates the isLocal flag based on the property mapreduce.jobtracker.address, the default value for this property in mapred-default.xml is 'local' and this is the cause of the problem. Setting mapreduce.jobtracker.address in the mapred-site.xml to something other than local seems to solve the problem. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
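The cause described above can be modelled in a few lines. This is a simplified stand-in, not the ReduceTask source: `java.util.Properties` replaces the Hadoop configuration object, the class `IsLocalSketch` is hypothetical, and the address used in the usage note is a made-up placeholder. Only the property name and its default of `local` come from the report.

```java
import java.util.Properties;

// Hypothetical model for illustration only; not the ReduceTask code.
class IsLocalSketch {

    static final String JT_ADDRESS_KEY = "mapreduce.jobtracker.address";

    /** True when the property is unset (default "local") or explicitly "local". */
    static boolean isLocal(Properties conf) {
        return "local".equals(conf.getProperty(JT_ADDRESS_KEY, "local"));
    }
}
```

With an empty `Properties` object the flag is true, matching the failing case; after setting the property to any other value, for example `jt.example.com:8021`, it becomes false, which matches the workaround reported above.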
[jira] [Commented] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067240#comment-13067240 ] Todd Lipcon commented on MAPREDUCE-2324: Hey Bobby. Sorry, was on vacation last week so only partially keeping up with JIRA traffic. My worry mostly has to do with this feature kicking in on a false positive. In general, false positives here are very expensive, whereas false negatives are not nearly as drastic. For example, imagine a cluster with 10 nodes and a couple of jobs submitted. One of the nodes is out of disk space. The first job, when submitted, takes up all the reduce slots on the first 9 nodes, but the 10th node is left empty since it's out of space. When the second job is submitted, all of the free reduce slots on the cluster are located on this remaining node. Every time the node heartbeats, the counter will get incremented for the queued up job. After 10 heartbeats, the job will fail, even though it was just a single problematic node. So, I think we do need to wait for a scheduling opportunity on at least some number of unique nodes before failing the job. It seems we could do this with a single HashSet per job - whenever any reduce task is successfully scheduled, the set is cleared. Whenever a job is given an opportunity to schedule reduces on a node, but can't due to resource constraints, the node is added to the set. Once the size of the set eclipses some percentage of the nodes on the cluster, the job is failed. This memory usage would be O(nodes*jobs) rather than O(nodes*tasks) -- and thus not too bad. 
Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
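The per-job HashSet idea from the comment above can be sketched as follows. This is an illustration of the proposal, not code from the attached patch: the class `ReduceSchedulingTrackerSketch`, its method names, and the `failFraction` parameter are hypothetical, and node names are assumed to be unique strings.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-job tracker for illustration only; not the JobTracker code.
class ReduceSchedulingTrackerSketch {

    // Unique nodes that offered a reduce slot this job could not use.
    private final Set<String> nodesWithoutRoom = new HashSet<>();
    private final double failFraction; // assumed config: share of the cluster, e.g. 0.5

    ReduceSchedulingTrackerSketch(double failFraction) {
        this.failFraction = failFraction;
    }

    /** Any successful reduce scheduling clears the set. */
    void reduceScheduled() {
        nodesWithoutRoom.clear();
    }

    /**
     * Records a node that could not take a reduce because of resource constraints.
     * Returns true when the job should be failed.
     */
    boolean couldNotScheduleOn(String node, int clusterSize) {
        nodesWithoutRoom.add(node);
        return nodesWithoutRoom.size() > failFraction * clusterSize;
    }
}
```

Because the set stores unique nodes, repeated heartbeats from the single full node in the 10-node example never push the job over the limit; only refusals from many distinct nodes do. Memory is bounded by one set of node names per job, in line with the O(nodes*jobs) estimate.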
[jira] [Commented] (MAPREDUCE-2705) tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067250#comment-13067250 ] Thomas Graves commented on MAPREDUCE-2705: -- Note 1.2MB should be 1.2GB. tasks localized and launched serially by TaskLauncher - causing other tasks to be delayed - Key: MAPREDUCE-2705 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2705 Project: Hadoop Map/Reduce Issue Type: Bug Components: tasktracker Affects Versions: 0.20.205.0 Reporter: Thomas Graves Assignee: Thomas Graves The current TaskLauncher serially launches new tasks one at a time. During the launch it does the localization and then starts the map/reduce task. This can cause any other tasks to be blocked waiting for the current task to be localized and started. In some instances we have seen a task that has a large file to localize (1.2MB) block another task for about 40 minutes. This particular task being blocked was a cleanup task which caused the job to be delayed finishing for the 40 minutes. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067264#comment-13067264 ] Robert Joseph Evans commented on MAPREDUCE-2324: That is a very good point and I really like the solution. I will incorporate your comments and upload a new patch. Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2706) MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged
MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged - Key: MAPREDUCE-2706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2706 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Jeffrey Naisbitt Submitting jobs over the queue limits used to print log messages such as these: hadoop-mapred-jobtracker-HOSTNAME.log. ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default has 10 active tasks for user MYUSER, cannot initialize job_XXX with 10 tasks since it will exceed limit of 15 active tasks per user for this queue and hadoop-mapred-jobtracker-HOSTNAME.log ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default already has 2 running jobs and 0 initializing jobs; cannot initialize job_XXX since it will exceeed limit of 2 initialized jobs for this queue These log messages are useful - especially for QA and testing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2706) MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged
[ https://issues.apache.org/jira/browse/MAPREDUCE-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Naisbitt updated MAPREDUCE-2706: Attachment: MAPREDUCE-2706.patch MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged - Key: MAPREDUCE-2706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2706 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Jeffrey Naisbitt Attachments: MAPREDUCE-2706.patch Submitting jobs over the queue limits used to print log messages such as these: hadoop-mapred-jobtracker-HOSTNAME.log. ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default has 10 active tasks for user MYUSER, cannot initialize job_XXX with 10 tasks since it will exceed limit of 15 active tasks per user for this queue and hadoop-mapred-jobtracker-HOSTNAME.log ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default already has 2 running jobs and 0 initializing jobs; cannot initialize job_XXX since it will exceeed limit of 2 initialized jobs for this queue These log messages are useful - especially for QA and testing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2706) MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged
[ https://issues.apache.org/jira/browse/MAPREDUCE-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeffrey Naisbitt updated MAPREDUCE-2706: Status: Patch Available (was: Open) MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged - Key: MAPREDUCE-2706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2706 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Jeffrey Naisbitt Attachments: MAPREDUCE-2706.patch Submitting jobs over the queue limits used to print log messages such as these: hadoop-mapred-jobtracker-HOSTNAME.log. ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default has 10 active tasks for user MYUSER, cannot initialize job_XXX with 10 tasks since it will exceed limit of 15 active tasks per user for this queue and hadoop-mapred-jobtracker-HOSTNAME.log ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default already has 2 running jobs and 0 initializing jobs; cannot initialize job_XXX since it will exceeed limit of 2 initialized jobs for this queue These log messages are useful - especially for QA and testing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2706) MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged
[ https://issues.apache.org/jira/browse/MAPREDUCE-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067275#comment-13067275 ] Hadoop QA commented on MAPREDUCE-2706: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12486917/MAPREDUCE-2706.patch against trunk revision 1147981. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/477//console This message is automatically generated. MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged - Key: MAPREDUCE-2706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2706 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Jeffrey Naisbitt Attachments: MAPREDUCE-2706.patch Submitting jobs over the queue limits used to print log messages such as these: hadoop-mapred-jobtracker-HOSTNAME.log. ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default has 10 active tasks for user MYUSER, cannot initialize job_XXX with 10 tasks since it will exceed limit of 15 active tasks per user for this queue and hadoop-mapred-jobtracker-HOSTNAME.log ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default already has 2 running jobs and 0 initializing jobs; cannot initialize job_XXX since it will exceeed limit of 2 initialized jobs for this queue These log messages are useful - especially for QA and testing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2706) MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged
[ https://issues.apache.org/jira/browse/MAPREDUCE-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067276#comment-13067276 ] Jeffrey Naisbitt commented on MAPREDUCE-2706: - This patch is for the MR-279 branch, so the above test-patch results are not applicable MR-279: Submit jobs beyond the max jobs per queue limit no longer gets logged - Key: MAPREDUCE-2706 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2706 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 0.23.0 Reporter: Jeffrey Naisbitt Attachments: MAPREDUCE-2706.patch Submitting jobs over the queue limits used to print log messages such as these: hadoop-mapred-jobtracker-HOSTNAME.log. ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default has 10 active tasks for user MYUSER, cannot initialize job_XXX with 10 tasks since it will exceed limit of 15 active tasks per user for this queue and hadoop-mapred-jobtracker-HOSTNAME.log ... INFO org.apache.hadoop.mapred.CapacityTaskScheduler: default already has 2 running jobs and 0 initializing jobs; cannot initialize job_XXX since it will exceeed limit of 2 initialized jobs for this queue These log messages are useful - especially for QA and testing. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2638) Create a simple stress test for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067291#comment-13067291 ] Tom White commented on MAPREDUCE-2638: -- Thanks Matei. The preemption intervals are indeed very low - they are set like this in order to trigger preemption in a pseudo-distributed cluster and so stress the scheduler. For larger clusters the settings you suggest are entirely appropriate, as well as increasing the sleep time in the jobs by setting {{test.fairscheduler.sleepTime}} to a higher value. Create a simple stress test for the fair scheduler -- Key: MAPREDUCE-2638 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2638 Project: Hadoop Map/Reduce Issue Type: Test Components: contrib/fair-share Reporter: Tom White Assignee: Tom White Attachments: MAPREDUCE-2638.patch, MAPREDUCE-2638.patch This would be a test that runs against a cluster, typically with settings that allow preemption to be exercised. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-2324: --- Status: Open (was: Patch Available) Uploading new patch. Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-2324: --- Attachment: MR-2324-security-v2.txt [exec] [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. I also did some math. On our largest cluster here at Yahoo! we have 5000 machines and at most about 200 jobs running concurrently. That comes out to about 8-16 MB in extra heap usage on the JT, if the HashMap is half full and all of those 200 jobs are about to fail because of reduce scheduling issues. Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt, MR-2324-security-v2.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
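The heap estimate in the comment above can be reproduced approximately under assumptions the comment does not spell out. The sketch below assumes one map entry per (job, TaskTracker) pair and a cost of 8 to 16 bytes per entry; both figures are guesses chosen because they land in the quoted range, not numbers taken from the patch.

```java
// Illustration only: the per-entry byte costs are assumptions.
final class HeapEstimateSketch {

    // Bytes needed if every job tracks every TaskTracker at a fixed cost per entry.
    static long estimateBytes(long trackers, long jobs, long bytesPerEntry) {
        return trackers * jobs * bytesPerEntry;
    }

    public static void main(String[] args) {
        long low = estimateBytes(5000, 200, 8);   // 5000 machines, 200 concurrent jobs
        long high = estimateBytes(5000, 200, 16);
        System.out.println(low / 1_000_000 + " MB to " + high / 1_000_000 + " MB");
    }
}
```

Under these assumptions there are 5000 x 200 = 1,000,000 entries, so the total scales directly with whatever the real per-entry cost turns out to be.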
[jira] [Updated] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Joseph Evans updated MAPREDUCE-2324: --- Status: Patch Available (was: Open) Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt, MR-2324-security-v2.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067297#comment-13067297 ] Hadoop QA commented on MAPREDUCE-2324: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12486922/MR-2324-security-v2.txt against trunk revision 1147981. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/478//console This message is automatically generated. Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt, MR-2324-security-v2.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2638) Create a simple stress test for the fair scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-2638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067311#comment-13067311 ] Matei Zaharia commented on MAPREDUCE-2638: -- OK, that makes sense. +1 to commit this then. Create a simple stress test for the fair scheduler -- Key: MAPREDUCE-2638 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2638 Project: Hadoop Map/Reduce Issue Type: Test Components: contrib/fair-share Reporter: Tom White Assignee: Tom White Attachments: MAPREDUCE-2638.patch, MAPREDUCE-2638.patch This would be a test that runs against a cluster, typically with settings that allow preemption to be exercised. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2707) ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc
ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc -- Key: MAPREDUCE-2707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2707 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey ProtoOverHadoopRpcEngine is introduced in MR-279, which uses TunnelProtocol over WritableRpcEngine. This jira removes the tunnel protocol and lets ProtoOverHadoopRpcEngine directly interact with ipc.Client and ipc.Server. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2707) ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc
[ https://issues.apache.org/jira/browse/MAPREDUCE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067319#comment-13067319 ] Jitendra Nath Pandey commented on MAPREDUCE-2707: - This jira doesn't intend to remove writable from ipc.Client/Server. That is proposed in a different jira (HADOOP-7399). This will just remove TunnelProtocol but the protocol buffer messages will still be wrapped in a generic Writable and passed to ipc Client. When HADOOP-7399 is ready to go, ProtoOverHadoopRpcEngine will be modified not to wrap request/response into Writable. ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc -- Key: MAPREDUCE-2707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2707 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey ProtoOverHadoopRpcEngine is introduced in MR-279, which uses TunnelProtocol over WritableRpcEngine. This jira removes the tunnel protocol and lets ProtoOverHadoopRpcEngine directly interact with ipc.Client and ipc.Server. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2324) Job should fail if a reduce task can't be scheduled anywhere
[ https://issues.apache.org/jira/browse/MAPREDUCE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067336#comment-13067336 ] Todd Lipcon commented on MAPREDUCE-2324: Not sure if I'll have time to review this in the next couple days. Anyone over there who could review for you? Otherwise I'll try to look by the end of the week. Job should fail if a reduce task can't be scheduled anywhere Key: MAPREDUCE-2324 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2324 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Todd Lipcon Assignee: Robert Joseph Evans Attachments: MR-2324-security-v1.txt, MR-2324-security-v2.txt If there's a reduce task that needs more disk space than is available on any mapred.local.dir in the cluster, that task will stay pending forever. For example, we produced this in a QA cluster by accidentally running terasort with one reducer - since no mapred.local.dir had 1T free, the job remained in pending state for several days. The reason for the stuck task wasn't clear from a user perspective until we looked at the JT logs. Probably better to just fail the job if a reduce task goes through all TTs and finds that there isn't enough space. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2650) back-port MAPREDUCE-2238 to 0.20-security
[ https://issues.apache.org/jira/browse/MAPREDUCE-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067368#comment-13067368 ] Sherry Chen commented on MAPREDUCE-2650: Todd, I did not make it clear in my previous comment. The throw-an-exception semantics (when makedirs fails) are used in trunk and CDH3, so it would be good to have them in 0.20-security as well. back-port MAPREDUCE-2238 to 0.20-security - Key: MAPREDUCE-2650 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2650 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.205.0 Reporter: Sherry Chen Assignee: Sherry Chen Attachments: MAPREDUCE-2650.patch Developers have seen the attempt directory permissions getting set to 000 or 111 in CI builds and in tests run on dev desktops with 0.20-security. MAPREDUCE-2238 reported and fixed the issue for 0.22.0; a back-port to 0.20-security is needed. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2669) Some new examples and test cases for them.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated MAPREDUCE-2669: Attachment: MAPREDUCE-2669.patch The reason for the 5MB patch is that it includes a sample text file for the JUnit tests to use. I have applied the patch myself and it appears to be working correctly. I don't know why the core tests or the contrib tests are failing, but after looking them over twice I am fairly sure the failures were present prior to my patch. In any case, here is the latest patch! Some new examples and test cases for them. -- Key: MAPREDUCE-2669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2669 Project: Hadoop Map/Reduce Issue Type: Test Components: examples Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Minor Attachments: MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, mapreduce-new-examples-0.22.patch Original Estimate: 48h Remaining Estimate: 48h Looking to add some more examples such as Mean, Median, and Standard Deviation to the examples. I have some generic JUnit testcases as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2627) guava-r09 JAR file needs to be added to mapreduce.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated MAPREDUCE-2627: Status: Patch Available (was: Open) guava-r09 JAR file needs to be added to mapreduce. -- Key: MAPREDUCE-2627 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2627 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Blocker Attachments: patch.txt Original Estimate: 24h Remaining Estimate: 24h Need to add the guava-r09.jar file into the mapreduce/build/ivy/lib/Hadoop/common directory; missing from build. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2627) guava-r09 JAR file needs to be added to mapreduce.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated MAPREDUCE-2627: Fix Version/s: 0.22.0 guava-r09 JAR file needs to be added to mapreduce. -- Key: MAPREDUCE-2627 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2627 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Blocker Fix For: 0.22.0 Attachments: patch.txt Original Estimate: 24h Remaining Estimate: 24h Need to add the guava-r09.jar file into the mapreduce/build/ivy/lib/Hadoop/common directory; missing from build. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2627) guava-r09 JAR file needs to be added to mapreduce.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067402#comment-13067402 ] Hadoop QA commented on MAPREDUCE-2627: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12485346/patch.txt against trunk revision 1147981. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/480//console This message is automatically generated. guava-r09 JAR file needs to be added to mapreduce. -- Key: MAPREDUCE-2627 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2627 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Blocker Fix For: 0.22.0 Attachments: patch.txt Original Estimate: 24h Remaining Estimate: 24h Need to add the guava-r09.jar file into the mapreduce/build/ivy/lib/Hadoop/common directory; missing from build. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2627) guava-r09 JAR file needs to be added to mapreduce.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067406#comment-13067406 ] Plamen Jeliazkov commented on MAPREDUCE-2627: - QA bot ran on trunk revision -- patch was intended for branch 0.22.0 guava-r09 JAR file needs to be added to mapreduce. -- Key: MAPREDUCE-2627 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2627 Project: Hadoop Map/Reduce Issue Type: Bug Components: build Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Blocker Fix For: 0.22.0 Attachments: patch.txt Original Estimate: 24h Remaining Estimate: 24h Need to add the guava-r09.jar file into the mapreduce/build/ivy/lib/Hadoop/common directory; missing from build. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2707) ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc
[ https://issues.apache.org/jira/browse/MAPREDUCE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated MAPREDUCE-2707: Attachment: MAPREDUCE-2707.2.patch ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc -- Key: MAPREDUCE-2707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2707 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: MAPREDUCE-2707.2.patch ProtoOverHadoopRpcEngine is introduced in MR-279, which uses TunnelProtocol over WritableRpcEngine. This jira removes the tunnel protocol and lets ProtoOverHadoopRpcEngine directly interact with ipc.Client and ipc.Server. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2707) ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc
[ https://issues.apache.org/jira/browse/MAPREDUCE-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067410#comment-13067410 ] Jitendra Nath Pandey commented on MAPREDUCE-2707: - The patch uploaded is for MR-279 branch only. ProtoOverHadoopRpcEngine without using TunnelProtocol over WritableRpc -- Key: MAPREDUCE-2707 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2707 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: MAPREDUCE-2707.2.patch ProtoOverHadoopRpcEngine is introduced in MR-279, which uses TunnelProtocol over WritableRpcEngine. This jira removes the tunnel protocol and lets ProtoOverHadoopRpcEngine directly interact with ipc.Client and ipc.Server. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2669) Some new examples and test cases for them.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067451#comment-13067451 ] Hadoop QA commented on MAPREDUCE-2669: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12486940/MAPREDUCE-2669.patch against trunk revision 1147981. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. -1 release audit. The applied patch generated 3 release audit warnings (more than the trunk's current 2 warnings). -1 core tests. The patch failed these core unit tests: org.apache.hadoop.cli.TestMRCLI org.apache.hadoop.fs.TestFileSystem -1 contrib tests. The patch failed contrib unit tests. +1 system test framework. The patch passed system test framework compile. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/479//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/479//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/479//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/479//console This message is automatically generated. Some new examples and test cases for them. 
-- Key: MAPREDUCE-2669 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2669 Project: Hadoop Map/Reduce Issue Type: Test Components: examples Affects Versions: 0.22.0 Reporter: Plamen Jeliazkov Priority: Minor Attachments: MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, MAPREDUCE-2669.patch, mapreduce-new-examples-0.22.patch Original Estimate: 48h Remaining Estimate: 48h Looking to add some more examples such as Mean, Median, and Standard Deviation to the examples. I have some generic JUnit testcases as well. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2339) optimize JobInProgress.getTaskInProgress(taskid)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067452#comment-13067452 ]

Liyin Liang commented on MAPREDUCE-2339:
----------------------------------------

Nice patch! A user submitted a job with more than 680,000 map tasks to our cluster. The JobTracker then became slow at processing heartbeats: many threads were blocked and many requests were queued. From a jstack of the JobTracker process, we found that most of the time was spent in JIP.getTaskInProgress(). This patch is a good way to improve the performance of JIP.getTaskInProgress() and fixes our problem.

optimize JobInProgress.getTaskInProgress(taskid)
------------------------------------------------

    Key: MAPREDUCE-2339
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2339
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: jobtracker
    Affects Versions: 0.20.2, 0.21.0
    Reporter: Kang Xiao
    Attachments: MAPREDUCE-2339.patch, MAPREDUCE-2339.patch

JobInProgress.getTaskInProgress(taskid) uses a linear search to get the TaskInProgress object by taskid. In fact, it can be replaced by a much more efficient array index operation.
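The difference between the two lookup strategies in MAPREDUCE-2339 can be illustrated with a minimal sketch. It is not the attached patch and not Hadoop's JobInProgress code: the class TaskTableSketch, its nested Task type, and the methods findLinear and findIndexed are hypothetical names, and the sketch assumes that task i of a job is stored at slot i of an array.

```java
/** Hypothetical stand-in for a per-job task table; not Hadoop's JobInProgress. */
final class TaskTableSketch {
    /** Simplified task record; the id is assumed to equal the task's array slot. */
    static final class Task {
        final int id;
        Task(int id) { this.id = id; }
    }

    private final Task[] tasks;

    TaskTableSketch(int numTasks) {
        tasks = new Task[numTasks];
        for (int i = 0; i < numTasks; i++) {
            tasks[i] = new Task(i);   // task i is stored at slot i
        }
    }

    /** Linear search: O(n) per lookup, the pattern described in the issue. */
    Task findLinear(int id) {
        for (Task t : tasks) {
            if (t.id == id) {
                return t;
            }
        }
        return null;
    }

    /** Array index: O(1) per lookup, valid only because task i lives at slot i. */
    Task findIndexed(int id) {
        if (id < 0 || id >= tasks.length) {
            return null;              // unknown task id
        }
        return tasks[id];
    }

    public static void main(String[] args) {
        TaskTableSketch table = new TaskTableSketch(680_000);
        // Both strategies return the same object; only the cost differs.
        System.out.println(table.findLinear(679_999) == table.findIndexed(679_999));
    }
}
```

With hundreds of thousands of tasks and one lookup per status update, the linear version scans on the order of the whole array each time, which is consistent with the heartbeat slowdown reported in the comment above.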
[jira] [Reopened] (MAPREDUCE-2694) AM releases too many containers due to the protocol
[ https://issues.apache.org/jira/browse/MAPREDUCE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli reopened MAPREDUCE-2694:
-----------------------------------------------

    Assignee:     (was: Arun C Murthy)

Reopening the issue as the discussion is still ongoing.

AM releases too many containers due to the protocol
---------------------------------------------------

    Key: MAPREDUCE-2694
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2694
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: mrv2
    Reporter: Arun C Murthy

- The AM sends a request asking for 4 containers on host H1.
- Asynchronously, host H1 reaches the RM and gets assigned 4 containers. At this point, the RM sets the value against H1 to zero in its aggregate request table for all apps.
- In the meantime, the AM comes to need 3 more containers, so a total of 7 including the 4 from the previous request.
- Today, the AM sends the absolute number of 7 against H1 to the RM as part of its request table.
- The RM seems to override its earlier value of zero against H1 with 7, and thus allocates 7 more containers.
- The AM already gets 4 in this scheduling iteration, but gets 7 more, a total of 11 instead of the required 7.
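The sequence in MAPREDUCE-2694 can be replayed with a toy model. This is only a sketch of the arithmetic, assuming the RM keeps one outstanding count per host and that each AM request overwrites it; the class AbsoluteRequestSketch and its methods request, heartbeat, and totalAllocated are hypothetical and do not correspond to the actual MRv2 scheduler code.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the request table described in the issue; not the real RM. */
final class AbsoluteRequestSketch {
    // Outstanding container count per host, as this hypothetical RM sees it.
    private final Map<String, Integer> outstanding = new HashMap<>();
    private int totalAllocated = 0;

    /** The AM sends an absolute count, which overwrites whatever the RM had recorded. */
    void request(String host, int absoluteCount) {
        outstanding.put(host, absoluteCount);
    }

    /** Node heartbeat: the RM grants everything outstanding for the host and resets it to zero. */
    int heartbeat(String host) {
        int granted = outstanding.getOrDefault(host, 0);
        outstanding.put(host, 0);
        totalAllocated += granted;
        return granted;
    }

    int totalAllocated() {
        return totalAllocated;
    }

    public static void main(String[] args) {
        AbsoluteRequestSketch rm = new AbsoluteRequestSketch();
        rm.request("H1", 4);   // AM asks for 4 containers on H1
        rm.heartbeat("H1");    // RM assigns 4 and zeroes its entry
        rm.request("H1", 7);   // AM, not yet aware of the 4, sends its absolute need of 7
        rm.heartbeat("H1");    // RM assigns 7 more
        System.out.println(rm.totalAllocated()); // prints 11, although only 7 were needed
    }
}
```

The over-allocation comes from combining an overwrite on the AM side with a reset on the RM side: the 4 containers already granted are counted again in the absolute figure of 7.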
[jira] [Updated] (MAPREDUCE-2589) TaskTracker not purging userlog directories
[ https://issues.apache.org/jira/browse/MAPREDUCE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated MAPREDUCE-2589:
------------------------------------

    Fix Version/s: 0.20.205.0

The patch looks good. One minor nit: I think the variable name below:

{quote}
long logRetainiMillSec = DEFAULT_USER_LOG_RETAIN_MAX_HOURS * 60 * 60 * 1000;
{quote}

was supposed to be logRetainMilliSec? (spelling mistake?)

Also, can you please post the ant test results on the jira? The patch lacks unit tests; have you already verified the fix on a small cluster?

TaskTracker not purging userlog directories
-------------------------------------------

    Key: MAPREDUCE-2589
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2589
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: tasktracker
    Affects Versions: 0.20.205.0
    Environment: 0.20.205
    Reporter: Sherry Chen
    Assignee: Sherry Chen
    Priority: Minor
    Fix For: 0.20.205.0
    Attachments: MAPREDUCE-2589.patch, cleanup_userlogs.py

UserLogCleaner is not robust. Leftover userlogs after a restart sometimes have to be manually cleaned. Things can accumulate over a period of time.
[jira] [Updated] (MAPREDUCE-2691) Finish up the cleanup of distributed cache file resources and related tests.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated MAPREDUCE-2691:
----------------------------------------------

    Summary: Finish up the cleanup of distributed cache file resources and related tests.  (was: Implement cleanup of distributed cache file resources)

bq. Vinod, I think you had a patch right?

Nope, not me. But [~chris.douglas] already pushed a patch to the MR-279 branch. Let's leave this open, though, so that I can verify the fix and, if possible, look at the tests. Changing the title to reflect the same.

Finish up the cleanup of distributed cache file resources and related tests.
----------------------------------------------------------------------------

    Key: MAPREDUCE-2691
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2691
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: mrv2
    Reporter: Amol Kekre
    Assignee: Vinod Kumar Vavilapalli
    Fix For: 0.23.0

Implement cleanup of distributed cache file resources
[jira] [Updated] (MAPREDUCE-2708) Design and implement MR Application Master recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sharad Agarwal updated MAPREDUCE-2708:
-------------------------------------

    Component/s: mrv2

Design and implement MR Application Master recovery
---------------------------------------------------

    Key: MAPREDUCE-2708
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
    Project: Hadoop Map/Reduce
    Issue Type: New Feature
    Components: mrv2
    Reporter: Sharad Agarwal
    Assignee: Sharad Agarwal

Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.
[jira] [Created] (MAPREDUCE-2708) Design and implement MR Application Master recovery
Design and implement MR Application Master recovery
---------------------------------------------------

    Key: MAPREDUCE-2708
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2708
    Project: Hadoop Map/Reduce
    Issue Type: New Feature
    Reporter: Sharad Agarwal
    Assignee: Sharad Agarwal

Design recovery of MR AM from crashes/node failures. The running job should recover from the state it left off.
[jira] [Commented] (MAPREDUCE-2589) TaskTracker not purging userlog directories
[ https://issues.apache.org/jira/browse/MAPREDUCE-2589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067516#comment-13067516 ]

Devaraj K commented on MAPREDUCE-2589:
--------------------------------------

One improvement can be made to the patch. Currently, for every file in the user log directory, it fetches the list of jobs that are yet to complete and checks against it. Instead, it can fetch the job list once and then check, for each file in the user log directory, whether it belongs to a running job or not.

TaskTracker not purging userlog directories
-------------------------------------------

    Key: MAPREDUCE-2589
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-2589
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: tasktracker
    Affects Versions: 0.20.205.0
    Environment: 0.20.205
    Reporter: Sherry Chen
    Assignee: Sherry Chen
    Priority: Minor
    Fix For: 0.20.205.0
    Attachments: MAPREDUCE-2589.patch, cleanup_userlogs.py

UserLogCleaner is not robust. Leftover userlogs after a restart sometimes have to be manually cleaned. Things can accumulate over a period of time.
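The suggestion in the comment above (fetch the job list once, then check every log directory against it) might look roughly like the following. This is a sketch under assumptions, not the code in MAPREDUCE-2589.patch: the class UserLogPurgeSketch, the method dirsToPurge, and the idea that each log directory is named after its job id are all hypothetical, and the running-job lookup is modelled as a Supplier so the number of calls is visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of the "fetch once, check many" cleanup loop. */
final class UserLogPurgeSketch {
    /**
     * Returns the log directory names that do not belong to a running job.
     * The runningJobs supplier is called exactly once, however many directories there are.
     */
    static List<String> dirsToPurge(List<String> logDirNames,
                                    Supplier<Set<String>> runningJobs) {
        Set<String> running = runningJobs.get();   // fetched once, outside the loop
        List<String> purge = new ArrayList<>();
        for (String dir : logDirNames) {
            if (!running.contains(dir)) {          // O(1) membership test per directory
                purge.add(dir);
            }
        }
        return purge;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("job_1", "job_2", "job_3");
        Set<String> running = new HashSet<>(Arrays.asList("job_2"));
        System.out.println(dirsToPurge(dirs, () -> running)); // prints [job_1, job_3]
    }
}
```

Holding the running jobs in a set also keeps each per-directory check constant-time, so the cleanup cost grows with the number of directories rather than with directories times jobs.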