[jira] Updated: (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordà Polo updated MAPREDUCE-1380:
--
Attachment: MAPREDUCE-1380_1.1.patch

Patch against trunk.

Adaptive Scheduler
--
Key: MAPREDUCE-1380
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Jordà Polo
Priority: Minor
Attachments: MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch

The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically adjusts the amount of used resources depending on the performance of jobs and on user-defined high-level business goals.

Existing Hadoop schedulers are focused on managing large, static clusters in which nodes are added or removed manually. The goal of this scheduler, on the other hand, is to improve the integration of Hadoop and the applications that run on top of it with environments that allow a more dynamic provisioning of resources.

The current implementation is quite straightforward. Users specify a deadline at job submission time, and the scheduler adjusts the resources to meet that deadline (at the moment, the scheduler can be configured to either minimize or maximize the amount of resources). If multiple jobs are run simultaneously, the scheduler prioritizes them by deadline. Note that the current approach to estimating the completion time of jobs is quite simplistic: it is based on the time it takes to finish each task, so it works well with regular jobs, but there is still room for improvement for unpredictable jobs.

The idea is to further integrate it with cloud-like and virtual environments (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't able to meet its deadline, the scheduler automatically requests more resources.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
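The deadline-driven behavior described above (estimate completion from observed per-task times, order jobs by earliest deadline, size resources to the deadline) can be sketched as follows. This is an illustrative model only; `JobInfo`, `estimatedFinish`, and `slotsToMeetDeadline` are hypothetical names, not the actual API of the attached patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of deadline-driven scheduling; not the patch's code.
public class DeadlineSketch {
    static class JobInfo {
        final String id;
        final long deadlineMillis;  // user-supplied deadline (absolute time)
        final long avgTaskMillis;   // observed time to finish one task
        final int remainingTasks;

        JobInfo(String id, long deadlineMillis, long avgTaskMillis, int remainingTasks) {
            this.id = id;
            this.deadlineMillis = deadlineMillis;
            this.avgTaskMillis = avgTaskMillis;
            this.remainingTasks = remainingTasks;
        }

        // Simplistic completion estimate, as in the description: remaining
        // tasks run in "waves" over the slots granted to the job.
        long estimatedFinish(long now, int slots) {
            long waves = (long) Math.ceil((double) remainingTasks / slots);
            return now + waves * avgTaskMillis;
        }

        // Minimum slots needed to (roughly) finish before the deadline.
        int slotsToMeetDeadline(long now) {
            long budget = deadlineMillis - now;
            if (budget <= 0) return Integer.MAX_VALUE;  // already late
            long waves = Math.max(1, budget / avgTaskMillis);
            return (int) Math.ceil((double) remainingTasks / waves);
        }
    }

    // Earliest-deadline-first ordering of runnable jobs.
    static List<JobInfo> prioritize(List<JobInfo> jobs) {
        List<JobInfo> sorted = new ArrayList<>(jobs);
        sorted.sort(Comparator.comparingLong(j -> j.deadlineMillis));
        return sorted;
    }
}
```

Depending on configuration, a scheduler like this could grant each job exactly `slotsToMeetDeadline` slots (minimize resources) or everything available (maximize), which matches the two modes mentioned above.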
[jira] Commented: (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995408#comment-12995408 ]

Jordà Polo commented on MAPREDUCE-1380:
--
I'm sending a new version of the Adaptive Scheduler. This new version is actually a new implementation, with a different architecture roughly described in the attached PDF document. It supports the same features as the previous version, but at the same time provides new features and a framework for future improvements.

The new features are mostly focused on making the scheduler more aware of the resources and allowing a dynamic number of running tasks depending on the jobs and their need for resources (instead of a fixed number of slots). It is still a work in progress and requires some additional tuning, but I thought it would be interesting to publish it as it is now, given some of the ideas that have been proposed for Hadoop MapReduce NextGen (MAPREDUCE-279).

The scheduler currently leverages job profiling information to ensure optimal cluster utilization, but our goal is to get rid of this kind of profile and implement a more dynamic approach (e.g. using the resource information data introduced by MAPREDUCE-1218).

I still don't know the status of the NextGen proposal and its implementation, but as soon as more details about NextGen are revealed, we'll see whether it makes sense and is worthwhile to adapt or use some of these ideas in the new Hadoop MapReduce architecture.
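The "dynamic number of running tasks" idea mentioned in the comment, admitting as many concurrent tasks as a node's capacity allows for a job's profiled per-task demand rather than a fixed slot count, could look roughly like the sketch below. The memory-only model and the names are assumptions for illustration, not the architecture in the attached PDF.

```java
// Illustrative sketch only: tasks admitted per node are derived from profiled
// per-task resource demand instead of a statically configured slot count.
public class DynamicSlots {
    // How many tasks of a job with the given per-task memory demand (MB)
    // fit on a node with the given free memory (MB)?
    static int tasksThatFit(long freeMemoryMb, long perTaskDemandMb) {
        if (perTaskDemandMb <= 0) {
            throw new IllegalArgumentException("per-task demand must be positive");
        }
        return (int) (freeMemoryMb / perTaskDemandMb);
    }
}
```

A lightweight job (small demand) would get many concurrent tasks on the same node where a heavyweight job gets few, which is the contrast with fixed slots that the comment draws.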
[jira] Updated: (MAPREDUCE-1380) Adaptive Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jordà Polo updated MAPREDUCE-1380:
--
Attachment: MAPREDUCE-1380_1.1.pdf

Adaptive Scheduler
--
Key: MAPREDUCE-1380
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
Project: Hadoop Map/Reduce
Issue Type: New Feature
Reporter: Jordà Polo
Priority: Minor
Attachments: MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf
[jira] Updated: (MAPREDUCE-2178) Race condition in LinuxTaskController permissions handling
[ https://issues.apache.org/jira/browse/MAPREDUCE-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2178:
--
Attachment: mr-2178-error-on-launch-fail.txt

Another fix based on the branch 20 patch: if taskjvm.sh fails to write, the current code swallows that exception without printing it to the logs or anything. Ideally it would become part of the diagnostic info for the task, but this small patch is a big improvement for diagnosability.

Race condition in LinuxTaskController permissions handling
--
Key: MAPREDUCE-2178
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2178
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: security, task-controller
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Blocker
Fix For: 0.22.0
Attachments: 0001-Amend-MAPREDUCE-2178.-Fix-racy-check-for-config-file.patch, 0002-Amend-MAPREDUCE-2178.-Check-argc-after-checks-for-pe.patch, 0003-Amend-MAPREDUCE-2178.-Check-result-of-chdir.patch, ac-sys-largefile.patch, mr-2178-error-on-launch-fail.txt, mr-2178-y20-sortof.patch

The linux-task-controller executable currently traverses a directory hierarchy and calls chown/chmod on the files inside. There is a race condition here which can be exploited by an attacker, causing the task-controller to improperly chown an arbitrary target file (via a symlink) to the user running an MR job. This can be exploited to escalate to root.

[This issue was raised and discussed on the security@ list over the last couple of months.]
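The diagnosability improvement described in the update comment can be sketched as below: the launch-script write failure is captured and surfaced instead of being silently swallowed. `writeOrReport` and the message wording are hypothetical illustrations, not the attached patch's code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: never swallow a failure to write the child-JVM launch script;
// turn it into a diagnostic string the caller can log or attach to the task.
public class LaunchScriptWriter {
    // Returns null on success, or a diagnostic message on failure.
    static String writeOrReport(Path script, String contents) {
        try {
            Files.write(script, contents.getBytes(StandardCharsets.UTF_8));
            return null;  // success: nothing to report
        } catch (IOException e) {
            // Previously this kind of failure was silent; record it so it can
            // become part of the task's diagnostic info.
            return "Failed to write " + script + ": " + e;
        }
    }
}
```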
[jira] Created: (MAPREDUCE-2333) RAID jobs should delete temporary files in the event of filesystem failures
RAID jobs should delete temporary files in the event of filesystem failures
--
Key: MAPREDUCE-2333
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2333
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

If the creation of a parity file or parity file HAR fails due to a filesystem-level error, RAID should delete the temporary files. Specifically, datanode death during parity file creation would cause FSDataOutputStream.close() to throw an IOException. The RAID code should delete such a file.
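The requested cleanup behavior can be sketched as follows. `TempOutput` is a simplified stand-in for `FSDataOutputStream` plus the filesystem delete call, not the real RAID code.

```java
import java.io.IOException;

// Sketch: if close() throws (e.g. datanode death mid-write), remove the
// partial temporary parity file rather than leaving it behind.
public class ParityCleanup {
    interface TempOutput {
        void close() throws IOException;
        boolean delete();
    }

    // Returns normally if close succeeded; otherwise deletes the temp file
    // (best effort) and rethrows the failure to the caller.
    static void closeOrCleanup(TempOutput out) throws IOException {
        try {
            out.close();
        } catch (IOException e) {
            out.delete();  // best-effort removal of the partial parity file
            throw e;       // still propagate the failure
        }
    }
}
```

Rethrowing after the delete keeps the job's failure semantics intact; only the stray temporary file is cleaned up.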
[jira] Updated: (MAPREDUCE-2332) Improve error messages when MR dirs on local FS have bad ownership
[ https://issues.apache.org/jira/browse/MAPREDUCE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2332:
--
Status: Patch Available (was: Open)

Improve error messages when MR dirs on local FS have bad ownership
--
Key: MAPREDUCE-2332
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2332
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Fix For: 0.23.0
Attachments: mr-2332.txt

A common source of user difficulty on a secure cluster is understanding which paths should be owned by which users. The task log directory in particular is often missed, since it has to be owned by mapred but may be inside a logs dir which has different ownership. Right now the user has to spelunk in the code to understand the exception they get if this dir has bad ownership.
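The kind of descriptive check this improvement aims for might look like the sketch below: name the mis-owned path, its actual owner, and the required owner in the exception itself. `checkOwner` and the message wording are illustrative, not the text of the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: replace an opaque permissions failure with a message that tells the
// operator exactly which directory is mis-owned and who should own it.
public class OwnershipCheck {
    static void checkOwner(Path dir, String expectedOwner) throws IOException {
        String actual = Files.getOwner(dir).getName();
        if (!actual.equals(expectedOwner)) {
            throw new IOException("Directory " + dir + " is owned by '" + actual
                + "' but must be owned by '" + expectedOwner
                + "' (e.g. the task log dir must be owned by the mapred user)");
        }
    }
}
```

With a message like this, the operator no longer has to read the scheduler source to work out which path to `chown`.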
[jira] Updated: (MAPREDUCE-2327) MapTask doesn't need to put username information in SpillRecord
[ https://issues.apache.org/jira/browse/MAPREDUCE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2327:
--
Attachment: mr-2327.txt

MapTask doesn't need to put username information in SpillRecord
--
Key: MAPREDUCE-2327
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2327
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Attachments: mr-2327.txt

This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 branch.
[jira] Updated: (MAPREDUCE-2327) MapTask doesn't need to put username information in SpillRecord
[ https://issues.apache.org/jira/browse/MAPREDUCE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2327:
--
Status: Patch Available (was: Open)

MapTask doesn't need to put username information in SpillRecord
--
Key: MAPREDUCE-2327
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2327
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
Fix For: 0.22.0
Attachments: mr-2327.txt

This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 branch. This bug causes task failures in the following case:

- the cluster is not set up with LinuxTaskController (i.e. not a secure cluster)
- the job submitter is not the same as the user running the TT
- the map output is more than one spill's worth

The issue is that UserGroupInformation's view of the current user is the job submitter, but on disk the spill files will be owned by the TT user. SecureIO will then fail when constructing the spill record.
[jira] Updated: (MAPREDUCE-2327) MapTask doesn't need to put username information in SpillRecord
[ https://issues.apache.org/jira/browse/MAPREDUCE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated MAPREDUCE-2327:
--
Description:
This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 branch. This bug causes task failures in the following case:

- the cluster is not set up with LinuxTaskController (i.e. not a secure cluster)
- the job submitter is not the same as the user running the TT
- the map output is more than one spill's worth

The issue is that UserGroupInformation's view of the current user is the job submitter, but on disk the spill files will be owned by the TT user. SecureIO will then fail when constructing the spill record.

(was: This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 branch.)

Priority: Blocker (was: Major)
Fix Version/s: 0.22.0
Assignee: Todd Lipcon
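The failure mode can be modeled minimally: a SecureIO-style owner check fails whenever the on-disk owner of the spill file (the TT user) differs from the user the framework expects (the job submitter). As the issue title suggests, the remedy is to stop recording the username for spill files so no such check applies there. `SpillOwnerCheck` below is a hypothetical illustration, not Hadoop's SecureIO code.

```java
import java.io.IOException;

// Minimal model of the owner-mismatch failure described above.
public class SpillOwnerCheck {
    static void verifyOwner(String fileOwner, String expectedUser) throws IOException {
        if (!fileOwner.equals(expectedUser)) {
            // On an insecure cluster, fileOwner is the TT user while
            // expectedUser is the job submitter, so this check fails.
            throw new IOException("Owner '" + fileOwner
                + "' of spill file does not match expected user '" + expectedUser + "'");
        }
    }
}
```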