[jira] Updated: (MAPREDUCE-1380) Adaptive Scheduler

2011-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordà Polo updated MAPREDUCE-1380:
--

Attachment: MAPREDUCE-1380_1.1.patch

Patch against trunk.

 Adaptive Scheduler
 --

 Key: MAPREDUCE-1380
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jordà Polo
Priority: Minor
 Attachments: MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch


 The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically 
 adjusts the amount of used resources depending on the performance of jobs and 
 on user-defined high-level business goals.
 Existing Hadoop schedulers are focused on managing large, static clusters in 
 which nodes are added or removed manually. On the other hand, the goal of 
 this scheduler is to improve the integration of Hadoop and the applications 
 that run on top of it with environments that allow a more dynamic 
 provisioning of resources.
 The current implementation is quite straightforward. Users specify a deadline 
 at job submission time, and the scheduler adjusts the resources to meet that 
 deadline (at the moment, the scheduler can be configured to either minimize 
 or maximize the amount of resources). If multiple jobs are run 
 simultaneously, the scheduler prioritizes them by deadline. Note that the 
 current approach to estimate the completion time of jobs is quite simplistic: 
 it is based on the time it takes to finish each task, so it works well with 
 regular jobs, but there is still room for improvement for unpredictable jobs.
 The idea is to further integrate it with cloud-like and virtual environments 
 (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
 able to meet its deadline, the scheduler automatically requests more 
 resources.
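
A minimal sketch of the kind of deadline-driven estimate described above (the 
class and method names are hypothetical and not taken from the attached patch):

{code:java}
// Hypothetical sketch: estimate the parallelism a job needs to meet its
// deadline from the average time its tasks have taken so far.
public class DeadlineEstimator {

  /**
   * Slots needed so that the remaining tasks, each taking roughly
   * avgTaskMillis, finish before the deadline.
   */
  public static int slotsNeeded(long deadlineMillis, long nowMillis,
                                int pendingTasks, double avgTaskMillis) {
    long remaining = deadlineMillis - nowMillis;
    if (remaining <= 0) {
      return pendingTasks;  // deadline already missed: run everything at once
    }
    // Sequential work left divided by wall-clock time left gives the
    // required level of parallelism (rounded up, at least one slot).
    double required = (pendingTasks * avgTaskMillis) / remaining;
    return (int) Math.max(1, Math.ceil(required));
  }
}
{code}

Jobs would then be offered slots in deadline order, matching the 
prioritization described above.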

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (MAPREDUCE-1380) Adaptive Scheduler

2011-02-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995408#comment-12995408
 ] 

Jordà Polo commented on MAPREDUCE-1380:
---

I'm sending a new version of the Adaptive Scheduler.

This new version is actually a new implementation with a different architecture 
roughly described in the attached PDF document. It supports the same features 
as the previous version, but at the same time provides new features and a 
framework for future improvements.

The new features are mostly focused on making the scheduler more aware of the 
resources and allowing a dynamic number of running tasks depending on the jobs 
and their need for resources (instead of a fixed number of slots).

It is still a work in progress and requires some additional tuning, but I 
thought it would be interesting to publish it as it is now given some of the 
ideas that have been proposed for Hadoop MapReduce NextGen (MAPREDUCE-279). The 
scheduler currently leverages job profiling information to ensure optimal 
cluster utilization, but our goal is to get rid of this kind of profiles and 
implement a more dynamic approach (e.g. using resource information data 
introduced by MAPREDUCE-1218).
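
As a rough illustration of the resource-aware idea (the profile fields and 
node capacity figures below are assumptions, not the scheduler's actual data 
model):

{code:java}
// Hypothetical sketch: derive how many tasks of a job can run concurrently
// on a node from a per-task resource profile instead of a fixed slot count.
public class ResourceAwarePlacement {

  /** Assumed per-task profile, e.g. obtained from earlier runs of the job. */
  public static class TaskProfile {
    final long memoryBytes;
    final double cpuVcores;

    public TaskProfile(long memoryBytes, double cpuVcores) {
      this.memoryBytes = memoryBytes;
      this.cpuVcores = cpuVcores;
    }
  }

  /** Tasks of this job that fit within a node's currently free capacity. */
  public static int tasksThatFit(TaskProfile p, long freeMemoryBytes,
                                 double freeVcores) {
    int byMemory = (int) (freeMemoryBytes / p.memoryBytes);
    int byCpu = (int) (freeVcores / p.cpuVcores);
    return Math.max(0, Math.min(byMemory, byCpu));
  }
}
{code}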

I still don't know what the status of the NextGen proposal and its 
implementation is. As soon as more details about NextGen are revealed, we'll 
see whether it makes sense and is worthwhile to adapt or reuse some of these 
ideas in the new Hadoop MapReduce architecture.


 Adaptive Scheduler
 --

 Key: MAPREDUCE-1380
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jordà Polo
Priority: Minor
 Attachments: MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch


 The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically 
 adjusts the amount of used resources depending on the performance of jobs and 
 on user-defined high-level business goals.
 Existing Hadoop schedulers are focused on managing large, static clusters in 
 which nodes are added or removed manually. On the other hand, the goal of 
 this scheduler is to improve the integration of Hadoop and the applications 
 that run on top of it with environments that allow a more dynamic 
 provisioning of resources.
 The current implementation is quite straightforward. Users specify a deadline 
 at job submission time, and the scheduler adjusts the resources to meet that 
 deadline (at the moment, the scheduler can be configured to either minimize 
 or maximize the amount of resources). If multiple jobs are run 
 simultaneously, the scheduler prioritizes them by deadline. Note that the 
 current approach to estimate the completion time of jobs is quite simplistic: 
 it is based on the time it takes to finish each task, so it works well with 
 regular jobs, but there is still room for improvement for unpredictable jobs.
 The idea is to further integrate it with cloud-like and virtual environments 
 (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
 able to meet its deadline, the scheduler automatically requests more 
 resources.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-1380) Adaptive Scheduler

2011-02-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordà Polo updated MAPREDUCE-1380:
--

Attachment: MAPREDUCE-1380_1.1.pdf

 Adaptive Scheduler
 --

 Key: MAPREDUCE-1380
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Jordà Polo
Priority: Minor
 Attachments: MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, 
 MAPREDUCE-1380_1.1.pdf


 The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically 
 adjusts the amount of used resources depending on the performance of jobs and 
 on user-defined high-level business goals.
 Existing Hadoop schedulers are focused on managing large, static clusters in 
 which nodes are added or removed manually. On the other hand, the goal of 
 this scheduler is to improve the integration of Hadoop and the applications 
 that run on top of it with environments that allow a more dynamic 
 provisioning of resources.
 The current implementation is quite straightforward. Users specify a deadline 
 at job submission time, and the scheduler adjusts the resources to meet that 
 deadline (at the moment, the scheduler can be configured to either minimize 
 or maximize the amount of resources). If multiple jobs are run 
 simultaneously, the scheduler prioritizes them by deadline. Note that the 
 current approach to estimate the completion time of jobs is quite simplistic: 
 it is based on the time it takes to finish each task, so it works well with 
 regular jobs, but there is still room for improvement for unpredictable jobs.
 The idea is to further integrate it with cloud-like and virtual environments 
 (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
 able to meet its deadline, the scheduler automatically requests more 
 resources.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-2178) Race condition in LinuxTaskController permissions handling

2011-02-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2178:
---

Attachment: mr-2178-error-on-launch-fail.txt

Another fix based on the branch-20 patch: if writing taskjvm.sh fails, the 
exception is currently swallowed without being printed to the logs or recorded 
anywhere.

Ideally the failure would become part of the diagnostic info for the task, but 
this small patch is already a big improvement for diagnosability.
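
As an illustration only (this is not the attached patch), the shape of the fix 
is to report the failure instead of dropping it:

{code:java}
// Illustrative sketch: log a failure to write the task launch script instead
// of silently swallowing it. Class and method names are placeholders.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class TaskScriptWriter {
  private static final Log LOG = LogFactory.getLog(TaskScriptWriter.class);

  /** Write taskjvm.sh, logging (and rethrowing) any failure. */
  public static void writeScript(Path scriptPath, String contents,
                                 String taskId) throws IOException {
    try {
      Files.write(scriptPath, contents.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      // Previously a failure here could be swallowed; at minimum log it,
      // and ideally attach it to the task's diagnostic info as well.
      LOG.error("Failed to write " + scriptPath + " for task " + taskId, e);
      throw e;
    }
  }
}
{code}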

 Race condition in LinuxTaskController permissions handling
 --

 Key: MAPREDUCE-2178
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2178
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, task-controller
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0

 Attachments: 
 0001-Amend-MAPREDUCE-2178.-Fix-racy-check-for-config-file.patch, 
 0002-Amend-MAPREDUCE-2178.-Check-argc-after-checks-for-pe.patch, 
 0003-Amend-MAPREDUCE-2178.-Check-result-of-chdir.patch, 
 ac-sys-largefile.patch, mr-2178-error-on-launch-fail.txt, 
 mr-2178-y20-sortof.patch


 The linux-task-controller executable currently traverses a directory 
 hierarchy and calls chown/chmod on the files inside. There is a race 
 condition here which can be exploited by an attacker, causing the 
 task-controller to improperly chown an arbitrary target file (via a symlink) 
 to the user running a MR job. This can be exploited to escalate to root.
 [this issue was raised and discussed on the security@ list over the last 
 couple of months]
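
The attack depends on the traversal following a symlink planted between the 
check and the chown. A simplified sketch of the general mitigation, i.e. 
refusing to operate through symlinks while walking the tree (the real 
task-controller is native code and additionally has to close the remaining 
check-then-act race, e.g. by working on open file descriptors):

{code:java}
// Simplified illustration only; not the actual task-controller fix.
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.UserPrincipal;

public class OwnershipWalker {

  /** Chown every regular file under root, never following symbolic links. */
  public static void chownTree(Path root, final UserPrincipal owner)
      throws IOException {
    // walkFileTree does not follow symbolic links unless FOLLOW_LINKS is set.
    Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
      @Override
      public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
          throws IOException {
        if (attrs.isSymbolicLink()) {
          // Never chown through a symlink: it may point at a file the
          // attacker wants handed over to the job's user.
          return FileVisitResult.CONTINUE;
        }
        Files.setOwner(file, owner);
        return FileVisitResult.CONTINUE;
      }
    });
  }
}
{code}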

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (MAPREDUCE-2333) RAID jobs should delete temporary files in the event of filesystem failures

2011-02-16 Thread Ramkumar Vadali (JIRA)
RAID jobs should delete temporary files in the event of filesystem failures
---

 Key: MAPREDUCE-2333
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2333
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


If the creation of a parity file or parity file HAR fails due to a filesystem 
level error, RAID should delete the temporary files. Specifically, datanode 
death during parity file creation would cause FSDataOutputStream.close() to 
throw an IOException. The RAID code should delete such a file.
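
A sketch of the cleanup being asked for (class and method names are 
illustrative, not the contrib/raid code):

{code:java}
// Illustrative sketch: delete the temporary parity file if writing or
// closing it fails, e.g. because a datanode died mid-write.
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParityWriter {

  public static void writeParity(FileSystem fs, Path tmpParity, byte[] data)
      throws IOException {
    FSDataOutputStream out = fs.create(tmpParity);
    try {
      out.write(data);
      out.close();  // may throw if a datanode died during the write
    } catch (IOException e) {
      // Remove the partial temporary file rather than leaving it behind.
      fs.delete(tmpParity, false);
      throw e;
    }
  }
}
{code}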

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-2332) Improve error messages when MR dirs on local FS have bad ownership

2011-02-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2332:
---

Status: Patch Available  (was: Open)

 Improve error messages when MR dirs on local FS have bad ownership
 --

 Key: MAPREDUCE-2332
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2332
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.23.0

 Attachments: mr-2332.txt


 A common source of user difficulty on a secure cluster is understanding which 
 paths should be owned by which users. The task log directory in particular is 
 often missed, since it has to be owned by mapred but may be inside a logs dir 
 which has different ownership. Right now the user has to spelunk in the code 
 to understand the exception they get if this dir has bad ownership.
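
The kind of check and message the improvement is after, as a sketch (the 
example directory and expected owner are assumptions):

{code:java}
// Illustrative sketch: fail with a message that names the path, its actual
// owner and the expected owner, instead of an opaque exception.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalDirOwnershipCheck {

  /** Throw a descriptive error if a local MR directory has the wrong owner. */
  public static void checkOwner(Path dir, String expectedOwner)
      throws IOException {
    String actualOwner = Files.getOwner(dir).getName();
    if (!actualOwner.equals(expectedOwner)) {
      throw new IOException("Directory " + dir + " is owned by '" + actualOwner
          + "' but must be owned by '" + expectedOwner
          + "' (e.g. the user running the TaskTracker).");
    }
  }

  public static void main(String[] args) throws IOException {
    // Example: the task log dir is expected to be owned by the mapred user.
    checkOwner(Paths.get("/var/log/hadoop/userlogs"), "mapred");
  }
}
{code}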

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-2327) MapTask doesn't need to put username information in SpillRecord

2011-02-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2327:
---

Attachment: mr-2327.txt

 MapTask doesn't need to put username information in SpillRecord
 ---

 Key: MAPREDUCE-2327
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2327
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
 Attachments: mr-2327.txt


 This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 
 branch.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-2327) MapTask doesn't need to put username information in SpillRecord

2011-02-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2327:
---

Status: Patch Available  (was: Open)

 MapTask doesn't need to put username information in SpillRecord
 ---

 Key: MAPREDUCE-2327
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2327
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0

 Attachments: mr-2327.txt


 This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 
 branch.
 This bug causes task failures in the following case:
 - Cluster is not set up with LinuxTaskController (i.e. not a secured cluster)
 - Job submitter is not the same as the user running the TT
 - Map output is more than one spill's worth
 The issue is that UserGroupInformation's view of the current user is the job 
 submitter, but on disk the spill files will be owned by the TT user. SecureIO 
 will then fail when constructing the spill record.
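
A rough illustration of why the owner check trips in that configuration (the 
method below approximates the comparison; it is not the SecureIO code):

{code:java}
// Illustrative only: the spill file on disk is owned by the TaskTracker user,
// while the expected owner comes from UserGroupInformation (the job
// submitter), so the two differ on an insecure cluster where the submitter is
// not the TT user.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpillOwnerCheck {

  static void checkSpillOwner(Path spillFile, String jobSubmitter)
      throws IOException {
    String diskOwner = Files.getOwner(spillFile).getName(); // e.g. "mapred"
    if (!diskOwner.equals(jobSubmitter)) {                  // e.g. "alice"
      throw new IOException("Owner '" + diskOwner + "' of " + spillFile
          + " does not match expected owner '" + jobSubmitter + "'");
    }
  }
}
{code}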

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-2327) MapTask doesn't need to put username information in SpillRecord

2011-02-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2327:
---

  Description: 
This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 branch.

This bug causes task failures in the following case:
- Cluster is not set up with LinuxTaskController (i.e. not a secured cluster)
- Job submitter is not the same as the user running the TT
- Map output is more than one spill's worth

The issue is that UserGroupInformation's view of the current user is the job 
submitter, but on disk the spill files will be owned by the TT user. SecureIO 
will then fail when constructing the spill record.

  was:This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 
branch.

 Priority: Blocker  (was: Major)
Fix Version/s: 0.22.0
 Assignee: Todd Lipcon

 MapTask doesn't need to put username information in SpillRecord
 ---

 Key: MAPREDUCE-2327
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2327
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0

 Attachments: mr-2327.txt


 This is an amendment to MAPREDUCE-2096 that's found in Yahoo's 0.20.100 
 branch.
 This bug causes task failures in the following case:
 - Cluster is not set up with LinuxTaskController (i.e. not a secured cluster)
 - Job submitter is not the same as the user running the TT
 - Map output is more than one spill's worth
 The issue is that UserGroupInformation's view of the current user is the job 
 submitter, but on disk the spill files will be owned by the TT user. SecureIO 
 will then fail when constructing the spill record.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira