[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801532#action_12801532
 ] 

Jordà Polo commented on MAPREDUCE-1380:
---------------------------------------

{quote}
Can you please expand on your definition of 'regular' jobs? Are these, for e.g. 
part of regular workflows? IAC, how do you propose to communicate this 
information to the AdaptiveScheduler?
{quote}

Actually, "regular" isn't really the right word here; thanks for pointing that out.

I meant uniform or homogeneous jobs, that is, jobs in which all the 
tasks take approximately the same amount of time to finish. It would be 
interesting to communicate some additional data to the scheduler, but so far 
it only uses the standard information provided by tasktrackers.
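
To illustrate why uniform jobs are easy to estimate, here is a minimal sketch 
in Python. It is not the scheduler's actual code: the function name 
estimate_remaining_seconds and its parameters are hypothetical, and it assumes 
the only inputs are the durations of tasks that have already finished, the 
number of tasks still pending, and the number of slots that run in parallel.

```python
import math
from statistics import mean


def estimate_remaining_seconds(finished_durations, pending_tasks, slots):
    """Estimate the time left for a job whose tasks all take about as long.

    finished_durations: durations (seconds) of tasks that already completed
    pending_tasks: number of tasks that have not finished yet
    slots: number of tasks that can run at the same time
    (All three names are illustrative assumptions.)
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    if not finished_durations:
        return None  # nothing observed yet, so no estimate is possible
    avg = mean(finished_durations)
    # Pending tasks run in "waves" of at most `slots` tasks each.
    waves = math.ceil(pending_tasks / slots)
    return waves * avg
```

For example, with finished tasks of 10, 12 and 8 seconds, 20 pending tasks and 
4 slots, the estimate is 5 waves of about 10 seconds each. The sketch ignores 
the progress of tasks that are currently running, and the average stops being 
a good predictor once task durations vary widely, which is the case for 
non-uniform jobs.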

> Adaptive Scheduler
> ------------------
>
>                 Key: MAPREDUCE-1380
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Jordà Polo
>            Priority: Minor
>
> The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically 
> adjusts the amount of used resources depending on the performance of jobs and 
> on user-defined high-level business goals.
> Existing Hadoop schedulers are focused on managing large, static clusters in 
> which nodes are added or removed manually. On the other hand, the goal of 
> this scheduler is to improve the integration of Hadoop and the applications 
> that run on top of it with environments that allow a more dynamic 
> provisioning of resources.
> The current implementation is quite straightforward. Users specify a deadline 
> at job submission time, and the scheduler adjusts the resources to meet that 
> deadline (at the moment, the scheduler can be configured to either minimize 
> or maximize the amount of resources). If multiple jobs are run 
> simultaneously, the scheduler prioritizes them by deadline. Note that the 
> current approach to estimate the completion time of jobs is quite simplistic: 
> it is based on the time it takes to finish each task, so it works well with 
> regular jobs, but there is still room for improvement for unpredictable jobs.
> The idea is to further integrate it with cloud-like and virtual environments 
> (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
> able to meet its deadline, the scheduler automatically requests more 
> resources.
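
As a rough illustration of the behaviour described in the issue summary above 
(jobs prioritized by deadline, and resources adjusted to meet it), the 
following Python sketch may help. It is only an assumption of how this could 
look, not the Adaptive Scheduler's implementation: order_by_deadline, 
slots_needed, the job dictionaries and their "deadline" key are all 
hypothetical names.

```python
import math


def order_by_deadline(jobs):
    """Return jobs sorted so that the earliest deadline comes first."""
    return sorted(jobs, key=lambda job: job["deadline"])


def slots_needed(pending_tasks, avg_task_seconds, seconds_to_deadline):
    """Smallest number of parallel slots that finishes the pending tasks in time.

    Returns None when not even one wave of tasks fits before the deadline,
    i.e. the deadline cannot be met by adding slots alone.
    """
    if avg_task_seconds <= 0:
        raise ValueError("avg_task_seconds must be positive")
    if pending_tasks == 0:
        return 0
    # Number of full task "waves" that fit before the deadline.
    waves_available = int(seconds_to_deadline // avg_task_seconds)
    if waves_available < 1:
        return None
    return math.ceil(pending_tasks / waves_available)
```

With 20 pending tasks of about 10 seconds each and 50 seconds left, five waves 
fit before the deadline, so four slots are enough. A scheduler configured to 
minimize resources could stop at that number, while a result larger than the 
slots currently available would be the point at which more resources are 
requested.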
