[
https://issues.apache.org/jira/browse/HADOOP-719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465614
]
Doug Cutting commented on HADOOP-719:
-------------------------------------
> Would this be a good candidate to check into hadoop contrib?
It sounds useful to me. I can't see why not. Contrib sounds appropriate.
> Integration of Hadoop with batch schedulers
> -------------------------------------------
>
> Key: HADOOP-719
> URL: https://issues.apache.org/jira/browse/HADOOP-719
> Project: Hadoop
> Issue Type: New Feature
> Components: contrib/streaming
> Reporter: Mahadev konar
> Assigned To: Mahadev konar
>
> Hadoop On Demand (HOD) is an integration of Hadoop with batch schedulers
> such as Condor, Torque, Sun Grid, etc. HOD is a system that provisions a
> Hadoop instance using a shared batch scheduler. HOD will find the requested
> number of nodes and start up Hadoop daemons on them. Users' map-reduce jobs
> can then run on the Hadoop instance. After the job is done, HOD gives the
> nodes back to the shared batch scheduler. A group of users will use HOD to
> acquire Hadoop instances of varying sizes, and the batch scheduler will
> schedule requests so that important jobs get higher priority and more
> resources and finish quickly. Here is a list of requirements for HOD and
> batch schedulers:
> Key Requirements:
> --- Should allocate the specified minimum number of nodes for a job
> Many batch jobs can finish in time only when enough resources are
> allocated. Therefore the batch scheduler should allocate the requested
> number of nodes for a given job when the job starts. This is a simple form
> of what is known as gang scheduling.
> Often the minimum number of nodes is not available right away, especially
> if the job asked for a large number. The batch scheduler should support
> advance reservation for important jobs so that the wait time can be
> determined. In advance reservation, a reservation is created at the
> earliest future point at which the occupied nodes become available. When
> nodes are currently idle but booked by future reservations, the batch
> scheduler may give them to other jobs to increase system utilization, but
> only when doing so does not delay existing reservations.
> --- Should run short urgent jobs without causing too much loss to long
> jobs. In particular, should not kill the job tracker of a long job.
> Some jobs, mostly short ones, are time sensitive and need urgent treatment.
> Often, a large portion of the cluster nodes will be occupied by long-running
> jobs. The batch scheduler should be able to preempt long jobs and run
> urgent jobs. The urgent jobs will then finish quickly and the long jobs can
> regain the nodes afterward.
> When preemption happens, HOD should minimize the loss to long jobs.
> In particular, it should not kill the job tracker of a long job.
> --- Should be able to dial up, at run time, the share of resources for more
> important projects.
> Viewed at a high level, a given cluster is shared by multiple projects. A
> project consists of a number of jobs submitted by a group of users. The
> batch scheduler should allow important projects to have more resources.
> This should be tunable at run time, since which projects are deemed more
> important may change over time.
> --- Should prevent malicious abuse of the system.
> A shared cluster environment can be put in jeopardy if malicious or
> erroneous job code:
> -- holds unneeded resources for a long period
> -- uses privileges for unworthy work
> Such abuse can easily cause under-utilization or starvation of other jobs.
> The batch scheduler should allow setting up policies for preventing
> resource abuse by:
> -- limiting privileges to legitimate uses asking for a proper amount
> -- throttling peak use of resources per player
> -- monitoring and reducing starvation
> --- The behavior should be simple and predictable
> When the status of the system is queried, we should be able to determine
> what factors caused it to reach its current state and what its future
> behavior could be with or without our tuning of the system.
> --- Should be portable to major resource managers
> The HOD design should be portable so that in the future we are able to plug
> in other resource managers.
> Some of the key requirements are implemented by the batch schedulers. The
> others need to be implemented by HOD.
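
As an illustration of the advance-reservation requirement quoted above, here is a minimal Python sketch of the rule "idle but booked nodes may be lent out only if that does not delay existing reservations". It is not taken from HOD or from any batch scheduler. The names Reservation and can_backfill are hypothetical, and the model assumes that a job's stated runtime is a hard upper bound and that each reservation needs all of its booked nodes from its start time onward.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Reservation:
    """Hypothetical record of a future reservation on currently idle nodes."""
    start: int         # time units from now at which the reserved job starts
    booked_nodes: int  # how many of the currently idle nodes it has booked


def can_backfill(idle_nodes: int, job_nodes: int, job_runtime: int,
                 reservations: Iterable[Reservation]) -> bool:
    """Return True if a job started now cannot delay any reservation.

    The job holds job_nodes nodes from time 0 until job_runtime. Only
    reservations that start before the job finishes can conflict with it,
    so the job must fit into the idle nodes those reservations leave free.
    """
    booked_before_finish = sum(
        r.booked_nodes for r in reservations if r.start < job_runtime
    )
    return job_nodes <= idle_nodes - booked_before_finish


# 10 idle nodes, 8 of them booked by a reservation that starts at t=60.
pending = [Reservation(start=60, booked_nodes=8)]
print(can_backfill(10, 2, 120, pending))  # small job fits beside the reservation
print(can_backfill(10, 5, 120, pending))  # would still hold booked nodes at t=60
print(can_backfill(10, 5, 30, pending))   # finishes before the reservation starts
```

The check is deliberately conservative: it treats every reservation that starts before the job ends as needing its nodes at the same time, which can refuse some jobs that a real scheduler would admit but never delays a reservation within this model.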