Our experience with the Capacity Scheduler did not match our expectations, nor what you describe. But this might be due to a misunderstanding of the configuration parameters on our side.
The problem is the following:
mapred.capacity-scheduler.queue.<queue-name>.capacity: percentage of the number of slots in the cluster that are *guaranteed* to be available for jobs in this queue.

mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent: each queue enforces a limit on the percentage of resources allocated to a user at any given time, if *there is competition for them*.
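For reference, this is roughly how we would expect to set them in conf/capacity-scheduler.xml (the queue name 'research' and the values are only an example):

  <!-- conf/capacity-scheduler.xml: queue name and values are illustrative -->
  <property>
    <name>mapred.capacity-scheduler.queue.research.capacity</name>
    <value>20</value> <!-- 20% of the cluster's slots guaranteed to this queue -->
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.research.minimum-user-limit-percent</name>
    <value>100</value> <!-- a single user may take the queue's whole share when alone -->
  </property>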

So, in fact, it seems that if there is no competition and the cluster is fully available, the scheduler will assign the full cluster to the job and will not limit the number of concurrent tasks. It seemed to us that the only way to enforce a hard limit was to use the Fair Scheduler of Hadoop 0.21.0, which includes a new configuration parameter, 'maxMaps'.
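If we read the 0.21.0 documentation correctly, that hard limit would be declared per pool in the Fair Scheduler allocation file, along these lines (the pool name and the value are made up):

  <?xml version="1.0"?>
  <allocations>
    <!-- hypothetical pool: maxMaps caps the number of concurrent map tasks -->
    <pool name="research">
      <maxMaps>10</maxMaps>
    </pool>
  </allocations>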

Am I right, or did we miss something?

cheers
--
Renaud Delbru

On 25/01/11 15:20, Harsh J wrote:
Capacity Scheduler (or a version of it) does ship with the 0.20
release of Hadoop and is usable. It can be used to define queues,
each with a limited capacity; your jobs must be submitted to the
appropriate queue if you want them to use only the assigned
fraction of the cluster for their processing.
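For instance, assuming a queue named 'research' has been configured,
a job could be directed to it with something like:

  hadoop jar myjob.jar MyJob -Dmapred.job.queue.name=research <args>

(provided the job uses ToolRunner, so that -D options are picked up).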

On Tue, Jan 25, 2011 at 5:19 PM, Renaud Delbru <renaud.del...@deri.org> wrote:
Hi,

we would like to limit the maximum number of tasks per job on our Hadoop
0.20.2 cluster.
Will the Capacity Scheduler [1] allow us to do this? And does it work
correctly on Hadoop 0.20.2? (I remember that a few months ago, when we
were looking at it, it seemed incompatible with Hadoop 0.20.2.)

[1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html

Regards,
--
Renaud Delbru
