Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-25 Thread Renaud Delbru

Hi,

we would like to limit the maximum number of concurrent tasks per job
on our Hadoop 0.20.2 cluster.
Will the Capacity Scheduler [1] allow us to do this? And does it work
correctly on Hadoop 0.20.2? (I remember that a few months ago, when we
looked at it, it seemed incompatible with 0.20.2.)


[1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html

Regards,
--
Renaud Delbru


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-25 Thread Harsh J
The Capacity Scheduler (or a version of it) does ship with the 0.20
release of Hadoop and is usable. It can be used to define queues, each
with a limited capacity; your jobs must submit to the appropriate queue
if you want them to utilize only the assigned fraction of your cluster
for their processing.
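For illustration, a minimal sketch of the configuration involved,
assuming a queue named 'limited' that should get 25% of the cluster's
slots (the queue name and percentage are invented for the example; the
property names are from the 0.20 capacity scheduler documentation):

  <!-- mapred-site.xml: enable the scheduler and declare the queues -->
  <property>
    <name>mapred.jobtracker.taskScheduler</name>
    <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default,limited</value>
  </property>

  <!-- capacity-scheduler.xml: guarantee the 'limited' queue 25% of
       the cluster's slots -->
  <property>
    <name>mapred.capacity-scheduler.queue.limited.capacity</name>
    <value>25</value>
  </property>

A job then opts into the queue at submission time, e.g. by passing
-Dmapred.job.queue.name=limited on its command line.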

On Tue, Jan 25, 2011 at 5:19 PM, Renaud Delbru  wrote:
> Hi,
>
> we would like to limit the maximum number of concurrent tasks per job
> on our Hadoop 0.20.2 cluster.
> Will the Capacity Scheduler [1] allow us to do this? And does it work
> correctly on Hadoop 0.20.2? (I remember that a few months ago, when we
> looked at it, it seemed incompatible with 0.20.2.)
>
> [1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html
>
> Regards,
> --
> Renaud Delbru
>



-- 
Harsh J
www.harshj.com


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-25 Thread Renaud Delbru
Our experience with the Capacity Scheduler did not match what you
describe. But this might be due to a misunderstanding of the
configuration parameters on our side.

The problem is the following:
mapred.capacity-scheduler.queue.<queue-name>.capacity: Percentage of
the number of slots in the cluster that are *guaranteed* to be
available for jobs in this queue.
mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent:
Each queue enforces a limit on the percentage of resources allocated to
a user at any given time, if *there is competition for them*.
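As a concrete (hypothetical) illustration, a queue configured like
this:

  <property>
    <name>mapred.capacity-scheduler.queue.limited.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.limited.minimum-user-limit-percent</name>
    <value>100</value>
  </property>

is guaranteed 25% of the slots, and the user limit only kicks in when
several users compete within the queue; neither setting caps the queue
when the rest of the cluster is idle.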


So, in fact, it seems that if there is no competition and the cluster
is fully available, the scheduler will assign the full cluster to the
job and will not limit the number of concurrent tasks. It seemed to us
that the only way to enforce a hard limit was to use the Fair Scheduler
of Hadoop 0.21.0, which includes a new configuration parameter,
'maxMaps'.
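For reference, a minimal sketch of what such a limit might look like in
the 0.21 Fair Scheduler's allocation file (the pool name and numbers
are invented for the example; maxReduces is the analogous parameter for
reduce tasks):

  <?xml version="1.0"?>
  <allocations>
    <pool name="limited">
      <!-- hard cap on concurrent map and reduce tasks for this pool -->
      <maxMaps>10</maxMaps>
      <maxReduces>5</maxReduces>
    </pool>
  </allocations>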


Am I right, or did we miss something?

cheers
--
Renaud Delbru

On 25/01/11 15:20, Harsh J wrote:

The Capacity Scheduler (or a version of it) does ship with the 0.20
release of Hadoop and is usable. It can be used to define queues, each
with a limited capacity; your jobs must submit to the appropriate queue
if you want them to utilize only the assigned fraction of your cluster
for their processing.



Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-25 Thread Harsh J
No, that is right. I did not realize it was a strict slot limit you
were looking to impose on your jobs.

On Tue, Jan 25, 2011 at 9:27 PM, Renaud Delbru  wrote:
> Our experience with the Capacity Scheduler did not match what you
> describe. But this might be due to a misunderstanding of the
> configuration parameters on our side.
> The problem is the following:
> mapred.capacity-scheduler.queue.<queue-name>.capacity: Percentage of
> the number of slots in the cluster that are *guaranteed* to be
> available for jobs in this queue.
> mapred.capacity-scheduler.queue.<queue-name>.minimum-user-limit-percent:
> Each queue enforces a limit on the percentage of resources allocated to
> a user at any given time, if *there is competition for them*.
>
> So, in fact, it seems that if there is no competition and the cluster
> is fully available, the scheduler will assign the full cluster to the
> job and will not limit the number of concurrent tasks. It seemed to us
> that the only way to enforce a hard limit was to use the Fair Scheduler
> of Hadoop 0.21.0, which includes a new configuration parameter,
> 'maxMaps'.
>
> Am I right, or did we miss something?
>
> cheers
> --
> Renaud Delbru



-- 
Harsh J
www.harshj.com


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-25 Thread Renaud Delbru
As it seems that the capacity and fair schedulers in Hadoop 0.20.2 do
not allow a hard upper limit on the number of concurrent tasks, does
anybody know of another solution to achieve this?

--
Renaud Delbru

On 25/01/11 11:49, Renaud Delbru wrote:

Hi,

we would like to limit the maximum number of concurrent tasks per job
on our Hadoop 0.20.2 cluster.
Will the Capacity Scheduler [1] allow us to do this? And does it work
correctly on Hadoop 0.20.2? (I remember that a few months ago, when we
looked at it, it seemed incompatible with 0.20.2.)


[1] http://hadoop.apache.org/common/docs/r0.20.2/capacity_scheduler.html

Regards,




Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-26 Thread Koji Noguchi
Hi Renaud,

Hopefully it'll be in the 0.20-security branch that Arun is trying to push.

Related (very abstract) JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-1872

Koji



On 1/25/11 12:48 PM, "Renaud Delbru"  wrote:

As it seems that the capacity and fair schedulers in Hadoop 0.20.2 do
not allow a hard upper limit on the number of concurrent tasks, does
anybody know of another solution to achieve this?
--
Renaud Delbru





Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-27 Thread Renaud Delbru

Hi Koji,

thanks for sharing the information,
Is the 0.20-security branch planned to become an official release at some point?

Cheers
--
Renaud Delbru

On 27/01/11 01:50, Koji Noguchi wrote:

Hi Renaud,

Hopefully it'll be in the 0.20-security branch that Arun is trying to push.

Related (very abstract) JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-1872

Koji









Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-27 Thread Steve Loughran

On 27/01/11 10:51, Renaud Delbru wrote:

Hi Koji,

thanks for sharing the information,
Is the 0.20-security branch planned to become an official release at
some point?

Cheers


If you can play with the beta, you can see whether it works for you
and, if not, get bugs fixed during the beta cycle:


http://people.apache.org/~acmurthy/hadoop-0.20.100-rc0/


Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-27 Thread Renaud Delbru

Thanks, we will try to test it next week.
--
Renaud Delbru

On 27/01/11 11:31, Steve Loughran wrote:



If you can play with the beta, you can see whether it works for you
and, if not, get bugs fixed during the beta cycle:


http://people.apache.org/~acmurthy/hadoop-0.20.100-rc0/




Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-28 Thread Allen Wittenauer

On Jan 25, 2011, at 12:48 PM, Renaud Delbru wrote:

> As it seems that the capacity and fair schedulers in Hadoop 0.20.2 do
> not allow a hard upper limit on the number of concurrent tasks, does
> anybody know of another solution to achieve this?

The specific change for the capacity scheduler has been backported to 0.20.2
as part of https://issues.apache.org/jira/browse/MAPREDUCE-1105 . Note that
you'll also need https://issues.apache.org/jira/browse/MAPREDUCE-1160 , which
fixes a logging bug in the JobTracker; otherwise your logs will fill up.
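If I read MAPREDUCE-1105 correctly, the backport adds a per-queue
maximum-capacity setting to the capacity scheduler; a minimal sketch,
assuming a queue named 'limited' that should never exceed 25% of the
cluster's slots even when the rest of the cluster is idle (the queue
name and value are invented for the example):

  <!-- capacity-scheduler.xml: hard cap, as opposed to the guaranteed
       'capacity', which a queue may exceed when slots are free -->
  <property>
    <name>mapred.capacity-scheduler.queue.limited.maximum-capacity</name>
    <value>25</value>
  </property>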



Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-29 Thread Renaud Delbru

Hi Allen,

thanks for pointing this out.

On 28/01/11 17:34, Allen Wittenauer wrote:

The specific change for the capacity scheduler has been backported to 0.20.2
as part of https://issues.apache.org/jira/browse/MAPREDUCE-1105 . Note that
you'll also need https://issues.apache.org/jira/browse/MAPREDUCE-1160 , which
fixes a logging bug in the JobTracker; otherwise your logs will fill up.

--
Renaud Delbru