Control Resources / Cores assigned for a job

2012-03-16 Thread Deepak Nettem
Hi,

I want to be able to control the number of nodes assigned to an MR job on
the cluster. For example, I want the job to execute no more than 10
mappers at a time, irrespective of whether more nodes are available.

I don't wish to control the number of mappers that are created by the job.
The number of mappers is tied to the problem size / input data and my
input splits. I want it to remain that way.

Is this possible? If so, what's the best way to do this?

Deepak


Re: Control Resources / Cores assigned for a job

2012-03-16 Thread Michel Segel
Fair scheduler?


Sent from a remote device. Please excuse any typos...

Mike Segel

On Mar 16, 2012, at 5:54 PM, Deepak Nettem wrote:

> Hi,
> 
> I want to be able to control the number of nodes assigned to an MR job on
> the cluster. For example, I want the job to execute no more than 10
> mappers at a time, irrespective of whether more nodes are available.
> 
> I don't wish to control the number of mappers that are created by the job.
> The number of mappers is tied to the problem size / input data and my
> input splits. I want it to remain that way.
> 
> Is this possible? If so, what's the best way to do this?
> 
> Deepak


Re: Control Resources / Cores assigned for a job

2012-03-16 Thread Harsh J
Deepak,

Michael is right; you will need the FairScheduler's maxMaps/maxReduces
settings to control concurrency for specific jobs (via pools):
http://hadoop.apache.org/common/docs/current/fair_scheduler.html#Allocation+File+%28fair-scheduler.xml%29
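
A rough sketch of such a pool in fair-scheduler.xml (untested; the pool
name "limited" and the caps of 10/5 are just placeholders):

  <?xml version="1.0"?>
  <allocations>
    <pool name="limited">
      <!-- at most 10 map tasks and 5 reduce tasks from this pool run concurrently -->
      <maxMaps>10</maxMaps>
      <maxReduces>5</maxReduces>
    </pool>
  </allocations>

The job then has to land in that pool, e.g. by submitting it with
-Dmapred.fairscheduler.pool=limited (depending on your version and on what
mapred.fairscheduler.poolnameproperty is set to). Note that maxMaps caps
how many map tasks run at once, not how many mappers the job creates; that
stays tied to your input splits.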

Another option is to configure the CapacityTaskScheduler's maximum
cluster capacity for a defined queue:
http://hadoop.apache.org/common/docs/current/capacity_scheduler.html#Resource+allocation
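
A similar sketch for capacity-scheduler.xml (again untested; the queue
name "limited" is hypothetical):

  <property>
    <!-- guaranteed share of cluster slots for this queue, as a percentage -->
    <name>mapred.capacity-scheduler.queue.limited.capacity</name>
    <value>10</value>
  </property>
  <property>
    <!-- hard ceiling on the queue's share, also a percentage -->
    <name>mapred.capacity-scheduler.queue.limited.maximum-capacity</name>
    <value>10</value>
  </property>

The queue would also need to be declared through mapred.queue.names in
mapred-site.xml, and the job submitted with mapred.job.queue.name=limited.
Keep in mind these values are percentages of the cluster's slot capacity
rather than absolute task counts, so for a hard "no more than 10 mappers
at a time" limit the FairScheduler's maxMaps is the closer fit.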

On Sat, Mar 17, 2012 at 6:33 AM, Michel Segel wrote:
> Fair scheduler?
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Mar 16, 2012, at 5:54 PM, Deepak Nettem wrote:
>
>> Hi,
>>
>> I want to be able to control the number of nodes assigned to an MR job on
>> the cluster. For example, I want the job to execute no more than 10
>> mappers at a time, irrespective of whether more nodes are available.
>>
>> I don't wish to control the number of mappers that are created by the job.
>> The number of mappers is tied to the problem size / input data and my
>> input splits. I want it to remain that way.
>>
>> Is this possible? If so, what's the best way to do this?
>>
>> Deepak



-- 
Harsh J