Control Resources / Cores assigned for a job
Hi,

I want to be able to control the number of nodes assigned to an MR job on the cluster. For example, I want the job to execute no more than 10 Mappers at a time, irrespective of whether more nodes are available.

I don't wish to control the number of mappers that the job creates. The number of mappers is tied to the problem size / input data and my input splits, and I want it to remain that way.

Is this possible? If so, what's the best way to do this?

Deepak
Re: Control Resources / Cores assigned for a job
Fair scheduler?

Sent from a remote device. Please excuse any typos...

Mike Segel
Re: Control Resources / Cores assigned for a job
Deepak,

Michael is right: you will need the FairScheduler's maxMaps/maxReduces settings to control concurrency for specific jobs (via pools): http://hadoop.apache.org/common/docs/current/fair_scheduler.html#Allocation+File+%28fair-scheduler.xml%29

Another option is to use the CapacityTaskScheduler and configure a maximum cluster capacity for a defined queue: http://hadoop.apache.org/common/docs/current/capacity_scheduler.html#Resource+allocation

-- Harsh J
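For reference, a minimal sketch of what the FairScheduler route above might look like in the allocation file (fair-scheduler.xml); the pool name "capped" and the limits here are illustrative, not from the thread:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- A pool whose jobs never run more than 10 map tasks
       (and 10 reduce tasks) concurrently, regardless of how
       many task slots are free on the cluster. -->
  <pool name="capped">
    <maxMaps>10</maxMaps>
    <maxReduces>10</maxReduces>
  </pool>
</allocations>
```

A job would then be submitted into that pool, e.g. by setting mapred.fairscheduler.pool=capped in the job configuration (assuming the default pool-name property is in use). Note this caps concurrent running tasks only; the total number of map tasks created still follows from the input splits, which is what you wanted.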