If that is only for benchmarking, you could stop the task-trackers on the machines you don't want to use. Or you could set up another cluster.
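To make the first suggestion concrete, here is a minimal sketch in Python that only builds and prints the commands one might run to stop the task-trackers on the nodes to exclude. The hostnames are hypothetical placeholders. It also assumes a Hadoop 1.x style layout where `hadoop-daemon.sh stop tasktracker` is available under `$HADOOP_HOME/bin` on each node, and that the nodes are reachable over ssh; adjust for your installation.

```python
# Sketch: build (but do not run) the commands that would stop the
# TaskTracker daemon on every node that should sit out the benchmark.
# Hostnames are hypothetical placeholders, not taken from the thread.

ALL_NODES = [f"node{i:02d}" for i in range(1, 11)]  # the 10-node cluster
BENCHMARK_NODES = ALL_NODES[:5]                     # the 5 nodes to keep


def stop_commands(all_nodes, keep_nodes, hadoop_home="$HADOOP_HOME"):
    """Return one ssh command per node that is NOT in keep_nodes."""
    keep = set(keep_nodes)
    return [
        # Assumption: hadoop-daemon.sh lives under <hadoop_home>/bin on the node.
        f"ssh {node} '{hadoop_home}/bin/hadoop-daemon.sh stop tasktracker'"
        for node in all_nodes
        if node not in keep
    ]


COMMANDS = stop_commands(ALL_NODES, BENCHMARK_NODES)

if __name__ == "__main__":
    for cmd in COMMANDS:
        print(cmd)
```

Only the task-trackers are stopped in this sketch, so the data-nodes keep serving HDFS blocks; the same script with `start tasktracker` would bring the nodes back after the benchmark.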
But yes, there is no standard way to limit the slots taken by a job to a specified set of machines. You might be able to do it using a custom Scheduler, but that would be out of your scope, I guess.

Regards

Bertrand

On Mon, Sep 10, 2012 at 12:01 PM, Hemanth Yamijala <yhema...@gmail.com> wrote:

> Hi,
>
> I am not sure if there's any way to restrict the tasks to specific
> machines. However, I think there are some ways of restricting the
> number of 'slots' that can be used by the job.
>
> Also, not sure which version of Hadoop you are on. The capacityscheduler
> (http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
> has ways by which you can set up a queue with a hard capacity limit.
> The capacity controls the number of slots that can be used by
> jobs submitted to the queue. So, if you submit a job to the queue,
> irrespective of the number of tasks it has, it should limit it to
> those slots. However, please note that this does not restrict the
> tasks to specific machines.
>
> Thanks
> Hemanth
>
> On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
> <safdar.kurei...@gmail.com> wrote:
> > Hi,
> >
> > I need to run some benchmarking tests for a given mapreduce job on a *subset*
> > of a 10-node Hadoop cluster. Not that it matters, but the current cluster
> > settings allow for ~20 map slots and 10 reduce slots per node.
> >
> > Without loss of generality, let's say I want a job with these
> > constraints below:
> > - to use only *5* out of the 10 nodes for running the mappers,
> > - to use only *5* out of the 10 nodes for running the reducers.
> >
> > Is there any other way of achieving this through Hadoop property overrides
> > during job-submission time? I understand that the Fair Scheduler can
> > potentially be used to create pools of a proportionate # of mappers and
> > reducers, to achieve a similar outcome, but the problem is that I still
> > cannot tie such a pool to a fixed # of machines (right?). Essentially,
> > regardless of the # of map/reduce tasks involved, I only want a *fixed # of
> > machines* to handle the job.
> >
> > Any tips on how I can go about achieving this?
> >
> > Thanks,
> > Safdar
>

--
Bertrand Dechoux
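The slot arithmetic behind the capacity-limit suggestion in the quoted message can be sketched as follows. It uses the figures from the original question (10 nodes, roughly 20 map slots and 10 reduce slots per node) and assumes the queue's hard limit is expressed as a percentage of the cluster's total slots; the function name is illustrative only. As noted in the thread, such a limit caps how many slots the job may use, not which machines provide them.

```python
# Sketch: how many slots correspond to "5 nodes' worth" of capacity, and
# what share of the whole cluster that is. Figures come from the question;
# the helper name queue_limit is hypothetical.

def queue_limit(total_nodes, target_nodes, slots_per_node):
    """Return (slot_cap, percent_of_cluster) for target_nodes' worth of slots."""
    total_slots = total_nodes * slots_per_node
    slot_cap = target_nodes * slots_per_node
    percent = 100.0 * slot_cap / total_slots
    return slot_cap, percent


MAP_CAP, MAP_PCT = queue_limit(total_nodes=10, target_nodes=5, slots_per_node=20)
REDUCE_CAP, REDUCE_PCT = queue_limit(total_nodes=10, target_nodes=5, slots_per_node=10)

if __name__ == "__main__":
    print(f"map slots:    cap {MAP_CAP} ({MAP_PCT:.0f}% of cluster)")
    print(f"reduce slots: cap {REDUCE_CAP} ({REDUCE_PCT:.0f}% of cluster)")
```

With these numbers the cap works out to 100 map slots and 50 reduce slots, i.e. half of the cluster in each case, but those slots may still be spread across all 10 nodes.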