If that is only for benchmarking, you could stop the task-trackers on the
machines you don't want to use.
Or you could set up another cluster.
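
For example, on each node you want to exclude (a sketch assuming a 1.x-style
deployment where the daemon scripts live under $HADOOP_HOME/bin):

  $HADOOP_HOME/bin/hadoop-daemon.sh stop tasktracker

No new tasks will be scheduled on a node once its task-tracker is down.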

But yes, there is no standard way to limit the slots taken by a job to a
specified set of machines.
You might be able to do it with a custom scheduler, but I guess that would
be out of scope for you.

Regards

Bertrand

On Mon, Sep 10, 2012 at 12:01 PM, Hemanth Yamijala <yhema...@gmail.com> wrote:

> Hi,
>
> I am not sure if there's any way to restrict the tasks to specific
> machines. However, I think there are some ways of restricting the
> number of 'slots' that can be used by the job.
>
> Also, I am not sure which version of Hadoop you are on. The
> CapacityScheduler
> (http://hadoop.apache.org/common/docs/r2.0.1-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html)
> has ways by which you can set up a queue with a hard capacity limit.
> The capacity controls the number of slots that can be used by jobs
> submitted to the queue. So, if you submit a job to the queue,
> irrespective of the number of tasks it has, it should be limited to
> those slots. However, please note that this does not restrict the
> tasks to specific machines.
>
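> A minimal sketch of what such a queue could look like in
> capacity-scheduler.xml (property names follow the YARN CapacityScheduler
> docs linked above; the queue name "benchmark" and the 50% figure are only
> assumptions for a 5-out-of-10-node budget):
>
>   <property>
>     <name>yarn.scheduler.capacity.root.queues</name>
>     <value>default,benchmark</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.benchmark.capacity</name>
>     <value>50</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.capacity.root.benchmark.maximum-capacity</name>
>     <value>50</value>
>   </property>
>
> A job would then be submitted with -Dmapreduce.job.queuename=benchmark
> (again, assuming an MR2 setup).
>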
> Thanks
> Hemanth
>
> On Mon, Sep 10, 2012 at 2:36 PM, Safdar Kureishy
> <safdar.kurei...@gmail.com> wrote:
> > Hi,
> >
> > I need to run some benchmarking tests for a given mapreduce job on a
> > *subset* of a 10-node Hadoop cluster. Not that it matters, but the
> > current cluster settings allow for ~20 map slots and 10 reduce slots
> > per node.
> >
> > Without loss of generality, let's say I want a job with the
> > constraints below:
> > - to use only *5* out of the 10 nodes for running the mappers,
> > - to use only *5* out of the 10 nodes for running the reducers.
> >
> > Is there any way of achieving this through Hadoop property overrides
> > at job-submission time? I understand that the Fair Scheduler can
> > potentially be used to create pools with a proportionate # of mappers
> > and reducers, to achieve a similar outcome, but the problem is that I
> > still cannot tie such a pool to a fixed # of machines (right?).
> > Essentially, regardless of the # of map/reduce tasks involved, I only
> > want a *fixed # of machines* to handle the job.
> >
> > Any tips on how I can go about achieving this?
> >
> > Thanks,
> > Safdar
>



-- 
Bertrand Dechoux
