Oh wow, I didn't know that.
Actually, in my case the datanodes/tasktrackers run on the same machines.
I mentioned datanodes because if I delete those machines from the slaves
list, chances are the data on them will also be lost.
So I don't want to do that.
But now I guess that by stopping tasktrackers individually, I can decrease
the strength of my cluster by reducing the number of nodes that run a
tasktracker, right? That way I won't lose my data either, right?



On Wed, Sep 21, 2011 at 6:39 PM, Harsh J <ha...@cloudera.com> wrote:

> Praveenesh,
>
> TaskTrackers run your jobs' tasks for you, not DataNodes directly. So
> you can statically control the load on nodes by removing
> TaskTrackers from your cluster.
>
> i.e, if you "service hadoop-0.20-tasktracker stop" or
> "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't
> run there anymore.
>
> Is this what you're looking for?
>
> (There are ways to achieve the exclusion dynamically, by writing a
> scheduler, but it's hard to tell without knowing what you need
> specifically, and why you require it.)
>
> On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar <praveen...@gmail.com>
> wrote:
> > Is there any way to run a particular job in Hadoop on a subset of
> > datanodes?
> >
> > My problem is that I don't want to use all the nodes to run a job:
> > I am trying to plot a job-completion-time vs. number-of-nodes graph for
> > a particular job.
> > One way to do this is to remove datanodes and then see how much time
> > the job takes.
> >
> > Just out of curiosity, I want to know whether there is any other way to
> > do this without removing datanodes.
> > I am afraid that if I remove datanodes, I could lose some data blocks
> > that reside on those machines, as I have some files with replication = 1.
> >
> > Thanks,
> > Praveenesh
> >
>
>
>
> --
> Harsh J
>
