Oh wow, I didn't know that. In my setup, the DataNodes and TaskTrackers run on the same machines. I mentioned DataNodes because if I delete those machines from the slaves list, chances are the data will also be lost, so I don't want to do that. But now I guess that by stopping TaskTrackers individually, I can reduce the strength of my cluster by decreasing the number of nodes running a TaskTracker, right? That way I won't lose my data either, right?
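For reference, here is a minimal sketch of how stopping TaskTrackers on a subset of nodes could be scripted, based on the hadoop-daemon.sh command mentioned below. The hostnames slave3/slave4 are just examples, and the ssh commands are echoed (a dry run) so you can review them before actually executing anything:

```shell
#!/bin/sh
# Sketch, assuming example hostnames: stop only the TaskTracker on the
# chosen nodes, leaving the DataNode running so no blocks are lost.
# Commands are echoed as a dry run -- remove the leading "echo" to run them.
NODES="slave3 slave4"
for node in $NODES; do
  echo ssh "$node" "hadoop-daemon.sh stop tasktracker"
done
```

To bring a node back into MapReduce afterwards, the same loop with "start tasktracker" should work.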
On Wed, Sep 21, 2011 at 6:39 PM, Harsh J <ha...@cloudera.com> wrote:
> Praveenesh,
>
> TaskTrackers run your jobs' tasks for you, not DataNodes directly. So
> you can statically control loads on nodes by removing
> TaskTrackers from your cluster.
>
> i.e., if you "service hadoop-0.20-tasktracker stop" or
> "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't
> run there anymore.
>
> Is this what you're looking for?
>
> (There are ways to achieve the exclusion dynamically, by writing a
> scheduler, but that's hard to tell without knowing what you need
> specifically, and why you require it.)
>
> On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar <praveen...@gmail.com> wrote:
> > Is there any way that we can run a particular job in Hadoop on a subset of
> > datanodes?
> >
> > My problem is that I don't want to use all the nodes to run some jobs.
> > I am trying to make a job-completion-time vs. number-of-nodes graph for a
> > particular job.
> > One way to do this is to remove datanodes and then see how much time the
> > job takes.
> >
> > Just for curiosity's sake, I want to know whether there is any other way to do
> > this without removing datanodes.
> > I am afraid that if I remove datanodes, I may lose some data blocks that reside
> > on those machines, as I have some files with replication = 1.
> >
> > Thanks,
> > Praveenesh
>
> --
> Harsh J