Praveen,

If you are doing performance measurements, be aware that having more datanodes 
than tasktrackers will affect performance as well (I don't know for sure 
how).  It will not be the same performance as running on a cluster with 
fewer nodes overall.  Also, if you shut off datanodes as well as 
tasktrackers, you will need to give the cluster a while for re-replication to 
finish before you take your performance numbers.

--Bobby Evans
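
One way to check that re-replication has finished before benchmarking is to poll the fsck summary. This is a minimal sketch, not something from the thread: the function names and POLL_SECONDS are hypothetical, and it assumes "hadoop fsck /" prints a summary line starting with "Under-replicated blocks:" followed by a count, which may differ between Hadoop versions.

```shell
# Hypothetical helper: read fsck summary text on stdin and print the
# number after "Under-replicated blocks:" (assumed output format).
under_replicated_count() {
  sed -n 's/.*Under-replicated blocks:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: poll fsck until no under-replicated blocks remain.
# POLL_SECONDS is an assumed tunable, defaulting to 60 seconds.
wait_for_replication() {
  while :; do
    n=$(hadoop fsck / 2>/dev/null | under_replicated_count)
    # Treat an unparsable summary as "not finished yet".
    [ "${n:-1}" -eq 0 ] && break
    sleep "${POLL_SECONDS:-60}"
  done
}
```

Calling wait_for_replication before each run would block until the count reaches zero. Note that blocks with replication = 1 whose only copy sat on a stopped datanode cannot be re-replicated, so this check does not cover them.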


On 9/21/11 8:27 AM, "Harsh J" <ha...@cloudera.com> wrote:

Praveenesh,

Absolutely right. Just stop them individually :)
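
As a rough sketch of stopping them individually for a "time vs. number of nodes" graph, the loop below runs the stop command quoted later in this thread on the first N hosts of a list. The function name, the hosts file, and SSH_CMD are hypothetical; it assumes passwordless ssh and that hadoop-daemon.sh is on the remote PATH.

```shell
# Hypothetical helper: stop the TaskTracker on the first N hosts listed
# (one per line) in a hosts file. DataNodes are left running.
# SSH_CMD defaults to ssh; set SSH_CMD=echo for a dry run.
stop_tasktrackers() {
  hosts_file=$1
  count=$2
  head -n "$count" "$hosts_file" | while read -r host; do
    # Skip blank lines in the hosts file.
    [ -n "$host" ] || continue
    # </dev/null keeps ssh from consuming the rest of the host list.
    ${SSH_CMD:-ssh} "$host" "hadoop-daemon.sh stop tasktracker" </dev/null
  done
}
```

For example, "stop_tasktrackers nodes.txt 2" would stop tasktrackers on the first two hosts in a hypothetical nodes.txt; the matching "hadoop-daemon.sh start tasktracker" brings them back afterwards.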

On Wed, Sep 21, 2011 at 6:53 PM, praveenesh kumar <praveen...@gmail.com> wrote:
> Oh wow.. I didn't know that..
> Actually for me datanodes/tasktrackers are running on the same machines.
> I mentioned datanodes because if I delete those machines from the masters list,
> chances are the data will also be lost.
> So I don't want to do that..
> but now I guess by stopping tasktrackers individually... I can decrease the
> strength of my cluster by decreasing the number of nodes that will run
> tasktracker .. right ?? This way I won't lose my data also.. Right ??
>
>
>
> On Wed, Sep 21, 2011 at 6:39 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Praveenesh,
>>
>> TaskTrackers run your jobs' tasks for you, not DataNodes directly. So
>> you can statically control the load on nodes by removing
>> TaskTrackers from your cluster.
>>
>> i.e., if you run "service hadoop-0.20-tasktracker stop" or
>> "hadoop-daemon.sh stop tasktracker" on the specific nodes, jobs won't
>> run there anymore.
>>
>> Is this what you're looking for?
>>
>> (There are ways to achieve the exclusion dynamically, by writing a
>> scheduler, but hard to tell without knowing what you need
>> specifically, and why do you require it?)
>>
>> On Wed, Sep 21, 2011 at 6:32 PM, praveenesh kumar <praveen...@gmail.com>
>> wrote:
>> > Is there any way to run a particular job in Hadoop on a subset of
>> > datanodes?
>> >
>> > My problem is that I don't want to use all the nodes to run some job.
>> > I am trying to make a job completion time vs. number of nodes graph for a
>> > particular job.
>> > One way to do it is to remove datanodes and then see how much time the
>> > job takes.
>> >
>> > Just out of curiosity, I want to know whether there is any other way to
>> > do this without removing datanodes.
>> > I am afraid that if I remove datanodes, I could lose some data blocks that
>> > reside on those machines, as I have some files with replication = 1.
>> >
>> > Thanks,
>> > Praveenesh
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>



--
Harsh J
