Praveenesh,

Your question is not naïve; in fact, which hardware design is "better" can be 
a very difficult question to answer. If you made me pick one without much 
information, I'd go for more machines.  But...

It all depends, and there is no right answer... :)

More machines
        + May run your workload faster
        + Gives you more protection against node / hardware / hard drive 
          failure
        + More aggregate IO capability
        - Capex / opex may be higher than adding more cores
More cores
        + May run your workload faster
        + More cores may allow more tasks to run on the same machine
        + More cores/tasks per machine may reduce network contention and 
          improve task-to-task data flow performance

Notice "May run your workload faster" is in both lists, as it can be very 
workload dependent.

My Experience:
I recently ran an experiment comparing two layouts with the same total number 
of cores (64) and the exact same network / machine configuration:
        A: 8 machines with 8 cores each
        B: 28 machines with 2 cores each (plus 1x 8-core head node)

B outperformed A by 2x on teragen and terasort. These machines were running 
in a virtualized environment, where the IO behind the scenes was throttled to 
400Mbps per node in the 2-core configuration vs 1Gbps per node in the 8-core 
configuration.  So I would expect B to do even better in a non-throttled 
scenario.

~Brad


-----Original Message-----
From: praveenesh kumar [mailto:praveen...@gmail.com] 
Sent: Monday, December 12, 2011 8:51 PM
To: common-user@hadoop.apache.org
Subject: More cores Vs More Nodes ?

Hey Guys,

So I have a very naive question regarding Hadoop cluster nodes:

More cores or more nodes? Shall I spend money on going from 2-core to 4-core 
machines, or spend it on buying more nodes with fewer cores, e.g. 2 machines 
of 2 cores each?

Thanks,
Praveenesh
