It also helps to know the profile of your jobs when you spec the 
machines. So in addition to Brad's response, you should consider whether 
you think your jobs will be more storage- or compute-oriented. 

------------------------------------------------
Tom Deutsch
Program Director
Information Management
Big Data Technologies
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeut...@us.ibm.com




Brad Sarsfield <b...@bing.com> 
12/13/2011 09:41 AM
Please respond to
common-user@hadoop.apache.org


To
"common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
cc

Subject
RE: More cores Vs More Nodes ?






Praveenesh,

Your question is not naïve; in fact, which hardware design is "better" 
can be a very difficult question to answer. If you made me pick one 
without much information, I'd go for more machines.  But...

It all depends, and there is no right answer. :) 

More machines 
                 + May run your workload faster
                 + Will give you a higher degree of reliability protection 
                   from node / hardware / hard drive failure
                 + More aggregate IO capabilities
                 - capex / opex may be higher than allocating more cores
More cores 
                 + May run your workload faster
                 + More cores may allow for more tasks to run on the same 
                   machine
                 + More cores/tasks may reduce network contention and 
                   increase task-to-task data flow performance

Notice "May run your workload faster" is in both, as it can be very 
workload dependent.

My Experience:
I did a recent experiment comparing two layouts with the same total 
number of cores (64) and the exact same network / machine configuration: 
                 A: 8 machines with 8 cores each 
                 B: 28 machines with 2 cores each (plus 1x 8-core head 
                    node)

B was able to outperform A by 2x using teragen and terasort. These 
machines were running in a virtualized environment, where some of the IO 
capabilities behind the scenes were being throttled to 400Mbps per node 
in the 2 core configuration vs 1Gbps per node in the 8 core 
configuration.  So I would expect the non-throttled scenario to work even 
better. 
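
As a back-of-envelope check on the figures above, the following sketch 
totals the cores and the per-node IO caps for both layouts. The class and 
field names are hypothetical, and it assumes the quoted per-node limits 
(1Gbps and 400Mbps) apply uniformly to every worker node and that the head 
node in B contributes cores but no worker IO. It is an illustration of the 
arithmetic only, not a model of the actual experiment.

```python
from dataclasses import dataclass


@dataclass
class ClusterLayout:
    """Hypothetical description of one cluster layout from the experiment."""
    name: str
    worker_nodes: int
    cores_per_worker: int
    io_cap_mbps_per_node: int  # per-node throttle quoted in the email
    head_node_cores: int = 0   # assumption: head node adds cores, not worker IO

    def total_cores(self) -> int:
        return self.worker_nodes * self.cores_per_worker + self.head_node_cores

    def aggregate_worker_io_mbps(self) -> int:
        # Sum of the per-node caps across worker nodes only.
        return self.worker_nodes * self.io_cap_mbps_per_node


layout_a = ClusterLayout("A", worker_nodes=8, cores_per_worker=8,
                         io_cap_mbps_per_node=1000)
layout_b = ClusterLayout("B", worker_nodes=28, cores_per_worker=2,
                         io_cap_mbps_per_node=400, head_node_cores=8)

for layout in (layout_a, layout_b):
    print(f"{layout.name}: {layout.total_cores()} cores, "
          f"{layout.aggregate_worker_io_mbps()} Mbps aggregate worker IO cap")
```

Under these assumptions both layouts come to 64 cores, while B's summed 
per-node cap (28 x 400Mbps) is higher than A's (8 x 1Gbps) even with the 
throttle in place, which is in line with the "more aggregate IO" point in 
the list above.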

~Brad


-----Original Message-----
From: praveenesh kumar [mailto:praveen...@gmail.com] 
Sent: Monday, December 12, 2011 8:51 PM
To: common-user@hadoop.apache.org
Subject: More cores Vs More Nodes ?

Hey Guys,

So I have a very naive question regarding Hadoop cluster nodes: more 
cores or more nodes? Should I spend money on going from 2-core to 4-core 
machines, or spend it on buying more nodes with fewer cores each, say 2 
machines with 2 cores each?

Thanks,
Praveenesh

