Hey there,

I agree with Tom's response. You can decide based on the type of jobs you run. I have been working on Hive, and I found that increasing the number of cores gives a very good performance boost, because joins and similar operations are compute-oriented and consume a lot of CPU on the reduce side. This may not be the case with other applications (HBase, perhaps?).
Thanks

So I feel that you shou

On Tue, Dec 13, 2011 at 11:16 PM, Tom Deutsch wrote:
> It also helps to know the profile of your job when you spec the
> machines. So in addition to Brad's response, you should consider whether
> you think your jobs will be more storage-oriented or compute-oriented.
>
> ------------------------------------------------
> Tom Deutsch
> Program Director
> Information Management
> Big Data Technologies
> IBM
> 3565 Harbor Blvd
> Costa Mesa, CA 92626-1420
>
> Brad Sarsfield
> 12/13/2011 09:41 AM
> Subject: RE: More cores Vs More Nodes?
>
> Praveenesh,
>
> Your question is not naïve; in fact, optimal hardware design can
> ultimately be a very difficult question to answer in terms of what would be
> "better". If you made me pick one without much information, I'd go for more
> machines. But...
>
> It all depends, and there is no right answer... :)
>
> More machines
>   + May run your workload faster
>   + Will give you a higher degree of reliability protection
>     from node / hardware / hard drive failure
>   + More aggregate IO capabilities
>   - Capex / opex may be higher than allocating more cores
> More cores
>   + May run your workload faster
>   + More cores may allow more tasks to run on the same machine
>   + More cores/tasks may reduce network contention and
>     increase task-to-task data flow performance
>
> Notice "May run your workload faster" is in both, as it can be very
> workload-dependent.
>
> My experience:
> I did a recent experiment and found that, given the same number of cores
> (64) with the exact same network / machine configuration:
>   A: I had 8 machines with 8 cores each
>   B: I had 28 machines with 2 cores each (plus 1 x 8-core head node)
>
> B was able to outperform A by 2x using teragen and terasort. These
> machines were running in a virtualized environment, where some of the IO
> capabilities behind the scenes were being regulated to 400 Mbps per node
> in the 2-core configuration vs 1 Gbps in the 8-core configuration. So I
> would expect the non-throttled scenario to work even better.
>
> ~Brad
>
> -----Original Message-----
> From: praveenesh kumar
> Sent: Monday, December 12, 2011 8:51 PM
> Subject: More cores Vs More Nodes?
>
> Hey Guys,
>
> So I have a very naive question regarding Hadoop cluster nodes:
> more cores or more nodes? Shall I spend money on upgrading from 2-core to
> 4-core machines, or on buying more nodes with fewer cores, e.g. 2 machines
> of 2 cores each?
>
> Thanks,
> Praveenesh

--
Regards,
Bharath .V
w: http://researchweb.iiit.ac.in/~bharath.v
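As a rough back-of-the-envelope check on Brad's experiment, the sketch below totals cores and aggregate network bandwidth for configurations A and B, using only the numbers quoted in the thread (64 cores in each; a 400 Mbps per-node cap for B's 2-core workers vs 1 Gbps for A's 8-core machines). It is a simplification under stated assumptions: the `ClusterConfig` helper is hypothetical, B's head node is counted toward cores but not toward bandwidth (its cap isn't stated), and disk IO and switch limits are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one cluster layout (illustration only)."""
    name: str
    workers: int
    cores_per_worker: int
    mbps_per_worker: int  # per-node network cap, in Mbps (from the thread)
    extra_cores: int = 0  # e.g. a separate head node's cores (assumed not to carry data traffic)

    @property
    def total_cores(self) -> int:
        return self.workers * self.cores_per_worker + self.extra_cores

    @property
    def aggregate_mbps(self) -> int:
        # Sum of per-worker caps only; ignores the head node and any switch limits.
        return self.workers * self.mbps_per_worker

    @property
    def mbps_per_worker_core(self) -> float:
        return self.mbps_per_worker / self.cores_per_worker


# Configuration A: 8 machines x 8 cores, 1 Gbps each.
a = ClusterConfig("A", workers=8, cores_per_worker=8, mbps_per_worker=1000)
# Configuration B: 28 machines x 2 cores, throttled to 400 Mbps each, plus one 8-core head node.
b = ClusterConfig("B", workers=28, cores_per_worker=2, mbps_per_worker=400, extra_cores=8)

for cfg in (a, b):
    print(f"{cfg.name}: {cfg.total_cores} cores, "
          f"{cfg.aggregate_mbps} Mbps aggregate, "
          f"{cfg.mbps_per_worker_core:.0f} Mbps per worker core")
```

Even with its per-node throttling, B's 28 workers have about 1.4x the aggregate bandwidth of A (11,200 vs 8,000 Mbps) and more bandwidth per core (200 vs 125 Mbps), which is consistent with Brad's point that more machines bring more aggregate IO. This arithmetic alone does not explain the full 2x he measured; disk IO and task scheduling are not modeled here.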
