Hi Brad,

This is a really interesting experiment. I am curious why you did not use 32 nodes with 2 cores each. That would make the number of CPU cores in the two groups equal.
Chen

On Tue, Dec 13, 2011 at 7:15 PM, Brad Sarsfield <b...@bing.com> wrote:

> Hi Prashant,
>
> In each case I had a single tasktracker per node. I oversubscribed the
> total tasks per tasktracker/node by 1.5 x # of cores.
>
> So for the 64 core allocation comparison:
> In A: 8 cores; each machine had a single tasktracker with 8 map /
> 4 reduce slots, for 12 task slots total per machine x 8 machines
> (including head node).
> In B: 2 cores; each machine had a single tasktracker with 2 map /
> 1 reduce slots, for 3 slots total per machine x 29 machines
> (including head node, which was running 8 cores).
>
> The experiment was done in a cloud hosted environment running a set of VMs.
>
> ~Brad
>
> -----Original Message-----
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Tuesday, December 13, 2011 9:46 AM
> To: common-user@hadoop.apache.org
> Subject: Re: More cores Vs More Nodes ?
>
> Hi Brad, how many tasktrackers did you have on each node in both cases?
>
> Thanks,
> Prashant
>
> Sent from my iPhone
>
> On Dec 13, 2011, at 9:42 AM, Brad Sarsfield <b...@bing.com> wrote:
>
> > Praveenesh,
> >
> > Your question is not naïve; in fact, optimal hardware design can be a
> > very difficult question to answer in terms of what would be "better".
> > If you made me pick one without much information I'd go for more
> > machines. But...
> >
> > It all depends, and there is no right answer.... :)
> >
> > More machines
> > + May run your workload faster
> > + Will give you a higher degree of reliability protection from node /
> >   hardware / hard drive failure
> > + More aggregate IO capabilities
> > - capex / opex may be higher than allocating more cores
> >
> > More cores
> > + May run your workload faster
> > + More cores may allow for more tasks to run on the same machine
> > + More cores/tasks may reduce network contention and increase
> >   task-to-task data flow performance
> >
> > Notice "May run your workload faster" is in both, as it can be very
> > workload dependent.
> >
> > My Experience:
> > I did a recent experiment and found that given the same number of cores
> > (64) with the exact same network / machine configuration:
> > A: I had 8 machines with 8 cores
> > B: I had 28 machines with 2 cores (and 1x8 core head node)
> >
> > B was able to outperform A by 2x using teragen and terasort. These
> > machines were running in a virtualized environment, where some of the IO
> > capabilities behind the scenes were being regulated to 400Mbps per node
> > when running in the 2 core configuration vs 1Gbps on the 8 core. So I
> > would expect the non-throttled scenario to work even better.
> >
> > ~Brad
> >
> > -----Original Message-----
> > From: praveenesh kumar [mailto:praveen...@gmail.com]
> > Sent: Monday, December 12, 2011 8:51 PM
> > To: common-user@hadoop.apache.org
> > Subject: More cores Vs More Nodes ?
> >
> > Hey Guys,
> >
> > So I have a very naive question regarding Hadoop cluster nodes.
> >
> > More cores or more nodes: should I spend money on going from 2 to 4 core
> > machines, or on buying more nodes with fewer cores, e.g. 2 machines
> > with 2 cores each?
> >
> > Thanks,
> > Praveenesh
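
For anyone who wants to check the arithmetic behind the two layouts, here is a rough sketch in Python. It only restates numbers from the thread: 1.5 task slots per core, 8 x 8-core machines for A, 28 x 2-core machines plus one 8-core head node for B, and the 400Mbps / 1Gbps per-node IO caps. The NodeGroup class and the totals helper are made up for illustration. Two things are assumptions, not something Brad stated: that the 8-core head node in B also gets 12 slots and a 1Gbps cap, and that the layout Chen suggests (32 x 2 cores) would see the same 400Mbps cap as the other 2-core nodes. The aggregate IO figure is only a crude upper bound, since Brad said "some of" the IO was regulated.

```python
from dataclasses import dataclass

# Task slots per core, as stated in the thread (8 map + 4 reduce on 8 cores).
OVERSUBSCRIPTION = 1.5


@dataclass(frozen=True)
class NodeGroup:
    """A set of identical machines (hypothetical helper for this sketch)."""
    machines: int
    cores_per_machine: int
    io_cap_mbps: int  # per-node IO cap in Mbps

    @property
    def cores(self) -> int:
        return self.machines * self.cores_per_machine

    @property
    def task_slots(self) -> int:
        return int(self.cores * OVERSUBSCRIPTION)

    @property
    def aggregate_io_mbps(self) -> int:
        return self.machines * self.io_cap_mbps


def totals(groups):
    """Sum cores, task slots and per-node IO caps over all node groups."""
    return {
        "cores": sum(g.cores for g in groups),
        "task_slots": sum(g.task_slots for g in groups),
        "aggregate_io_mbps": sum(g.aggregate_io_mbps for g in groups),
    }


# A: 8 machines x 8 cores, 1Gbps per node.
config_a = [NodeGroup(8, 8, 1000)]
# B: 28 machines x 2 cores at 400Mbps, plus one 8-core head node.
# ASSUMPTION: the head node gets 12 slots and a 1Gbps cap like the A nodes.
config_b = [NodeGroup(28, 2, 400), NodeGroup(1, 8, 1000)]
# Chen's suggested layout: 32 machines x 2 cores.
# ASSUMPTION: same 400Mbps cap as the other 2-core nodes.
config_chen = [NodeGroup(32, 2, 400)]

if __name__ == "__main__":
    for name, cfg in [("A", config_a), ("B", config_b), ("Chen", config_chen)]:
        print(name, totals(cfg))
```

Under those assumptions all three layouts come out at 64 cores and 96 task slots, so the difference is in the number of nodes sharing the IO: the summed per-node caps are higher for the many-small-nodes layouts even with the 400Mbps throttle, which is in line with the "more aggregate IO capabilities" point above. It does not by itself explain the 2x result.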