Hi Russell,

We will soon be in production with OS-virtualized Hadoop deployments alongside our existing bare-metal deployments.
We are finding tradeoffs on both sides. On the virtualization side, cluster elasticity and deployment are easier. Node recovery can be faster with a VM image restore. VM migration from one server to another makes planned hardware upgrades and repairs easier. But there is always the virtualization overhead/tax to pay, along with what can be a set of multi-VM or multi-tenancy overheads.

I have been thinking about experimenting with a topology/rack-level awareness scheme where one would map physical VM hosts to the VM's Hadoop instance rack affinity nesting level.

~Brad

-----Original Message-----
From: Russell Jurney [mailto:russell.jur...@gmail.com]
Sent: Wednesday, December 14, 2011 1:27 PM
To: common-user@hadoop.apache.org
Subject: Re: More cores Vs More Nodes ?

You're using OS virtualization in your test. Are you using it in production?

Russell Jurney
twitter.com/rjurney
russell.jur...@gmail.com
datasyndrome.com

On Dec 13, 2011, at 5:16 PM, Brad Sarsfield <b...@bing.com> wrote:

> The experiment was done in a cloud-hosted environment running a set of VMs.
>
> ~Brad
>
> -----Original Message-----
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Tuesday, December 13, 2011 9:46 AM
> To: common-user@hadoop.apache.org
> Subject: Re: More cores Vs More Nodes ?
>
> Hi Brad, how many tasktrackers did you have on each node in both cases?
>
> Thanks,
> Prashant
>
> Sent from my iPhone
>
> On Dec 13, 2011, at 9:42 AM, Brad Sarsfield <b...@bing.com> wrote:
>
>> Praveenesh,
>>
>> Your question is not naïve; in fact, which hardware design would be
>> "better" can ultimately be a very difficult question to answer. If you
>> made me pick one without much information, I'd go for more machines. But...
>>
>> It all depends, and there is no right answer.... :)
>>
>> More machines
>> +May run your workload faster
>> +Will give you a higher degree of reliability protection from node /
>> hardware / hard drive failure.
>> +More aggregate IO capabilities
>> - capex / opex may be higher than allocating more cores
>>
>> More cores
>> +May run your workload faster
>> +More cores may allow for more tasks to run on the same machine
>> +More cores/tasks may reduce network contention and increase
>> task-to-task data flow performance.
>>
>> Notice "May run your workload faster" is in both, as it can be very
>> workload dependent.
>>
>> My Experience:
>> I did a recent experiment with the same number of cores (64) and the
>> exact same network / machine configuration:
>> A: I had 8 machines with 8 cores
>> B: I had 28 machines with 2 cores (and 1x8 core head node)
>>
>> B was able to outperform A by 2x using teragen and terasort. These machines
>> were running in a virtualized environment, where some of the IO capabilities
>> behind the scenes were being regulated to 400Mbps per node when running in
>> the 2 core configuration vs 1Gbps on the 8 core. So I would expect the
>> non-throttled scenario to work even better.
>>
>> ~Brad
>>
>>
>> -----Original Message-----
>> From: praveenesh kumar [mailto:praveen...@gmail.com]
>> Sent: Monday, December 12, 2011 8:51 PM
>> To: common-user@hadoop.apache.org
>> Subject: More cores Vs More Nodes ?
>>
>> Hey Guys,
>>
>> So I have a very naive question in my mind regarding Hadoop cluster nodes.
>>
>> More cores or more nodes: shall I spend money on going from 2 to 4 core
>> machines, or spend money on buying more nodes with fewer cores, e.g. 2
>> machines of 2 cores each?
>>
>> Thanks,
>> Praveenesh
>>
>
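
For the rack-awareness idea at the top of this message (mapping physical VM hosts to the rack level that Hadoop sees), here is a minimal sketch of what such a topology script might look like. Hadoop calls an external script with one or more node addresses as arguments and reads one rack path per argument from its output; as far as I recall the script is configured through `topology.script.file.name` in the 0.20/1.x line. The addresses, the `vmhost-a`/`vmhost-b` names, and the `VM_TO_PHYSICAL_HOST` table are all made up for illustration; a real deployment would have to pull that mapping from the hypervisor or an inventory system.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script that treats each physical VM host as a rack.

All addresses and host names below are hypothetical placeholders.
"""
import sys

# Hypothetical mapping: VM address -> physical host the VM currently runs on.
VM_TO_PHYSICAL_HOST = {
    "10.0.0.11": "vmhost-a",
    "10.0.0.12": "vmhost-a",
    "10.0.0.21": "vmhost-b",
    "10.0.0.22": "vmhost-b",
}

# Rack path returned for nodes we do not know about.
DEFAULT_RACK = "/default-rack"


def resolve(node):
    """Return the rack path for one node address."""
    host = VM_TO_PHYSICAL_HOST.get(node)
    return "/" + host if host else DEFAULT_RACK


def main(argv):
    # Emit one rack path per argument, in the same order, separated by spaces.
    print(" ".join(resolve(node) for node in argv))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run by hand as `python3 topology.py 10.0.0.11 10.0.0.21`, this prints `/vmhost-a /vmhost-b`. The intent is that VMs sharing a physical host look like one rack, so rack-aware replica placement would tend to spread block replicas across physical hosts rather than across VMs on the same box. Note that the mapping goes stale whenever a VM is migrated, so it would need to be refreshed from whatever tracks VM placement.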