Hi Brad

This is a really interesting experiment. I am curious why you did not use
32 nodes with 2 cores each. That would make the number of CPU cores in the
two groups equal.
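
To make the comparison concrete, here is a minimal Python sketch that totals
cores, task slots and per-node IO caps for each layout, using the figures
from Brad's messages quoted below. The NodeGroup class and totals function
are hypothetical names for illustration only. The slot layout and 1 Gbps cap
of the 8-core head node in B, and the 400 Mbps cap for a uniform 32 x 2 core
layout, are assumptions, not something stated in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    machines: int      # number of machines in this group
    cores: int         # cores per machine
    map_slots: int     # map slots per tasktracker
    reduce_slots: int  # reduce slots per tasktracker
    io_mbps: int       # per-node IO cap in Mbps


def totals(groups):
    """Return (total cores, total task slots, aggregate IO cap in Mbps)."""
    cores = sum(g.machines * g.cores for g in groups)
    slots = sum(g.machines * (g.map_slots + g.reduce_slots) for g in groups)
    mbps = sum(g.machines * g.io_mbps for g in groups)
    return cores, slots, mbps


# A: 8 machines x 8 cores, 8 map + 4 reduce slots each, 1 Gbps per node.
config_a = [NodeGroup(8, 8, 8, 4, 1000)]

# B: 28 machines x 2 cores, 2 map + 1 reduce slots each, 400 Mbps per node,
# plus one 8-core head node. Assumption: the head node has the same slot
# layout and IO cap as an A machine.
config_b = [NodeGroup(28, 2, 2, 1, 400), NodeGroup(1, 8, 8, 4, 1000)]

# Uniform alternative: 32 machines x 2 cores. Assumption: same slot layout
# and 400 Mbps cap as the 2-core machines in B.
config_uniform = [NodeGroup(32, 2, 2, 1, 400)]

for name, cfg in [("A", config_a), ("B", config_b), ("32x2", config_uniform)]:
    print(name, totals(cfg))
```

Under these assumptions all three layouts come to 64 cores and 96 task
slots; they differ only in the aggregate IO cap.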

Chen

On Tue, Dec 13, 2011 at 7:15 PM, Brad Sarsfield <b...@bing.com> wrote:

> Hi Prashant,
>
> In each case I had a single tasktracker per node. I oversubscribed the
> total tasks per tasktracker/node by 1.5 x # of cores.
>
> So for the 64 core allocation comparison:
>        In A: 8 cores; Each machine had a single tasktracker with 8 maps /
> 4 reduce slots for 12 task slots total per machine x 8 machines (including
> head node)
>        In B: 2 cores; Each machine had a single tasktracker with 2
> maps / 1 reduce slots for 3 slots total per machine x 29 machines
> (including head node which was running 8 cores)
>
> The experiment was done in a cloud hosted environment running a set of VMs.
>
> ~Brad
>
> -----Original Message-----
> From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> Sent: Tuesday, December 13, 2011 9:46 AM
> To: common-user@hadoop.apache.org
> Subject: Re: More cores Vs More Nodes ?
>
> Hi Brad, how many tasktrackers did you have on each node in both cases?
>
> Thanks,
> Prashant
>
> Sent from my iPhone
>
> On Dec 13, 2011, at 9:42 AM, Brad Sarsfield <b...@bing.com> wrote:
>
> > Praveenesh,
> >
> > Your question is not naïve; in fact, deciding which hardware design would
> be "better" can ultimately be a very difficult question to answer. If you
> made me pick one without much information I'd go for more machines.
> But...
> >
> > It all depends; and there is no right answer.... :)
> >
> > More machines
> >    +May run your workload faster
> >    +Will give you a higher degree of reliability protection from node /
> hardware / hard drive failure.
> >    +More aggregate IO capabilities
> >    - capex / opex may be higher than allocating more cores
> >
> > More cores
> >    +May run your workload faster
> >    +More cores may allow for more tasks to run on the same machine
> >    +More cores/tasks may reduce network contention and increase
> task to task data flow performance.
> >
> > Notice "May run your workload faster" is in both; as it can be very
> workload dependent.
> >
> > My Experience:
> > I recently ran an experiment with the same number of cores (64) and
> the exact same network / machine configuration:
> >    A: I had 8 machines with 8 cores
> >    B: I had 28 machines with 2 cores (and 1x8 core head node)
> >
> > B was able to outperform A by 2x using teragen and terasort. These
> machines were running in a virtualized environment; where some of the IO
> capabilities behind the scenes were being regulated to 400Mbps per node
> when running in the 2 core configuration vs 1Gbps on the 8 core.  So I
> would expect the non-throttled scenario to work even better.
> >
> > ~Brad
> >
> >
> > -----Original Message-----
> > From: praveenesh kumar [mailto:praveen...@gmail.com]
> > Sent: Monday, December 12, 2011 8:51 PM
> > To: common-user@hadoop.apache.org
> > Subject: More cores Vs More Nodes ?
> >
> > Hey Guys,
> >
> > So I have a very naive question in my mind regarding Hadoop cluster
> nodes.
> >
> > More cores or more nodes? Shall I spend money on going from 2 to 4 core
> machines, or spend money on buying more nodes with fewer cores, e.g. 2
> machines of 2 cores each?
> >
> > Thanks,
> > Praveenesh
> >
>
>
