Sorry,
But having read the thread, I am going to have to say that this is definitely a 
silly question.
NOTE THE FOLLOWING: Silly questions are not a bad thing. I happen to ask them 
all the time. ;-)

Here's why I say it's a silly question...

Hadoop is a cost-effective solution when you build out on 'commodity' servers.
Now here's the rub.
'Commodity server' means something different to each person, and I don't want
to get into a debate on its definition.

When building out a cluster, too many people gloss over the complexity. 1U vs
2U in box size. Half-size or full-size motherboard. How many disks per node.
How much memory. Physical plant limitations (available rack space, costs if
this is going into a colo...). Power consumption, budget...

At a client, back in 2009, our first cluster was built on whatever hardware we
could get. It was 5 blade servers with SCSI/SAS 2.5" disks where we split each
blade so we could have 10 nodes. Yeah, it was a mistake and a royal pain. But
we got the cluster up and could do some simple PoCs. We then came up with
our reference architecture for further PoCs and development.
We built out the DN with 8 cores, 32GB, and 4 x 2TB 3.5" drives. Why? Because
based on our constraints, this gave us the optimal combination of price and
performance. Note: We knew we would leave some performance on the table. It was
a conscious decision, made so that we could maximize the number of nodes that
fit within our budget.

We chose 2TB drives because at the time they offered the best price/performance 
ratio. Today, that may be different.
We chose 32GB because at the time it was the sweet spot in memory prices. Today,
with 3-channel memory, it looks like 36GB is the sweet spot. Of course YMMV. (It
could be 48GB...)

Moving forward, I would reconsider the design because the price points on
hardware have changed.

That's going to be your driving factor. 

If you want to look at 64-core boxes, then you need 256GB of memory. Think of
how many disks you have to add. (64-128 disks)
Now ask yourself: is this a commodity box?
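
The sizing above is just per-core arithmetic. A minimal sketch in Python,
assuming the ratios implied by the numbers in this mail (4GB of memory per
core, 1 to 2 disks per core). The function name and its defaults are only for
illustration; they are rules of thumb, not Hadoop requirements.

```python
def size_node(cores, gb_per_core=4, disks_per_core=(1, 2)):
    """Scale memory and disk count linearly with the core count.

    Default ratios are inferred from the 64 core / 256GB / 64-128 disk
    figures in this mail (an assumption, adjust to your own workload).
    """
    low, high = disks_per_core
    return {
        "cores": cores,
        "memory_gb": cores * gb_per_core,
        "disks_min": cores * low,
        "disks_max": cores * high,
    }


print(size_node(64))
```

The same 4GB-per-core ratio gives 32GB for an 8 core box, which matches the
reference DN described earlier.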

Now price that box out.
Then price out how many 8 core 1U boxes you can buy.

Kind of puts it into perspective, doesn't it? ;-)
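
A rough sketch of that pricing exercise in Python. The budget and the per-box
prices below are made up purely for illustration (they are not quotes from any
vendor); plug in your own numbers.

```python
def nodes_for_budget(budget, price_per_node):
    """Whole number of nodes a budget buys at a given per-node price."""
    return int(budget // price_per_node)


def compare_options(budget, options):
    """For each option (price, cores per node), report nodes and total cores."""
    results = {}
    for name, (price, cores) in options.items():
        nodes = nodes_for_budget(budget, price)
        results[name] = {"nodes": nodes, "total_cores": nodes * cores}
    return results


# HYPOTHETICAL prices and budget, for illustration only.
options = {
    "64-core box": (40_000, 64),
    "8-core 1U box": (5_000, 8),
}

for name, r in compare_options(100_000, options).items():
    print(f"{name}: {r['nodes']} nodes, {r['total_cores']} cores total")
```

Core count is only one column in that comparison. More nodes also means more
disks and more aggregate I/O, as Brad points out further down the thread.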

The reason why I call this a 'silly question' is that you're attempting to look
at your cluster by focusing on only one variable.
This is not to say that it's a bad question, because it forces you to realize
that there are definitely lots of other options that you have to consider.

HTH

-Mike
 

> Date: Tue, 13 Dec 2011 20:25:17 -0600
> Subject: Re: More cores Vs More Nodes ?
> From: airb...@gmail.com
> To: common-user@hadoop.apache.org
> 
> Hi Brad
> 
> This is a really interesting experiment. I am curious why you did not use 2
> cores each machine but 32 nodes. That makes the number of CPU core in two
> groups equal.
> 
> Chen
> 
> On Tue, Dec 13, 2011 at 7:15 PM, Brad Sarsfield <b...@bing.com> wrote:
> 
> > Hi Prashant,
> >
> > In each case I had a single tasktracker per node. I oversubscribed the
> > total tasks per tasktracker/node by 1.5 x # of cores.
> >
> > So for the 64 core allocation comparison.
> >        In A: 8 cores; Each machine had a single tasktracker with 8 maps /
> > 4 reduce slots for 12 task slots total per machine x 8 machines (including
> > head node)
> >        In B: 2 cores; Each machine had a single tasktracker with 2
> > maps / 1 reduce slots for 3 slots total per machine x 29 machines
> > (including head node which was running 8 cores)
> >
> > The experiment was done in a cloud hosted environment running set of VMs.
> >
> > ~Brad
> >
> > -----Original Message-----
> > From: Prashant Kommireddi [mailto:prash1...@gmail.com]
> > Sent: Tuesday, December 13, 2011 9:46 AM
> > To: common-user@hadoop.apache.org
> > Subject: Re: More cores Vs More Nodes ?
> >
> > Hi Brad, how many tasktrackers did you have on each node in both cases?
> >
> > Thanks,
> > Prashant
> >
> > Sent from my iPhone
> >
> > On Dec 13, 2011, at 9:42 AM, Brad Sarsfield <b...@bing.com> wrote:
> >
> > > Praveenesh,
> > >
> > > Your question is not naïve; in fact, optimal hardware design can
> > ultimately be a very difficult question to answer on what would be
> > "better". If you made me pick one without much information I'd go for more
> > machines.  But...
> > >
> > > It all depends; and there is no right answer.... :)
> > >
> > > More machines
> > >    +May run your workload faster
> > >    +Will give you a higher degree of reliability protection from node /
> > hardware / hard drive failure.
> > >    +More aggregate IO capabilities
> > >    - capex / opex may be higher than allocating more cores
> > >
> > > More cores
> > >    +May run your workload faster
> > >    +More cores may allow for more tasks to run on the same machine
> > >    +More cores/tasks may reduce network contention and increase
> > task-to-task data flow performance.
> > >
> > > Notice "May run your workload faster" is in both; as it can be very
> > workload dependent.
> > >
> > > My Experience:
> > > I did a recent experiment and found that given the same number of cores
> > (64) with the exact same network / machine configuration;
> > >    A: I had 8 machines with 8 cores
> > >    B: I had 28 machines with 2 cores (and 1x8 core head node)
> > >
> > > B was able to outperform A by 2x using teragen and terasort. These
> > machines were running in a virtualized environment; where some of the IO
> > capabilities behind the scenes were being regulated to 400Mbps per node
> > when running in the 2 core configuration vs 1Gbps on the 8 core.  So I
> > would expect the non-throttled scenario to work even better.
> > >
> > > ~Brad
> > >
> > >
> > > -----Original Message-----
> > > From: praveenesh kumar [mailto:praveen...@gmail.com]
> > > Sent: Monday, December 12, 2011 8:51 PM
> > > To: common-user@hadoop.apache.org
> > > Subject: More cores Vs More Nodes ?
> > >
> > > Hey Guys,
> > >
> > > So I have a very naive question in my mind regarding Hadoop cluster
> > nodes ?
> > >
> > > more cores or more nodes - Shall I spend money on going from 2-core to
> > 4-core machines, or spend money on buying more nodes with fewer cores,
> > e.g. 2 machines of 2 cores each?
> > >
> > > Thanks,
> > > Praveenesh
> > >
> >
> >
                                          
