Of course it all depends...
But something like this could work:

Leave 1-2 GB for the kernel, page cache, tools, overhead, etc.
Plan 3-4 GB each for the DataNode and TaskTracker daemons.

Plan 2.5-3 GB per slot. Depending on the kinds of jobs, you may need more
or less memory per slot.
Have 2-3 times as many mappers as reducers (again, depending on the jobs
you run).

As Michael pointed out, the ratio of cores (hyperthreads) to disks matters;
here that's 24 hyperthreads over 12 spindles.

With those initial rules of thumb, a 48 GB node would land somewhere between
10 mappers + 5 reducers
and
9 mappers + 4 reducers
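
Rough arithmetic, just to show where those numbers come from (48 GB node,
ballpark only):
  conservative: 48 - 2 (OS) - 4 (DN) - 4 (TT) = 38 GB; 38 / 3 GB per slot   ~= 13 slots -> 9 map + 4 reduce
  aggressive:   48 - 1 (OS) - 3 (DN) - 3 (TT) = 41 GB; 41 / 2.5 GB per slot ~= 16 slots -> 10 map + 5 reduce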

Try, test, measure, adjust, rinse, repeat.
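
On CDH3 (MRv1) the conservative end of that would look something like this in
mapred-site.xml (stock MRv1 property names; the child heap below is just an
illustration, size it to whatever you budget per slot):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>9</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
  </property>

Keep the -Xmx per child (plus JVM overhead) under the per-slot budget above.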

Cheers,

Joep

On Tue, Oct 2, 2012 at 8:42 PM, Alexander Pivovarov <apivova...@gmail.com> wrote:

> All configs are per node.
> No HBase, only Hive and Pig installed
>
> On Tue, Oct 2, 2012 at 9:40 PM, Michael Segel <michael_se...@hotmail.com> wrote:
>
> > I think he's saying that it's 24 mappers and 8 reducers per node, and at
> > 48 GB that could be too many mappers.
> > Especially if they want to run HBase.
> >
> > On Oct 2, 2012, at 8:14 PM, hadoopman <hadoop...@gmail.com> wrote:
> >
> > > Only 24 map and 8 reduce tasks for 38 data nodes?  Are you sure that's
> > > right?  Sounds VERY low for a cluster that size.
> > >
> > > We have only 10 C2100s and are running, I believe, 140 map and 70 reduce
> > > slots so far with pretty decent performance.
> > >
> > >
> > >
> > > On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:
> > >> 38 data nodes + 2 Name Nodes
> > >>
> > >> Data Node:
> > >> Dell PowerEdge C2100 series
> > >> 2 x XEON X5670
> > >> 48 GB RAM ECC (12x4GB 1333MHz)
> > >> 12 x 2 TB 7200 RPM SATA HDD (with hot swap), JBOD
> > >> Intel Gigabit ET Dual port PCIe x4
> > >> Redundant Power Supply
> > >> Hadoop CDH3
> > >> max map tasks 24
> > >> max reduce tasks 8
> > >
> > >
> >
> >
>
