On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman <bbock...@cse.unl.edu> wrote:

>
> On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:
>
> > Dear all,
> >
> > I'm looking for ways to improve the namenode heap usage of an 800-node,
> > 10PB testing Hadoop cluster that stores around 30 million files.
> >
> > Here's some info:
> >
> > 1 x namenode:     32GB RAM, 24GB heap size
> > 800 x datanode:   8GB RAM, 13TB hdd
> >
> > *33050825 files and directories, 47708724 blocks = 80759549 total. Heap
> > Size is 22.93 GB / 22.93 GB (100%)*
> >
> > From the cluster summary report, it seems the heap usage is always full
> > and never drops. Do you guys know of any ways to reduce it? So far I
> > don't see any namenode OOM errors, so it looks like the memory assigned
> > to the namenode process is (just) enough. But I'm curious which factors
> > account for the full use of the heap?
> >
>
> The advice I give to folks is to plan on 1GB heap for every million
> objects.  It's an over-estimate, but I prefer to be on the safe side.  Why
> not increase the heap-size to 28GB?  Should buy you some time.
>
> You can turn on compressed pointers, but your best bet is really going to
> be spending some more money on RAM.
>
> Brian
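
For what it's worth, the "1GB per million objects" rule works out to roughly
the following for the counts in the cluster summary above. This is a quick
back-of-the-envelope sketch in Python, not a measurement, and the rule is
explicitly an over-estimate:

  # Rough NameNode heap estimate from the "1GB of heap per million
  # namespace objects" rule of thumb quoted above.
  files_and_dirs = 33050825
  blocks = 47708724

  total_objects = files_and_dirs + blocks      # 80759549 namespace objects
  estimated_heap_gb = total_objects / 1e6      # ~81 GB by the rule

  print("objects: %d, rule-of-thumb heap: ~%.0f GB"
        % (total_objects, estimated_heap_gb))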


The problem with the "buy more RAM" philosophy is that JVMs tend to have
trouble running large heaps without long garbage-collection pauses, and
NameNode JVM pauses are not a good thing. The number of files and the number
of blocks are what drive NN memory usage, so larger block sizes help keep it
down.

Also, your setup notes do not mention a secondary namenode. Do you have one?
It needs slightly more RAM than the NN.
