Re: NameNode heapsize

2011-06-13 Thread Allen Wittenauer

On Jun 13, 2011, at 5:52 AM, Steve Loughran wrote:
> 
Unless your cluster is bigger than Facebook's, you have too many small files
> 
> 


+1 

(I'm actually sort of surprised the NN is still standing with only 24GB.  The 
GC logs would be interesting to look at.)
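
For anyone who wants to capture those, a minimal sketch of the standard 
HotSpot GC-logging flags added to the NameNode's JVM options in 
hadoop-env.sh (the log path here is just an example):

    # hadoop-env.sh: enable GC logging for the NameNode JVM
    export HADOOP_NAMENODE_OPTS="-verbose:gc -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop/namenode-gc.log \
      ${HADOOP_NAMENODE_OPTS}"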

I'd also likely increase the block size, distcp files to the new block size, 
and then replace the old files with the new files.
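
A sketch of what that could look like, with made-up paths and a 256MB 
target block size (distcp accepts -D generic options, so the files written 
by the copy pick up the larger block size):

    # rewrite the files with 256MB blocks
    hadoop distcp -Ddfs.block.size=268435456 /data/input /data/input.rewritten
    # after verifying the copy, swap it into place
    hadoop fs -rmr /data/input
    hadoop fs -mv /data/input.rewritten /data/input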




Re: NameNode heapsize

2011-06-13 Thread Steve Loughran

On 06/10/2011 05:31 PM, si...@ugcv.com wrote:



I would add more RAM for sure, but there's a hardware limitation. What if 
the motherboard can't support more than, say, 128GB? It seems I can't keep 
adding RAM to resolve it.

Compressed pointers: do you mean turning on JVM compressed references? 
I haven't tried that before; what has your experience been?


JVMs top out at 64GB, I think, while compressed pointers only work on 
Sun VMs when the heap is under 32GB. JRockit has better heap management, 
but as I was the only person to admit to using Hadoop on JRockit, I know 
you'd be on your own if you found problems there.
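
For reference, on a Sun/Oracle HotSpot JVM the compressed-references 
switch is a single flag; a sketch of where one might set it for the NN 
(hadoop-env.sh placement assumed):

    # compressed ordinary object pointers; only effective with heaps < 32GB
    export HADOOP_NAMENODE_OPTS="-XX:+UseCompressedOops ${HADOOP_NAMENODE_OPTS}"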


Unless your cluster is bigger than Facebook's, you have too many small files




Re: NameNode heapsize

2011-06-10 Thread si...@ugcv.com

On 10/06/2011 10:00 PM, Edward Capriolo wrote:

On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman wrote:


On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:

Dear all,

I'm looking for ways to improve the namenode heap size usage of an 
800-node, 10PB testing Hadoop cluster that stores around 30 million files.

Here's some info:

1 x namenode: 32GB RAM, 24GB heap size
800 x datanode:   8GB RAM, 13TB hdd

*33050825 files and directories, 47708724 blocks = 80759549 total. Heap 
Size is 22.93 GB / 22.93 GB (100%)*

From the cluster summary report, it seems the heap usage is always full 
and never drops. Do you guys know of any ways to reduce it? So far I don't 
see any namenode OOM errors, so it looks like the memory assigned to the 
namenode process is (just) enough. But I'm curious: which factors would 
account for the full use of the heap?

The advice I give to folks is to plan on 1GB heap for every million
objects.  It's an over-estimate, but I prefer to be on the safe side.  Why
not increase the heap-size to 28GB?  Should buy you some time.

You can turn on compressed pointers, but your best bet is really going to
be spending some more money on RAM.

Brian


The problem with the "buy more RAM" philosophy is that JVMs tend to have 
trouble operating without pausing on large heaps, and NameNode JVM pauses are 
not a good thing. The number of files and the number of blocks are what drive 
memory use, so larger block sizes make for less NN memory usage.

Also, your setup notes do not mention a secondary namenode. Do you have one? 
It needs slightly more RAM than the NN.



The NN started with an 8GB heap, then 16GB, and currently 24GB. We'll most 
likely raise it to 28GB next month, but that looks close to the max.

I would add more RAM for sure, but there's a hardware limitation. What if 
the motherboard can't support more than, say, 128GB? It seems I can't keep 
adding RAM to resolve it.

Compressed pointers: do you mean turning on JVM compressed references? 
I haven't tried that before; what has your experience been?

I'm running a secondary NN with exactly the same hardware spec as the NN. 
Both use a 24GB heap size, supposedly enough to handle syncing/merging of 
the namespace. If the secondary NN needs more RAM than the NN, do you 
suggest adding more to the NN as well?




Re: NameNode heapsize

2011-06-10 Thread Edward Capriolo
On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman wrote:

>
> On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:
>
> > Dear all,
> >
> > I'm looking for ways to improve the namenode heap size usage of an
> > 800-node, 10PB testing Hadoop cluster that stores around 30 million files.
> >
> > Here's some info:
> >
> > 1 x namenode: 32GB RAM, 24GB heap size
> > 800 x datanode:   8GB RAM, 13TB hdd
> >
> > *33050825 files and directories, 47708724 blocks = 80759549 total. Heap
> > Size is 22.93 GB / 22.93 GB (100%)*
> >
> > From the cluster summary report, it seems the heap usage is always full
> > and never drops. Do you guys know of any ways to reduce it? So far I
> > don't see any namenode OOM errors, so it looks like the memory assigned
> > to the namenode process is (just) enough. But I'm curious: which factors
> > would account for the full use of the heap?
>
> The advice I give to folks is to plan on 1GB heap for every million
> objects.  It's an over-estimate, but I prefer to be on the safe side.  Why
> not increase the heap-size to 28GB?  Should buy you some time.
>
> You can turn on compressed pointers, but your best bet is really going to
> be spending some more money on RAM.
>
> Brian


The problem with the "buy more RAM" philosophy is that JVMs tend to have 
trouble operating without pausing on large heaps, and NameNode JVM pauses are 
not a good thing. The number of files and the number of blocks are what drive 
memory use, so larger block sizes make for less NN memory usage.
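
A quick worked example of that effect, with made-up file sizes:

    1GB file at a  64MB block size -> 16 block objects on the NN
    1GB file at a 256MB block size ->  4 block objects on the NN

Quadrupling the block size cuts the per-file block count, and so the block 
side of NN heap, by roughly 4x, at the cost of coarser-grained parallelism 
for MapReduce jobs over those files.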

Also, your setup notes do not mention a secondary namenode. Do you have one? 
It needs slightly more RAM than the NN.


Re: NameNode heapsize

2011-06-10 Thread Brian Bockelman

On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:

> Dear all,
> 
> I'm looking for ways to improve the namenode heap size usage of an 800-node, 
> 10PB testing Hadoop cluster that stores around 30 million files.
> 
> Here's some info:
> 
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode:   8GB RAM, 13TB hdd
> 
> *33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size 
> is 22.93 GB / 22.93 GB (100%)*
> 
> From the cluster summary report, it seems the heap usage is always full and 
> never drops. Do you guys know of any ways to reduce it? So far I don't see 
> any namenode OOM errors, so it looks like the memory assigned to the 
> namenode process is (just) enough. But I'm curious: which factors would 
> account for the full use of the heap?
> 

The advice I give to folks is to plan on 1GB heap for every million objects.  
It's an over-estimate, but I prefer to be on the safe side.  Why not increase 
the heap-size to 28GB?  Should buy you some time.
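
Worked against the numbers posted above, that rule of thumb comes out to:

    33,050,825 files/dirs + 47,708,724 blocks = 80,759,549 objects
    ~80.8 million objects x 1GB per million  ~= 81GB of planned heap

The cluster is actually holding those objects in about 23GB, which is the 
safety margin mentioned; by this conservative yardstick, even 28GB is on 
the tight side.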

You can turn on compressed pointers, but your best bet is really going to be 
spending some more money on RAM.

Brian

Re: NameNode heapsize

2011-06-10 Thread James Seigel
Are you using compressed pointers?

Sent from my mobile. Please excuse the typos.

On 2011-06-10, at 5:33 AM, "si...@ugcv.com" wrote:

> Dear all,
>
> I'm looking for ways to improve the namenode heap size usage of an 800-node,
> 10PB testing Hadoop cluster that stores around 30 million files.
>
> Here's some info:
>
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode:   8GB RAM, 13TB hdd
>
> *33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size
> is 22.93 GB / 22.93 GB (100%)*
>
> From the cluster summary report, it seems the heap usage is always full and
> never drops. Do you guys know of any ways to reduce it? So far I don't see
> any namenode OOM errors, so it looks like the memory assigned to the
> namenode process is (just) enough. But I'm curious: which factors would
> account for the full use of the heap?
>
> Regards,
> On


NameNode heapsize

2011-06-10 Thread si...@ugcv.com

Dear all,

I'm looking for ways to improve the namenode heap size usage of an 
800-node, 10PB testing Hadoop cluster that stores around 30 million files.

Here's some info:

1 x namenode: 32GB RAM, 24GB heap size
800 x datanode:   8GB RAM, 13TB hdd

*33050825 files and directories, 47708724 blocks = 80759549 total. Heap 
Size is 22.93 GB / 22.93 GB (100%)*

From the cluster summary report, it seems the heap usage is always full 
and never drops. Do you guys know of any ways to reduce it? So far I don't 
see any namenode OOM errors, so it looks like the memory assigned to the 
namenode process is (just) enough. But I'm curious: which factors would 
account for the full use of the heap?


Regards,
On