Re: NameNode heapsize
On Jun 13, 2011, at 5:52 AM, Steve Loughran wrote:

> Unless your cluster is bigger than Facebook's, you have too many small files.

+1. (I'm actually somewhat surprised the NN is still standing with only a 24GB heap; the GC logs would be interesting to look at.) I'd also likely increase the block size, distcp the files onto the new block size, and then replace the old files with the new ones.
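To get a feel for how much a bigger block size buys, here is a rough back-of-envelope sketch. The ~150 bytes per block object is an assumed ballpark (not from the thread), and it assumes most data lives in files large enough to span many blocks, so doubling the block size roughly halves the block count:

```python
# Rough sketch: why larger blocks shrink the NameNode heap.
# Assumptions: ~150 bytes of heap per block object (ballpark, not measured),
# and files large enough that doubling the block size halves the block count.

blocks_now = 47_708_724        # from the cluster summary quoted in this thread
bytes_per_block = 150          # assumed rough per-block overhead

def heap_gb(blocks, per_block=bytes_per_block):
    """Approximate heap consumed by block objects, in GB."""
    return blocks * per_block / 2**30

blocks_after = blocks_now // 2  # e.g. 64MB -> 128MB blocks on multi-block files
print(f"block objects: {blocks_now:,} -> {blocks_after:,}")
print(f"approx heap for blocks: {heap_gb(blocks_now):.1f} GB"
      f" -> {heap_gb(blocks_after):.1f} GB")
```

The per-object constant is only illustrative; the point is that block count, not stored bytes, is what the NN pays for.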
Re: NameNode heapsize
On 06/10/2011 05:31 PM, si...@ugcv.com wrote:

> I would add more RAM for sure, but there's a hardware limitation. What if the motherboard can't support more than, say, 128GB? It seems I can't keep adding RAM to resolve this.
>
> Compressed pointers: do you mean turning on the JVM's compressed references? I haven't tried that before. How has your experience been?

JVMs top out at 64GB, I think, while compressed pointers only work on Sun VMs when the heap is under 32GB. JRockit has better heap management, but as I was the only person to admit to using Hadoop on JRockit, I know you'd be on your own if you found problems there.

Unless your cluster is bigger than Facebook's, you have too many small files.
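For reference, a sketch of where compressed references would be switched on for the NN JVM. This is a hypothetical hadoop-env.sh fragment, not from the thread: the variable name follows the stock 0.20-era layout, the flags are the Sun/HotSpot ones, the log path is made up, and the GC-logging flags are thrown in because the GC logs were asked about elsewhere in this thread:

```shell
# Hypothetical hadoop-env.sh fragment (Sun/HotSpot flag names; compressed
# oops only take effect while -Xmx stays under ~32GB):
export HADOOP_NAMENODE_OPTS="-Xmx24g -XX:+UseCompressedOops \
  -verbose:gc -XX:+PrintGCDetails -Xloggc:/var/log/hadoop/nn-gc.log \
  $HADOOP_NAMENODE_OPTS"
```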
Re: NameNode heapsize
On 10/06/2011 10:00 PM, Edward Capriolo wrote:

> On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman wrote:
>
>> On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:
>>
>>> Dear all,
>>>
>>> I'm looking for ways to improve the namenode heap size usage of an 800-node, 10PB testing Hadoop cluster that stores around 30 million files.
>>>
>>> Here's some info:
>>>
>>> 1 x namenode: 32GB RAM, 24GB heap size
>>> 800 x datanode: 8GB RAM, 13TB hdd
>>>
>>> 33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size is 22.93 GB / 22.93 GB (100%)
>>>
>>> From the cluster summary report, it seems the heap usage is always full and never drops. Do you know of any ways to reduce it? So far I don't see any namenode OOM errors, so it looks like the memory assigned to the namenode process is (just) enough. But I'm curious which factors account for the full use of the heap.
>>
>> The advice I give to folks is to plan on 1GB of heap for every million objects. It's an over-estimate, but I prefer to be on the safe side. Why not increase the heap size to 28GB? That should buy you some time.
>>
>> You can turn on compressed pointers, but your best bet is really going to be spending some more money on RAM.
>>
>> Brian
>
> The problem with the "buy more RAM" philosophy is that JVMs tend to have problems operating without pausing on large heaps, and NameNode JVM pausing is not a good thing.
>
> The number of files and the number of blocks is what matters, so larger block sizes make for less NN memory usage.
>
> Also, your setup notes do not mention a secondary namenode. Do you have one? It needs slightly more RAM than the NN.

The NN started with 8GB, then 16GB, and currently 24GB; we'll most likely raise it to 28GB next month, but that looks close to the maximum. I would add more RAM for sure, but there's a hardware limitation. What if the motherboard can't support more than, say, 128GB? It seems I can't keep adding RAM to resolve this.

Compressed pointers: do you mean turning on the JVM's compressed references? I haven't tried that before. How has your experience been?
I'm running a secondary NN with exactly the same hardware spec as the NN. Both use a 24GB heap, which should be enough to handle syncing/merging of the namespace. If the secondary NN needs more RAM than the NN, do you suggest adding more to the NN as well?
Re: NameNode heapsize
On Fri, Jun 10, 2011 at 8:22 AM, Brian Bockelman wrote:

> On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:
>
>> Dear all,
>>
>> I'm looking for ways to improve the namenode heap size usage of an 800-node, 10PB testing Hadoop cluster that stores around 30 million files.
>>
>> Here's some info:
>>
>> 1 x namenode: 32GB RAM, 24GB heap size
>> 800 x datanode: 8GB RAM, 13TB hdd
>>
>> 33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size is 22.93 GB / 22.93 GB (100%)
>>
>> From the cluster summary report, it seems the heap usage is always full and never drops. Do you know of any ways to reduce it? So far I don't see any namenode OOM errors, so it looks like the memory assigned to the namenode process is (just) enough. But I'm curious which factors account for the full use of the heap.
>
> The advice I give to folks is to plan on 1GB of heap for every million objects. It's an over-estimate, but I prefer to be on the safe side. Why not increase the heap size to 28GB? That should buy you some time.
>
> You can turn on compressed pointers, but your best bet is really going to be spending some more money on RAM.
>
> Brian

The problem with the "buy more RAM" philosophy is that JVMs tend to have problems operating without pausing on large heaps, and NameNode JVM pausing is not a good thing.

The number of files and the number of blocks is what matters, so larger block sizes make for less NN memory usage.

Also, your setup notes do not mention a secondary namenode. Do you have one? It needs slightly more RAM than the NN.
Re: NameNode heapsize
On Jun 10, 2011, at 6:32 AM, si...@ugcv.com wrote:

> Dear all,
>
> I'm looking for ways to improve the namenode heap size usage of an 800-node, 10PB testing Hadoop cluster that stores around 30 million files.
>
> Here's some info:
>
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode: 8GB RAM, 13TB hdd
>
> 33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size is 22.93 GB / 22.93 GB (100%)
>
> From the cluster summary report, it seems the heap usage is always full and never drops. Do you know of any ways to reduce it? So far I don't see any namenode OOM errors, so it looks like the memory assigned to the namenode process is (just) enough. But I'm curious which factors account for the full use of the heap.

The advice I give to folks is to plan on 1GB of heap for every million objects. It's an over-estimate, but I prefer to be on the safe side. Why not increase the heap size to 28GB? That should buy you some time.

You can turn on compressed pointers, but your best bet is really going to be spending some more money on RAM.

Brian
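A quick back-of-envelope check of that rule of thumb against the numbers reported in this thread shows how conservative it is (the arithmetic is mine, the figures are from the cluster summary above):

```python
# Check the "1 GB of heap per million objects" rule against the reported
# cluster summary: 80,759,549 objects in a 22.93 GB heap.

objects = 80_759_549            # files + directories + blocks
heap_bytes = 22.93 * 2**30      # reported heap in use

rule_of_thumb_gb = objects / 1_000_000          # what the rule would plan for
observed_bytes_per_object = heap_bytes / objects

print(f"rule of thumb would plan for: ~{rule_of_thumb_gb:.0f} GB")
print(f"observed: ~{observed_bytes_per_object:.0f} bytes/object")
```

So the cluster is actually spending roughly 300 bytes per object, while the rule budgets over 1KB each, which is consistent with Brian calling it an over-estimate kept for safety.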
Re: NameNode heapsize
Are you using compressed pointers?

Sent from my mobile. Please excuse the typos.

On 2011-06-10, at 5:33 AM, "si...@ugcv.com" wrote:

> Dear all,
>
> I'm looking for ways to improve the namenode heap size usage of an 800-node, 10PB testing Hadoop cluster that stores around 30 million files.
>
> Here's some info:
>
> 1 x namenode: 32GB RAM, 24GB heap size
> 800 x datanode: 8GB RAM, 13TB hdd
>
> 33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size is 22.93 GB / 22.93 GB (100%)
>
> From the cluster summary report, it seems the heap usage is always full and never drops. Do you know of any ways to reduce it? So far I don't see any namenode OOM errors, so it looks like the memory assigned to the namenode process is (just) enough. But I'm curious which factors account for the full use of the heap.
>
> Regards,
> On
NameNode heapsize
Dear all,

I'm looking for ways to improve the namenode heap size usage of an 800-node, 10PB testing Hadoop cluster that stores around 30 million files.

Here's some info:

1 x namenode: 32GB RAM, 24GB heap size
800 x datanode: 8GB RAM, 13TB hdd

33050825 files and directories, 47708724 blocks = 80759549 total. Heap Size is 22.93 GB / 22.93 GB (100%)

From the cluster summary report, it seems the heap usage is always full and never drops. Do you know of any ways to reduce it? So far I don't see any namenode OOM errors, so it looks like the memory assigned to the namenode process is (just) enough. But I'm curious which factors account for the full use of the heap.

Regards,
On