I will give this a go. I have actually gone into JMX and manually triggered a
GC, and no memory is returned, so I assumed something was leaking.
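The same "trigger a GC over JMX and see whether memory comes back" check can be scripted. The sketch below is in Java because the question is about the JVM. It is an illustration, not something from this thread. The class and method names are hypothetical. It talks to the standard `java.lang:type=Memory` MBean through an `MBeanServerConnection`, so the check can be run repeatedly instead of by clicking in a console. To keep it runnable, `main` uses the local platform MBean server. It assumes that for a remote NameNode you would get the connection from `JMXConnectorFactory.connect(...)` with that daemon's JMX service URL.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.MBeanServerConnection;

public class HeapAfterGc {
    // Hypothetical helper: requests a GC through the platform MemoryMXBean reachable
    // over the given connection and returns the used heap in bytes {before, after}.
    static long[] usedHeapBeforeAndAfterGc(MBeanServerConnection conn) throws IOException {
        MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        long before = memory.getHeapMemoryUsage().getUsed();
        memory.gc(); // the same "gc" operation exposed on java.lang:type=Memory
        long after = memory.getHeapMemoryUsage().getUsed();
        return new long[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        // Local MBean server for the demo; a remote NameNode would need a JMXConnector.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        long[] r = usedHeapBeforeAndAfterGc(conn);
        System.out.printf("used heap before GC: %d MB, after GC: %d MB%n",
                r[0] >> 20, r[1] >> 20);
    }
}
```

If the "after" figure stays close to the "before" figure on a full heap, the objects are still reachable. That fits a leak or a genuinely growing namespace rather than a collector that is merely lagging.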

On Fri, Dec 21, 2012 at 11:59 PM, Adam Faris <afa...@linkedin.com> wrote:

> I know this will sound odd, but try reducing your heap size. We had an
> issue like this where GC kept falling behind, and we would either run out
> of heap or be stuck in full GC. By reducing the heap, we forced concurrent
> mark sweep (CMS) to kick in and avoided both full GCs and running out of
> heap space, since the JVM collected objects more frequently.
>
> On Dec 21, 2012, at 8:24 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
> > I have an old Hadoop 0.20.2 cluster. I have not had any issues with it for
> > a while (which is why I never bothered to upgrade).
> >
> > Suddenly it OOMed last week, and now the OOMs happen periodically. We have
> > a fairly large NameNode heap (-Xmx of 17 GB) and a fairly large file
> > system, about 27,000,000 files.
> >
> > The strangest thing is that every hour and a half the NN memory usage
> > increases until the heap is full.
> >
> > http://imagebin.org/240287
> >
> > We tried failing over the NN to another machine. We changed the Java
> > version from 1.6_23 to 1.7.0.
> >
> > I have set the NameNode logs to DEBUG and ALL, and I have done the same
> > with the DataNodes. The Secondary NN is running, shipping edits, and
> > making new images.
> >
> > I am thinking something has corrupted the NN metadata and, after enough
> > time, it becomes a time bomb, but this is just a total shot in the dark.
> > Does anyone have any interesting troubleshooting ideas?
>
>
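Adam's suggestion and Edward's symptom both turn on one question. Is the heap filling because the collector falls behind, or because objects survive every collection? One way to watch this over the hour-and-a-half cycle is to sample each heap pool's usage measured right after its last collection, along with per-collector counts and times. The Java sketch below does this with the standard `java.lang.management` MXBeans. It is a hypothetical illustration, not a tool from the thread, and the class and method names are made up. It reads the local JVM only. It assumes a CMS-enabled JVM like the ones discussed here, where the old-generation collector reports itself as "ConcurrentMarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class GcHealth {
    // Sum of heap-pool usage measured right after each pool's most recent collection.
    // A value that keeps rising across samples suggests retained objects rather than
    // a collector that simply has not run yet.
    static long postGcHeapUsedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if the pool does not support it
            if (afterGc != null) total += afterGc.getUsed();
        }
        return total;
    }

    // Prints how many times each collector has run and the total time it has spent.
    static void printCollectorStats() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }

    public static void main(String[] args) {
        System.gc(); // request a collection so the post-GC figures are populated
        printCollectorStats();
        System.out.printf("heap used after last GC: %d MB%n", postGcHeapUsedBytes() >> 20);
    }
}
```

If the CMS count climbs steadily while the post-GC figure stays flat, the collector is keeping up, which is the situation Adam describes. If the post-GC figure ratchets upward until the OOM, the memory is genuinely retained.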
