Andrew Purtell wrote:
I am constantly needing to restart my cluster now, even running region servers 
with 3GB of heap. The production cluster is running Hadoop 0.18.1 and HBase 
0.18.1.

I will see mapred tasks fail with (copied by hand, please forgive):

java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at java.io.DataInputStream.readFully(DataInputStream.java:175)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)


Can you see which store file this is happening against? Does it always OOME against the same store file, and in the same place? Do you think these cells are wholesome, i.e. not extremely large? (The thought is that there might be a corrupted record that manifests itself as a very large record, and we OOME trying to read it into memory to shuttle it across to the client.) I can make a mapfile checker for you if you'd like -- just say. A rough sketch of what such a checker might look like is below.
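For illustration, here is a minimal sketch of the kind of checker meant above, using Hadoop's stock MapFile.Reader. The class name and usage are assumptions, not existing HBase code. It walks a store mapfile, printing each key and the serialized size of its value, so the last key printed before a failure (or an obviously outsized length) points at the suspect record. Run it with the HBase jar on the classpath so the store's key/value classes resolve, and note that a truly corrupt record length can still OOME the checker itself, so give it a generous heap.

// Hypothetical MapFileChecker.java -- a sketch, not part of HBase.
// Usage (assumed): java -cp <hadoop+hbase jars> MapFileChecker <path-to-store-mapfile-directory>
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.util.ReflectionUtils;

public class MapFileChecker {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    MapFile.Reader reader = new MapFile.Reader(fs, args[0], conf);
    try {
      WritableComparable key =
        (WritableComparable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value =
        (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      DataOutputBuffer buf = new DataOutputBuffer();
      long count = 0;
      while (reader.next(key, value)) {
        // Re-serialize the value just to measure its size.
        buf.reset();
        value.write(buf);
        System.out.println(count + "\t" + key + "\t" + buf.getLength() + " bytes");
        count++;
      }
      System.out.println("OK: read " + count + " records");
    } finally {
      reader.close();
    }
  }
}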

...

This problem is really killing us. When the OOMEs happen, the cluster does not recover without manual intervention. The regionservers sometimes go down afterwards, or sometimes stay up in a sick state for a while. Regions go offline and remain unavailable, causing indefinite stalls all over the place.

Is this because the OOMEs are bubbling up in a place that doesn't release the reservoir memory and trigger a proper node shutdown? Should we backport HBASE-1020/HBASE-1006?
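For context, the general pattern behind that "reservoir" is roughly the following. This is a sketch of the technique only, not the actual HBase code, and the class and field names are made up: a block of heap is reserved at startup and dropped when an OutOfMemoryError is caught, so there is enough headroom left to log the error and run an orderly shutdown.

// Sketch of the memory-reservoir pattern; names are hypothetical,
// this is not the HBase implementation.
public class ReservoirExample {
  // Reserve a chunk of heap up front (size here is arbitrary).
  private static byte[] reservoir = new byte[5 * 1024 * 1024];

  public static void runGuarded(Runnable work) {
    try {
      work.run();
    } catch (OutOfMemoryError e) {
      // Release the reservation so the JVM has headroom again,
      // then do an orderly shutdown instead of limping along.
      reservoir = null;
      System.err.println("OOME caught, shutting down: " + e);
      // ... trigger clean server shutdown here ...
    }
  }
}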

Even so, my workload is modest: continuous write operations, maybe up to 100/sec, of objects typically < 4K in size but which can be as large as 20MB. Writes go to both a 'urls' table and a 'content' table. The 'content' table gets the raw content and uses RECORD compression.
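For reference, the write path described would look roughly like this with the 0.18-era BatchUpdate client API. This is a sketch from memory: the column names "meta:fetched" and "raw:data" are made up, and exact constructor signatures may differ slightly in that release.

// Sketch of the described write workload; column names are hypothetical.
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class WriteSketch {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable urls = new HTable(conf, "urls");
    HTable content = new HTable(conf, "content");

    String row = "http://example.org/some/page";
    byte[] rawBytes = new byte[4096];  // typically < 4K, occasionally up to 20MB

    // Metadata goes to the 'urls' table.
    BatchUpdate metaUpdate = new BatchUpdate(row);
    metaUpdate.put("meta:fetched", Long.toString(System.currentTimeMillis()).getBytes());
    urls.commit(metaUpdate);

    // Raw content goes to the 'content' table (the RECORD-compressed family).
    BatchUpdate contentUpdate = new BatchUpdate(row);
    contentUpdate.put("raw:data", rawBytes);
    content.commit(contentUpdate);
  }
}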

I have no experience using compression in HStoreFiles. Running compression buffers may introduce a new uncertainty as regards memory management (just guessing -- I have not looked). Have you tried with compression disabled? Or is it that you cannot disable compression once it is enabled?

The 'urls' table gets metadata only. Concurrent with this are two mapred tasks, one running on the 'urls' table and one on the 'content' table. The mapred tasks run every few minutes for a few minutes at a time, with the interval between executions currently at 5 minutes.

Along with jgray's import problems, these might be something other than OOME issues. I spent some time studying the jgray cluster last Wednesday: the whole cluster went into swap and nothing was working -- GCs couldn't complete because of the swapping, so the OOME was a symptom. Are you seeing any instances of HBASE-616 in your logs, Andrew?

I wonder if there is some issue with writes in general, or at least in my case some interaction between the write side of things and the read side (caching, etc.). One thing I notice every so often is that if I stop the write load on the cluster, then a few moments later a number of compactions and sometimes also splits start running, as if they had been deferred.
There could be an issue here. I can look at log files if you put them in a place I can pull.
For a while I was doing funky things with store files, but I have since reinitialized and am running with defaults for everything but the blockcache (I use blocks of 8192).

Do you need the blockcache? The blockcache uses soft references: it will fill until there is memory pressure and only then will it dump items. It might help if you disable it. A small illustration of that behavior is below.
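To illustrate the behavior described, a minimal soft-reference cache looks something like this. This is a generic Java sketch, not the HBase blockcache code: entries stay reachable until the garbage collector is under heap pressure, at which point the JVM clears the SoftReferences and get() starts returning null.

// Generic sketch of a soft-reference cache; not HBase code.
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

public class SoftCache<K, V> {
  private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();

  public synchronized void put(K key, V value) {
    // The value is only softly reachable; the GC may clear it under pressure.
    map.put(key, new SoftReference<V>(value));
  }

  public synchronized V get(K key) {
    SoftReference<V> ref = map.get(key);
    if (ref == null) {
      return null;          // never cached
    }
    V value = ref.get();
    if (value == null) {
      map.remove(key);      // cleared by the GC; drop the stale entry
      return null;          // caller must re-read the block from disk
    }
    return value;
  }
}

The upshot is that under load such a cache will happily grow until the heap is nearly full, which can make GC pauses and OOMEs more likely on a busy regionserver.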

What version of the JVM are you using?

St.Ack
