I am constantly needing to restart my cluster now, even when running region
servers with 3GB of heap. The production cluster is running Hadoop 0.18.1 and
HBase 0.18.1.
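For reference, the 3GB is set the usual way via HBASE_HEAPSIZE in
conf/hbase-env.sh, which takes a value in megabytes, so ours looks roughly
like:

export HBASE_HEAPSIZE=3000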
I will see mapred tasks fail with (copied by hand, please forgive):
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
at java.io.DataInputStream.readFully(DataInputStream.java:175)
at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
...
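To illustrate why a single fat cell hurts here: per the trace, the read path
copies each record into an in-memory DataOutputBuffer before the scanner ever
sees it, so one of my 20MB values needs a contiguous buffer at least that
large at read time. A minimal standalone sketch of my own (not HBase code),
just to show the allocation pattern:

import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;

import org.apache.hadoop.io.DataOutputBuffer;

public class RecordBufferDemo {
  public static void main(String[] args) throws Exception {
    int recordLength = 20 * 1024 * 1024;  // one 20MB value, my worst case
    DataInput in = new DataInputStream(
        new ByteArrayInputStream(new byte[recordLength]));
    // Same call as in the trace: grows the backing byte[] until the
    // whole record fits, before anything downstream can consume it.
    DataOutputBuffer buf = new DataOutputBuffer();
    buf.write(in, recordLength);
    System.out.println("buffered " + buf.getLength() + " bytes");
  }
}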
This problem is really killing us. When the OOMEs happen, the cluster does not
recover without manual intervention. The regionservers sometimes go down
afterward; other times they stay up but in a sick state for a while. Regions go
offline and remain unavailable, causing indefinite stalls all over the place.
Even so, my workload is modest: continuous writes, maybe up to 100/sec, of
objects typically < 4K in size, though they can be as large as 20MB. Writes go
to both a 'urls' table and a 'content' table. The 'content' table gets the raw
content and uses RECORD compression; the 'urls' table gets metadata only.
Concurrent with this are two mapred jobs, one running over the 'urls' table
and one over the 'content' table. Each runs for a few minutes at a time, with
the interval between executions currently at 5 minutes.
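For concreteness, each write does roughly the following (paraphrased from
memory of the 0.18 client API; the column names are illustrative, not my exact
schema):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class CrawlWriter {
  private final HTable urls;
  private final HTable content;

  public CrawlWriter() throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    urls = new HTable(conf, "urls");
    content = new HTable(conf, "content");
  }

  // called for each fetched object, up to ~100 times/sec
  public void store(String rowKey, byte[] metadata, byte[] rawContent)
      throws Exception {
    BatchUpdate meta = new BatchUpdate(rowKey);
    meta.put("info:meta", metadata);       // metadata only
    urls.commit(meta);

    BatchUpdate raw = new BatchUpdate(rowKey);
    raw.put("raw:data", rawContent);       // typically < 4K, up to ~20MB
    content.commit(raw);
  }
}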
Along with jgray's import problems, I wonder if there is some issue with writes
in general, or at least in my case some interaction between the write side of
things and the read side (caching, etc.). One thing I notice every so often is
that if I stop the write load on the cluster, then a few moments later a number
of compactions, and sometimes also splits, start running as if they had been
deferred.
For a while I was doing funky things with store files, but I have since
reinitialized and am running with defaults for everything except the block
cache (I use blocks of 8192).
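For what it's worth, the block cache bit is in my hbase-site.xml. If memory
serves, the property is hbase.hstore.blockCache.blockSize (please correct me
if I've misremembered the name):

<property>
  <name>hbase.hstore.blockCache.blockSize</name>
  <value>8192</value>
</property>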
Any thoughts as to what I can do to help the situation?
- Andy