RE: Problems with region server OOME

Jonathan Gray Wed, 24 Mar 2010 08:34:46 -0700

As Edward said, try increasing the HBase RegionServer heap to 4 GB. Look around the
wiki for GC tuning information.
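To show why the heap size matters, the Java sketch below splits a region server heap using the two fractions from the hbase-site.xml quoted below: hbase.regionserver.global.memstore.upperLimit = 0.35 and hfile.block.cache.size = 0.15. It is a simplified model that ignores JVM and GC overhead. The class name HeapBudget is made up for illustration and is not part of HBase.

```java
// Simplified model: split a region server heap (-Xmx, in MB) using the
// fractions from the quoted hbase-site.xml. It ignores JVM/GC overhead, so
// "remaining" is only an upper bound on what is left for RPC buffers,
// scanners and everything else. HeapBudget is a hypothetical name.
public class HeapBudget {
    static final double MEMSTORE_UPPER_LIMIT = 0.35; // hbase.regionserver.global.memstore.upperLimit
    static final double BLOCK_CACHE_FRACTION = 0.15; // hfile.block.cache.size

    /** Returns {memstoreCeilingMb, blockCacheMb, remainingMb} for a heap of heapMb. */
    static long[] budget(long heapMb) {
        long memstore = Math.round(heapMb * MEMSTORE_UPPER_LIMIT);
        long blockCache = Math.round(heapMb * BLOCK_CACHE_FRACTION);
        return new long[] {memstore, blockCache, heapMb - memstore - blockCache};
    }

    public static void main(String[] args) {
        // Current 2 GB heap versus the suggested 4 GB heap.
        for (long heapMb : new long[] {2048, 4096}) {
            long[] b = budget(heapMb);
            System.out.printf("heap=%d MB memstore<=%d MB blockCache=%d MB remaining=%d MB%n",
                    heapMb, b[0], b[1], b[2]);
        }
    }
}
```

Under these assumptions, moving from a 2 GB to a 4 GB heap grows the memory outside the memstores and block cache from about 1024 MB to about 2048 MB.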


What does your data look like and what is your read/write pattern?  Do you have 
large rows or columns?
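Row size matters because of where the quoted trace fails: the RPC handler (HBaseServer$Handler.run) is serializing a Result into a ByteArrayOutputStream, and Arrays.copyOf is that buffer growing. The back-of-the-envelope Java sketch below rests on several assumptions. It assumes each scanner response holds hbase.client.scanner.caching (20) rows in one buffer, that the buffer doubles when full while receiving small writes, and that all 10 handlers (hbase.regionserver.handler.count) can be busy at once. The 10 KB initial capacity, the row sizes, and the class name ResponseBufferEstimate are hypothetical.

```java
// Rough estimate of transient heap for one scanner "next" response, assuming:
//  - the handler serializes the whole batch (caching x row size) into one
//    ByteArrayOutputStream before sending it;
//  - the stream doubles its capacity when full (small writes), so the old
//    and new arrays are briefly live together during Arrays.copyOf;
//  - INITIAL_CAPACITY and the row sizes are hypothetical values.
public class ResponseBufferEstimate {
    static final int SCANNER_CACHING = 20;          // hbase.client.scanner.caching
    static final int HANDLER_COUNT = 10;            // hbase.regionserver.handler.count
    static final long INITIAL_CAPACITY = 10 * 1024; // hypothetical starting buffer size

    /** Peak bytes live while a doubling buffer grows to hold payloadBytes. */
    static long peakDuringGrowth(long payloadBytes) {
        long capacity = INITIAL_CAPACITY;
        long peak = capacity;
        while (capacity < payloadBytes) {
            long next = capacity * 2;
            peak = Math.max(peak, capacity + next); // old + new array during the copy
            capacity = next;
        }
        return peak;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        for (long rowBytes : new long[] {10 * 1024, mb, 10 * mb}) {
            long payload = rowBytes * SCANNER_CACHING;
            long perHandler = peakDuringGrowth(payload);
            System.out.printf("row=%,d B payload=%,d B peak/handler=%,d B all handlers=%,d B%n",
                    rowBytes, payload, perHandler, perHandler * HANDLER_COUNT);
        }
    }
}
```

In this model the per-handler peak scales with scanner caching times row size. With hypothetical 10 MB rows, a single response would need about 480 MiB while its buffer grows, a large share of a 2 GB heap.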

> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Wednesday, March 24, 2010 6:57 AM
> To: hbase-user@hadoop.apache.org; pe...@bugsoft.nu
> Subject: Re: Problems with region server OOME
> 
> On Wed, Mar 24, 2010 at 6:51 AM, Peter Falk <pe...@bugsoft.nu> wrote:
> 
> > Hi,
> >
> > We have a cluster of four nodes that run Hadoop 0.20.1 data nodes and
> > HBase 0.20.2 region servers. We occasionally lose region servers with
> > an OOME like the following.
> >
> > 2010-03-24 04:22:03,027 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> > java.lang.OutOfMemoryError: Java heap space
> >        at java.util.Arrays.copyOf(Arrays.java:2786)
> >        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> >        at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >        at org.apache.hadoop.hbase.client.Result.write(Result.java:496)
> >        at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:333)
> >        at org.apache.hadoop.hbase.io.HbaseObjectWritable.write(HbaseObjectWritable.java:213)
> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:937)
> >
> > We would appreciate tips on how to change the configuration so that
> > the probability of an OOME is minimized. I have attached the relevant
> > part of our current hbase-site.xml below. Each node has 8 GB RAM, four
> > 2.66 GHz Intel Xeon processors, and two 1 TB disks. The region servers
> > have a 2 GB heap, and the other servers have a 1 GB heap.
> >
> > Relevant hbase-site.xml config:
> >
> >  <property>
> >    <name>hbase.client.scanner.caching</name>
> >    <value>20</value>
> >  </property>
> >
> >  <property>
> >    <name>dfs.datanode.socket.write.timeout</name>
> >    <value>0</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.regionserver.handler.count</name>
> >    <value>10</value>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hregion.memstore.flush.size</name>
> >    <value>33554432</value>
> >    <description>
> >    Memstore will be flushed to disk if size of the memstore
> >    exceeds this number of bytes.  Value is checked by a thread that runs
> >    every hbase.server.thread.wakefrequency.
> >    </description>
> >  </property>
> >
> >  <property>
> >    <name>hbase.server.thread.wakefrequency</name>
> >    <value>5000</value>
> >    <description>Time to sleep in between searches for work (in milliseconds).
> >    Used as sleep interval by service threads such as META scanner and log roller.
> >    </description>
> >  </property>
> >
> >  <property>
> >    <name>hbase.regionserver.global.memstore.upperLimit</name>
> >    <value>0.35</value>
> >    <description>Maximum size of all memstores in a region server before new
> >      updates are blocked and flushes are forced. Defaults to 40% of heap.
> >    </description>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hstore.blockingStoreFiles</name>
> >    <value>5</value>
> >    <description>
> >    If more than this number of StoreFiles in any one Store
> >    (one StoreFile is written per flush of MemStore) then updates are
> >    blocked for this HRegion until a compaction is completed, or
> >    until hbase.hstore.blockingWaitTime has been exceeded.
> >    </description>
> >  </property>
> >
> >  <property>
> >    <name>hbase.hstore.blockingWaitTime</name>
> >    <value>360000</value>
> >    <description>
> >    The time an HRegion will block updates for after hitting the StoreFile
> >    limit defined by hbase.hstore.blockingStoreFiles.
> >    After this time has elapsed, the HRegion will stop blocking updates even
> >    if a compaction has not been completed.  Default: 90 seconds.
> >    </description>
> >  </property>
> >
> >  <property>
> >      <name>hfile.block.cache.size</name>
> >      <value>0.15</value>
> >      <description>
> >          Percentage of maximum heap (-Xmx setting) to allocate to block cache
> >          used by HFile/StoreFile. Default of 0.2 means allocate 20%.
> >          Set to 0 to disable.
> >      </description>
> >  </property>
> >
> >  <property>
> >    <name>zookeeper.session.timeout</name>
> >    <value>240000</value>
> >    <description>ZooKeeper session timeout. This option is not used by HBase
> >      directly, it is for the internals of ZooKeeper. HBase merely passes it in
> >      whenever a connection is established to ZooKeeper. It is used by ZooKeeper
> >      for heartbeats. In milliseconds.
> >    </description>
> >  </property>
> >
> >
> > TIA,
> > Peter
> >
> 
> If you have 8 GB of RAM, you should give as much of it as you can to
> HBase.
