As Edward said, try increasing HBase RegionServer heap to 4GB. Look around the wiki for GC tuning information.
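
To put rough numbers on the heap suggestion, here is a minimal sketch of how the heap divides up under the fractions in the quoted hbase-site.xml (0.35 for hbase.regionserver.global.memstore.upperLimit, 0.15 for hfile.block.cache.size). The function name and the three-way split are my own simplification, not anything HBase reports; the "other" bucket is simply whatever those two settings do not claim.

```python
# Rough RegionServer heap budget, using the fractions from the quoted config.
MEMSTORE_UPPER_LIMIT = 0.35   # hbase.regionserver.global.memstore.upperLimit
BLOCK_CACHE_FRACTION = 0.15   # hfile.block.cache.size


def heap_budget_mb(heap_mb):
    """Split a heap size (in MB) into memstore, block cache, and the remainder.

    heap_budget_mb is a hypothetical helper for illustration only.
    """
    memstore = heap_mb * MEMSTORE_UPPER_LIMIT
    block_cache = heap_mb * BLOCK_CACHE_FRACTION
    other = heap_mb - memstore - block_cache
    return {"memstore": memstore, "block_cache": block_cache, "other": other}


# Compare the current 2 GB heap with the suggested 4 GB heap.
for heap_mb in (2048, 4096):
    budget = heap_budget_mb(heap_mb)
    print(heap_mb, {name: round(mb, 1) for name, mb in budget.items()})
```

With the current 2 GB heap, about 1 GB is left for everything outside the memstores and block cache, which includes response buffers like the one being grown in the stack trace. At 4 GB that remainder is about 2 GB.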
What does your data look like, and what is your read/write pattern? Do you have large rows or columns?

> -----Original Message-----
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: Wednesday, March 24, 2010 6:57 AM
> To: hbase-user@hadoop.apache.org; pe...@bugsoft.nu
> Subject: Re: Problems with region server OOME
>
> On Wed, Mar 24, 2010 at 6:51 AM, Peter Falk <pe...@bugsoft.nu> wrote:
>
> > Hi,
> >
> > We have a cluster of four nodes that run hadoop 0.20.1 data nodes and
> > hbase 0.20.2 region servers. We occasionally lose region servers with
> > an OOME like the following.
> >
> > 2010-03-24 04:22:03,027 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
> > java.lang.OutOfMemoryError: Java heap space
> >         at java.util.Arrays.copyOf(Arrays.java:2786)
> >         at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
> >         at java.io.DataOutputStream.write(DataOutputStream.java:90)
> >         at org.apache.hadoop.hbase.client.Result.write(Result.java:496)
> >         at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:333)
> >         at org.apache.hadoop.hbase.io.HbaseObjectWritable.write(HbaseObjectWritable.java:213)
> >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:937)
> >
> > We would appreciate tips/information on how to change the configuration
> > so that OOME probability is minimized. I have attached the current
> > hbase-site config below. Each node has 8 GB RAM, four 2.66 GHz Intel
> > Xeon processors, and two 1 TB disks. The region servers have a 2 GB
> > heap, and the other servers have a 1 GB heap.
> >
> > Relevant hbase-site.xml config:
> >
> > <property>
> >   <name>hbase.client.scanner.caching</name>
> >   <value>20</value>
> > </property>
> >
> > <property>
> >   <name>dfs.datanode.socket.write.timeout</name>
> >   <value>0</value>
> > </property>
> >
> > <property>
> >   <name>hbase.regionserver.handler.count</name>
> >   <value>10</value>
> > </property>
> >
> > <property>
> >   <name>hbase.hregion.memstore.flush.size</name>
> >   <value>33554432</value>
> >   <description>
> >   Memstore will be flushed to disk if size of the memstore
> >   exceeds this number of bytes. Value is checked by a thread that runs
> >   every hbase.server.thread.wakefrequency.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>hbase.server.thread.wakefrequency</name>
> >   <value>5000</value>
> >   <description>Time to sleep in between searches for work (in milliseconds).
> >   Used as sleep interval by service threads such as META scanner and log roller.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>hbase.regionserver.global.memstore.upperLimit</name>
> >   <value>0.35</value>
> >   <description>Maximum size of all memstores in a region server before new
> >   updates are blocked and flushes are forced. Defaults to 40% of heap
> >   </description>
> > </property>
> >
> > <property>
> >   <name>hbase.hstore.blockingStoreFiles</name>
> >   <value>5</value>
> >   <description>
> >   If more than this number of StoreFiles in any one Store
> >   (one StoreFile is written per flush of MemStore) then updates are
> >   blocked for this HRegion until a compaction is completed, or
> >   until hbase.hstore.blockingWaitTime has been exceeded.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>hbase.hstore.blockingWaitTime</name>
> >   <value>360000</value>
> >   <description>
> >   The time an HRegion will block updates for after hitting the StoreFile
> >   limit defined by hbase.hstore.blockingStoreFiles.
> >   After this time has elapsed, the HRegion will stop blocking updates even
> >   if a compaction has not been completed. Default: 90 seconds.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>hfile.block.cache.size</name>
> >   <value>0.15</value>
> >   <description>
> >   Percentage of maximum heap (-Xmx setting) to allocate to block cache
> >   used by HFile/StoreFile. Default of 0.2 means allocate 20%.
> >   Set to 0 to disable.
> >   </description>
> > </property>
> >
> > <property>
> >   <name>zookeeper.session.timeout</name>
> >   <value>240000</value>
> >   <description>ZooKeeper session timeout. This option is not used by HBase
> >   directly, it is for the internals of ZooKeeper. HBase merely passes it in
> >   whenever a connection is established to ZooKeeper. It is used by ZooKeeper
> >   for heartbeats. In milliseconds.
> >   </description>
> > </property>
> >
> >
> > TIA,
> > Peter
> >
>
> If you have 8 GB of RAM you should give as much of it as you can to hbase.
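
The reason I ask about row size: the stack trace shows the OOME while an RPC handler is writing a Result into an in-memory byte buffer. As a back-of-the-envelope sketch, the response data in flight can be bounded by handlers x rows per scanner call x row size. The handler count and scanner caching come from the quoted config; the row sizes are made-up values for illustration, and the formula assumes every handler is serving a full scanner batch at the same moment, which is a pessimistic assumption on my part.

```python
# Back-of-the-envelope bound on response data buffered by RPC handlers.
HANDLER_COUNT = 10     # hbase.regionserver.handler.count (from the quoted config)
SCANNER_CACHING = 20   # hbase.client.scanner.caching (rows returned per next() call)


def worst_case_response_mb(avg_row_bytes, handlers=HANDLER_COUNT, caching=SCANNER_CACHING):
    """Estimate MB of row data buffered if all handlers serve full scan batches.

    worst_case_response_mb is a hypothetical helper; it ignores serialization
    overhead and any temporary copies made while a buffer grows.
    """
    return handlers * caching * avg_row_bytes / (1024 * 1024)


# Hypothetical average row sizes: 1 KB, 1 MB, 5 MB.
for row_bytes in (1024, 1024 * 1024, 5 * 1024 * 1024):
    print(f"{row_bytes} bytes/row -> {worst_case_response_mb(row_bytes):.1f} MB")
```

With small rows this is negligible, but with rows in the megabyte range it quickly approaches the free portion of a 2 GB heap. If that matches your data, lowering hbase.client.scanner.caching is worth trying alongside the larger heap.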