Hi Andy,

  Your kind suggestions won't work for my case. I have changed
"io.map.index.skip" to 128 / 512 / 4096, but I still got OOMEs; this time
the OOME shows up in various forms and has caused 4 region servers to
crash.

   I use 6.5 GB of RAM (of 8 GB total) in the configuration.

   A rough calculation of the memory usage of 4450 HStoreFiles (53 GB on
DFS):

Input:
   RowKeyLength = 50 bytes
   RowLength = 400 bytes
   HStoreFileSize = 53 GB / 3 (replication) ~ 20 GB
   Each row has 22 columns
   OneHStoreIndexEntrySize ~ 100 bytes

Calculation:
   TotalRows = 20 GB / 400 B = 50,000,000
   TotalCells = 22 * 50,000,000 = 1.1 * 10^9
   TotalIndexSize = 100 B * 1.1 * 10^9 = 110 GB

   If we set skip to 1024, we'd still use 110 MB of RAM or more. Now the
OOME occurs before the compaction thread itself runs out of memory; it hits
other periodic threads first, such as the regionServerReport thread. It
seems that I'd have to rebuild the data.
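For reference, here is a small script that reproduces the estimate above for several skip values (the sizes are the ones quoted in this thread; the ~100-byte per-entry heap cost is my rough assumption, not a measured figure):

```python
# Back-of-envelope storefile-index memory estimate for the figures in this thread.
row_len = 400                  # bytes per row
store_data = 20 * 1024**3      # ~20 GB of HStoreFile data (53 GB / 3 replicas on DFS)
cols_per_row = 22
index_entry = 100              # assumed heap bytes per in-memory index entry

rows = store_data // row_len               # ~50 million rows
cells = rows * cols_per_row                # ~1.1 billion cells (one index key per cell)
full_index = cells * index_entry           # ~110 GB if every key were indexed

for skip in (32, 128, 512, 1024, 4096):
    mb = full_index / skip / 1024**2
    print(f"io.map.index.skip={skip:5d} -> ~{mb:,.0f} MB of index in heap")
```

With skip=1024 this comes out near the 110 MB figure above, which is why even large skip values leave little headroom on a 6.5 GB heap once the other per-region overheads are added.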

   Thanks!

  - Ling
=================

  I attach some OOME output from "output/hbase-xxx.out":
server:13-2
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid1391.hprof ...
Heap dump file created [4373449572 bytes in 240.980 secs]
Exception in thread "regionserver/0:0:0:0:0:0:0:0:62020"
        at org.apache.hadoop.ipc.Client.call(Client.java:686)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.renewLease(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.renewLease(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:958)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:970)
        at java.lang.Thread.run(Thread.java:619)
java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
server:13-4
2009-05-13 19:06:41,329 INFO org.apache.hadoop.hbase.regionserver.HRegion: region CDR,13914000048#2009-03-23 12:14:24,1241664073405/2009049445 available
2009-05-13 19:06:41,330 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region CDR,13914000048#2009-03-23 12:14:24,1241664073405
2009-05-13 19:46:41,790 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: OutOfMemoryError, aborting.
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2786)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at org.apache.hadoop.hbase.HServerLoad$RegionLoad.write(HServerLoad.java:176)
        at org.apache.hadoop.hbase.HServerLoad.write(HServerLoad.java:408)
        at org.apache.hadoop.hbase.HServerInfo.write(HServerInfo.java:150)
        at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:303)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.write(HBaseRPC.java:156)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:472)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:691)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
        at $Proxy0.regionServerReport(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:988)
        at java.lang.Thread.run(Thread.java:619)
2009-05-13 19:46:41,843 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening CDR,13912998427#2009-03-15 10:08:14,1241664073405
java.lang.NullPointerException
        at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:460)
        at org.apache.hadoop.ipc.Client.call(Client.java:687)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.getFileInfo(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:409)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
        at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1431)
        at org.apache.hadoop.hbase.io.SequenceFile$Reader.<init>(SequenceFile.java:1426)
        at org.apache.hadoop.hbase.io.MapFile$Reader.createDataFileReader(MapFile.java:327)
        at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.createDataFileReader(HBaseMapFile.java:95)
        at org.apache.hadoop.hbase.io.MapFile$Reader.open(MapFile.java:309)
        at org.apache.hadoop.hbase.io.HBaseMapFile$HBaseReader.<init>(HBaseMapFile.java:78)
        at org.apache.hadoop.hbase.io.BloomFilterMapFile$Reader.<init>(BloomFilterMapFile.java:68)
        at org.apache.hadoop.hbase.io.HalfMapFileReader.<init>(HalfMapFileReader.java:91)
        at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:708)
        at org.apache.hadoop.hbase.regionserver.HStore.setupReaders(HStore.java:259)
        at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:240)
        at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1791)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:278)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:2039)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2010)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1925)
        at java.lang.Thread.run(Thread.java:619)
2009-05-13 19:46:41,848 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: request=0.0, regions=314, stores=314, storefiles=4448, storefileIndexSize=251, memcacheSize=0, usedHeap=4334, maxHeap=5777
2009-05-13 19:46:41,855 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 62020
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid19735.hprof ...
Heap dump file created [4370414771 bytes in 239.451 secs]
Exception in thread "LeaseChecker" java.lang.OutOfMemoryError: Java heap space
        at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:59)
        at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:42)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:314)
        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:177)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:801)
        at org.apache.hadoop.ipc.Client.call(Client.java:686)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy1.renewLease(Unknown Source)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at $Proxy1.renewLease(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.renew(DFSClient.java:958)
        at org.apache.hadoop.hdfs.DFSClient$LeaseChecker.run(DFSClient.java:970)
        at java.lang.Thread.run(Thread.java:619)




On Wed, May 13, 2009 at 12:36 PM, Ling Qian <[email protected]> wrote:

> Thanks Andy!
>
> I have applied the tricks; it's compacting. I will post the result when
> it's done.
>
>   On Wed, May 13, 2009 at 2:51 AM, Andrew Purtell <[email protected]> wrote:
>
>>
>> Hi Ling,
>>
>> We have identified this as an issue and are working on a
>> solution for 0.20.
>> See https://issues.apache.org/jira/browse/HBASE-1410
>>
>> In the meantime, there may be a way to recover your current
>> situation, for 0.19. The trick may be the following dual
>> strategy:
>>  1) Increase the available heap for the region server
>>     temporarily to the maximum available for the amount of
>>     RAM in the server;
>>  2) Temporarily skip a substantial number of index entries
>>     to lessen the heap load required to hold the storefile
>>     indexes in memory, by adding something like the
>>     following in conf/hbase-site.xml:
>>
>>       <property>
>>         <name>io.map.index.skip</name>
>>         <value>32</value>
>>       </property>
>>
>>     There is a trade off here between reducing the number
>>     of keys read into the in memory index and yet leaving
>>     enough keys in place so compaction/split can find a
>>     midkey. If 32 does not work, you can consider trying
>>     64, 128, 256, 512, 1024.
>>
>> Please consider trying this approach. If it does not work
>> to clear the issue, then come back and we can think about a
>> next step.
>>
>> > From: Ling Qian
>> > Subject: Re: OOME when restarting hbase
>> > To: [email protected], [email protected]
>> > Date: Tuesday, May 12, 2009, 2:12 AM
>> >
>> > Is there any method to save the 4450 hstorefiles? ( about
>> > 53GB on DFS)
>>
>>  - Andy
>>
>>
>>
>>
>>
>>
>


-- 

- Ling
