[ https://issues.apache.org/jira/browse/HBASE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652890#action_12652890 ]

Andrew Purtell commented on HBASE-1040:
---------------------------------------

An OOME last night did not take down the region server, and it did not relinquish its regions:

2008-12-03 10:03:51,625 INFO org.apache.hadoop.ipc.Server: IPC Server handler 21 on 60020, call next(325852455557500270, 30) from 10.30.94.53:51099: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hbase.io.ImmutableBytesWritable.readFields(ImmutableBytesWritable.java:110)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1754)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1882)
        at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
        at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
        at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:03:53,173 INFO org.apache.hadoop.ipc.Server: IPC Server handler 16 on 60020, call next(3850133095248684283, 30) from 10.30.94.53:51111: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hbase.io.ImmutableBytesWritable.readFields(ImmutableBytesWritable.java:110)
        at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1754)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1882)
        at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
        at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
        at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:03:55,024 INFO org.apache.hadoop.ipc.Server: IPC Server handler 13 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at sun.nio.ch.Util.releaseTemporaryDirectBuffer(Util.java:67)
        at sun.nio.ch.IOUtil.read(IOUtil.java:212)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.DataInputStream.read(DataInputStream.java:132)
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
        at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:1006)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:238)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:190)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
        at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:859)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1394)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1430)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
        at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
        at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
        at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
        at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source)
2008-12-03 10:03:57,082 INFO org.apache.hadoop.fs.FSInputChecker: Found checksum error: org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_1503942726006789756:of:/data/hbase/content/38150535/content/mapfiles/1992009933541116621/data at 3610624
        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:242)
        at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:177)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:194)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
        at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:859)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1394)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1430)
        at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1379)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:1898)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1928)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
        at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
        at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

2008-12-03 10:03:57,083 WARN org.apache.hadoop.dfs.DFSClient: Found Checksum error for blk_1503942726006789756_2211303 from 10.30.94.32:50010 at 3610624
2008-12-03 10:03:59,711 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:02,486 INFO org.apache.hadoop.ipc.Server: IPC Server handler 19 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:06,829 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:12,044 INFO org.apache.hadoop.ipc.Server: IPC Server handler 18 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:13,607 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 2803606790257188079 lease expired
2008-12-03 10:04:16,873 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:25,970 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:34,702 INFO org.apache.hadoop.ipc.Server: IPC Server handler 21 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:59972: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:04:53,653 INFO org.apache.hadoop.ipc.Server: IPC Server handler 25 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:60076: error: java.io.IOException: java.lang.OutOfMemoryError: Java heap space
java.io.IOException: java.lang.OutOfMemoryError: Java heap space
2008-12-03 10:05:05,487 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 60020, call next(3371752675632192545, 30) from 10.30.94.34:37462: error: java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:05:25,685 INFO org.apache.hadoop.ipc.Server: IPC Server handler 14 on 60020, call next(-8816589261293751564, 30) from 10.30.94.35:60092: error: java.io.IOException: read 218 bytes, should read 1666930
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1842)
        at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
        at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:183)
        at org.apache.hadoop.hbase.regionserver.HStoreScanner.next(HStoreScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.next(HRegion.java:1920)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1356)
        at sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:634)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
2008-12-03 10:05:48,313 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -6882942580400942785 lease expired
2008-12-03 10:06:08,014 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 325852455557500270 lease expired
2008-12-03 10:09:37,822 INFO org.apache.hadoop.hbase.regionserver.HLog: Closed hdfs://sjdc-atr-dc-1.atr.trendmicro.com:50000/data/hbase/log_10.30.94.32_1228300601380_60020/hlog.dat.1228313220043, entries=100001. New log writer: /data/hbase/log_10.30.94.32_1228300601380_60020/hlog.dat.1228316977821
2008-12-03 10:20:11,460 INFO org.apache.hadoop.hbase.regionserver.MemcacheFlusher: Forced flushing of content,d6551908ed66a9122f8ec39594c8d36e,1228218699826 because global memcache limit of 536870912 exceeded; currenly 536909348 and flushing till 268435456
2008-12-03 10:20:16,973 WARN org.apache.hadoop.dfs.DFSClient: DataStreamer Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded

2008-12-03 10:20:16,974 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_-4000942059672147735_2225305 bad datanode[0] 10.30.94.32:50010
2008-12-03 10:20:16,974 WARN org.apache.hadoop.dfs.DFSClient: Error Recovery for block blk_-4000942059672147735_2225305 in pipeline 10.30.94.32:50010, 10.30.94.50:50010, 10.30.94.54:50010: bad datanode 10.30.94.32:50010
2008-12-03 10:20:39,106 INFO org.apache.hadoop.ipc.Server: IPC Server handler 12 on 60020, call batchUpdates([EMAIL PROTECTED], [Lorg.apache.hadoop.hbase.io.BatchUpdate;@7f81f38f) from 10.30.94.4:52172: error: java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:20:48,269 WARN org.apache.hadoop.ipc.Server: Out of Memory in server select
java.lang.OutOfMemoryError: GC overhead limit exceeded
2008-12-03 10:22:39,111 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -3591169216918124621 lease expired
2008-12-03 10:27:06,876 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner 7353895356043684043 lease expired
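
For illustration, the kind of fail-fast guard the issue summary asks for could look roughly like the following. This is a minimal sketch with hypothetical names (OomeGuard, abortServer), not the actual HRegionServer code: the idea is that an IPC handler should treat OutOfMemoryError as fatal and kick off a shutdown that relinquishes regions, instead of wrapping the error in an IOException and carrying on.

```java
// Hypothetical sketch: treat OOME in an IPC call as fatal and begin
// shutdown, rather than returning the error to the client and continuing.
public class OomeGuard {
    // e.g. a callback that stops the region server and closes its regions
    // so the master can reassign them (hypothetical wiring)
    private final Runnable abortServer;

    public OomeGuard(Runnable abortServer) {
        this.abortServer = abortServer;
    }

    /** Run one IPC call body; on OOME, trigger shutdown and rethrow. */
    public <T> T guard(java.util.concurrent.Callable<T> call) throws Exception {
        try {
            return call.call();
        } catch (OutOfMemoryError e) {
            // Heap state is unreliable here: do only the minimum needed to
            // relinquish regions, and never swallow the error.
            try {
                abortServer.run();
            } catch (Throwable ignored) {
                // best effort while out of memory
            }
            throw e;
        }
    }
}
```

A normal call passes through unchanged; only an OutOfMemoryError triggers the abort callback before the error propagates, which is the behavior the 0.18 handlers shown above lack.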


> OOME does not cause graceful shutdown under some failure scenarios
> ------------------------------------------------------------------
>
>                 Key: HBASE-1040
>                 URL: https://issues.apache.org/jira/browse/HBASE-1040
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.18.1
>            Reporter: Andrew Purtell
>
> The OOME-related updates to trunk should probably be backported to the 0.18 
> branch. I am seeing these exceptions on our cluster in output from 
> tablemap/tablereduce jobs:
> > java.io.IOException: java.lang.OutOfMemoryError: Java heap space
> > at java.io.DataInputStream.readFully(DataInputStream.java:175)
> > at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:64)
> > at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:102)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1933)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1833)
> > at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
> > at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
> > at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getNext(StoreFileScanner.java:312)
> When OOMEs like the above happen, the cluster does not recover without manual 
> intervention. The regionservers sometimes go down afterward, and sometimes 
> stay up in a sick condition for a while. Regions go offline and remain 
> unavailable.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
