stack wrote:
Thanks for writing this note. I'd seen the IOE and presumed it was a known
issue in hadoop 0.17. I then went looking for it but couldn't find
one. Turns out it's a silly issue in hbase (HBASE-761).
How many regions do you have loaded up in your regionserver?
Are you using the default heap size? You might try upping it a little.
Also make sure you have upped the open file descriptors limit too (see #4
and #6 in the FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ).
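In hbase-env.sh and your shell those two changes look roughly like this (the
values below are placeholders rather than recommendations; the FAQ entries
above have the details):

# conf/hbase-env.sh -- uncomment and raise the heap given to the hbase daemons
export HBASE_HEAPSIZE=2000        # in MB; the shipped default is 1000

# in the shell that starts the daemons (or /etc/security/limits.conf),
# raise the per-process open file descriptor limit
ulimit -n 32768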
thanks St.Ack,
we've rerun our tests, uncommented HEAP_SIZE=1000 in our hbase-env, and have
updated trunk to the latest. we haven't run into an OOME so far, and there's no
sign of all those compaction errors anymore (thanks!).
we have 5 regions loaded up in our regionserver.
our test hammers hbase as we import a bunch of files into our hbase tables. now,
what we see is that hbase seems to block such that nothing gets imported.
everything is up and running (hdfs namenode/datanodes, hbase master/regionserver).
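(one rough way we can double-check where things are stuck is a thread dump of
the regionserver -- a sketch, assuming the sun jdk tools are on the path and
the process shows up in jps as HRegionServer:

jps | grep HRegionServer     # find the regionserver pid
jstack <pid>                 # blocked 'IPC Server handler' threads should show
                             # where updates are being held up
)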
below are the logs we see from our hbase regionserver and from the hdfs
datanode:
----------------------------------
hbase regionserver
---------------------------------
2008-07-22 16:37:10,419 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 396381, skipped 1, 315153
2008-07-22 16:37:10,672 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 258344, skipped 1, 204628
2008-07-22 16:37:10,856 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 342049, skipped 1, 271105
2008-07-22 16:37:11,080 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 237573, skipped 1, 188148
2008-07-22 16:37:11,280 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 78030028, skipped 1, 39326312
2008-07-22 16:37:33,464 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 366849, skipped 1, 291596
2008-07-22 16:37:33,928 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 984138, skipped 2, 897309
2008-07-22 16:37:34,201 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 67294, skipped 1, 51834
2008-07-22 16:37:34,643 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 45569, skipped 2, 41020
2008-07-22 16:37:35,061 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region dmls,,1216768730386 in 25sec
2008-07-22 16:38:00,151 INFO org.apache.hadoop.hbase.regionserver.HRegion:
starting compaction on region dmls,,1216768730386
2008-07-22 16:38:00,157 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 413508, skipped 2, 396083
2008-07-22 16:38:00,502 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 269288, skipped 2, 258024
2008-07-22 16:38:00,789 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 356584, skipped 2, 341743
2008-07-22 16:38:01,032 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 247637, skipped 2, 237273
2008-07-22 16:38:01,337 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 83129653, skipped 0, 0
2008-07-22 16:38:42,495 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 382791, skipped 2, 366533
2008-07-22 16:38:42,724 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 1024454, skipped 2, 897309
2008-07-22 16:38:42,959 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 70464, skipped 2, 66950
2008-07-22 16:38:43,377 INFO org.apache.hadoop.hbase.regionserver.HStore:
Compaction size 47891, skipped 2, 41020
2008-07-22 16:38:43,716 INFO org.apache.hadoop.hbase.regionserver.HRegion:
compaction completed on region dmls,,1216768730386 in 43sec
2008-07-22 16:38:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 4 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:40:03,931 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 9 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:41:13,943 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 1 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:42:23,955 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 0 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:43:33,968 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 2 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:44:34,095 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 5 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:45:44,109 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 3 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:46:54,122 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 8 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:48:04,135 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 6 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
2008-07-22 16:48:18,116 INFO org.apache.hadoop.hbase.regionserver.LogRoller:
Rolling hlog. Number of entries: 26294
2008-07-22 16:48:18,133 INFO org.apache.hadoop.hbase.regionserver.HLog: New log
writer created at /hbase/log_64.62.244.2_1216768695534_60020/hlog.dat.1216770498127
2008-07-22 16:49:14,165 INFO org.apache.hadoop.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 7 on 60020' on region
dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size
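(our read of the 'Blocking updates' lines above is that the memcache has hit
its blocking threshold before flushes/compactions could catch up. a rough way
to find the setting that 64.0m threshold derives from -- the path and key
spelling here are guesses for this version, so check your own checkout:

# look up the flush-size property the blocking threshold is based on
grep -B 2 -A 4 'flush.size' conf/hbase-default.xml
# to raise it, copy the <property> block into conf/hbase-site.xml with a larger
# value, and give the regionserver more heap to match
)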
---------------------------------------
hdfs datanode
--------------------------------------
2008-07-22 17:10:41,828 WARN org.apache.hadoop.dfs.DataNode:
64.62.244.2:50010:Got exception while serving blk_-355911506373371046 to
/64.62.244.2:
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
at java.lang.Thread.run(Thread.java:619)
2008-07-22 17:10:41,828 ERROR org.apache.hadoop.dfs.DataNode:
64.62.244.2:50010:DataXceiver: java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
at sun.nio.ch.IOUtil.write(IOUtil.java:75)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53)
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039)
at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
at java.lang.Thread.run(Thread.java:619)
2008-07-22 17:10:41,834 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010
Served block blk_8154627018748859939 to /64.62.244.2
2008-07-22 17:10:41,843 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010
Served block blk_6062812905615890844 to /64.62.244.2
2008-07-22 17:10:41,987 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010
Served block blk_-3883168343557890264 to /64.62.244.2
St.Ack
leith wrote:
looking at our region logs, we've noticed that the compaction thread
constantly runs into exceptions. the entire log is filled with
something like this:
----------------------------------
2008-07-22 12:29:52,759 WARN
org.apache.hadoop.hbase.regionserver.HStore: Exception closing reader
for 242866774/new
java.io.IOException: Stream closed
at org.apache.hadoop.dfs.DFSClient$DFSInputStream.close(DFSClient.java:1319)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1581)
at org.apache.hadoop.io.MapFile$Reader.close(MapFile.java:577)
at org.apache.hadoop.hbase.regionserver.HStore.closeCompactionReaders(HStore.java:917)
at org.apache.hadoop.hbase.regionserver.HStore.compactHStoreFiles(HStore.java:910)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:787)
at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:887)
at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:847)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84)
-------------------------------------
the regionserver is taking up a good amount of memory on our
system, and nothing is happening except for what I assume are compaction/split
processes.
these only seem to be warnings, but there are so many of them that it would
be nice to get a second opinion on this.
we've also gotten an 'out of memory' exception a few times from the
compaction thread, and those actually ended up killing the thread,
resulting in the region server shutting itself down.
thanks,
/leith