After a flurry of compactions and splits following the manual cluster restart, everything is back up. Due to the nature of the data in the table, I can't tell whether there was data loss.
At the time of the problem reported below there was a swarm of OOMEs. Three regionservers went down within minutes of each other. The load at the time was four reducers writing serialized Document objects back, and I suspect those writes were all hitting the same few regions (<= 4). They were in addition to the crawler write load of ~100-200 objects/second. A sketch of the write pattern follows the quoted traces below.

   - Andy

> From: Andrew Purtell <[EMAIL PROTECTED]>
> Subject: Re: OOME hell
> To: [email protected]
> Date: Tuesday, December 2, 2008, 7:11 PM
>
> OOME during compaction:
>
> 2008-12-02 21:57:05,930 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Set stop flag in regionserver/0:0:0:0:0:0:0:0:60020.compactor
> java.lang.OutOfMemoryError: Java heap space
>     at org.apache.hadoop.hbase.io.ImmutableBytesWritable.readFields(ImmutableBytesWritable.java:110)
>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1754)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1882)
>     at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
>     at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1003)
>     at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:893)
>
> This results in a dead region because, upon reassignment, any scanner open gets this:
>
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://sjdc-atr-dc-1.atr.trendmicro.com:50000/data/hbase/content/1828513599/info/mapfiles/3497759206614039025/data
>     at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
>     at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:695)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
>     at org.apache.hadoop.io.MapFile$Reader.createDataFileReader(MapFile.java:301)
>     at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.createDataFileReader(HStoreFile.java:650)
>     at org.apache.hadoop.io.MapFile$Reader.open(MapFile.java:283)
>     at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:632)
>
> Manual intervention is required here. Will try to restart and see what happens.
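For reference, here is a minimal sketch of the reducer write pattern described above, assuming the client API of this HBase generation (one BatchUpdate per row committed through HTable) and the old mapred Reducer interface. The table name "content" and column family "info" are taken from the file path in the quoted trace; the class name, key/value types, and the "info:doc" qualifier are hypothetical.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/** Hypothetical reducer that writes serialized documents back into HBase. */
public class DocumentWriteReducer extends MapReduceBase
    implements Reducer<ImmutableBytesWritable, BytesWritable,
                       ImmutableBytesWritable, BytesWritable> {

  private HTable table;

  public void configure(JobConf job) {
    try {
      // Open a client handle on the 'content' table (name from the trace).
      table = new HTable(new HBaseConfiguration(job), "content");
    } catch (IOException e) {
      throw new RuntimeException("cannot open table", e);
    }
  }

  public void reduce(ImmutableBytesWritable key, Iterator<BytesWritable> values,
      OutputCollector<ImmutableBytesWritable, BytesWritable> output,
      Reporter reporter) throws IOException {
    while (values.hasNext()) {
      BytesWritable value = values.next();
      // BytesWritable's backing array may be padded; copy just the payload.
      byte[] doc = new byte[value.getSize()];
      System.arraycopy(value.get(), 0, doc, 0, doc.length);
      // One single-row batch per document: family 'info' is from the
      // trace, qualifier 'doc' is made up for this sketch.
      BatchUpdate update = new BatchUpdate(key.get());
      update.put("info:doc", doc);
      table.commit(update);
    }
  }
}

Each commit is an independent single-row write, so if the reducer keys clustered into a narrow key range, all four reducers would have been hammering the same handful of regions, on top of the crawler's ~100-200 objects/second.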
