After a flurry of compactions and splits following a manual cluster
restart, everything is back up. Due to the nature of the data in
the table, I can't tell whether there was any data loss.

At the time of the problem reported below there was a swarm of
OOMEs. Three regionservers went down within minutes of each other.

The load at the time was four reducers writing serialized
Document objects back. I suspect the writes were all hitting the
same few regions (<= 4). These writes were in addition to the
crawler's write load of ~100-200 objects/second.
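As a hedged aside, not something discussed in the thread itself: under sustained write pressure like this, one first-line mitigation in HBase deployments of this era was to raise the daemon heap in conf/hbase-env.sh. The 4000 MB figure below is purely illustrative, not a recommendation:

```shell
# conf/hbase-env.sh -- illustrative value only; size the heap to the host's RAM
# HBASE_HEAPSIZE sets the maximum heap (in MB) passed to the JVM for HBase
# daemons, including the regionservers that were OOMEing here.
export HBASE_HEAPSIZE=4000
```

Whether extra heap actually helps depends on why compaction is buffering so much data; it buys headroom, but does not fix a few hot regions absorbing all the writes.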

   - Andy

> From: Andrew Purtell <[EMAIL PROTECTED]>
> Subject: Re: OOME hell
> To: [email protected]
> Date: Tuesday, December 2, 2008, 7:11 PM
> OOME during compaction:
> 
> 2008-12-02 21:57:05,930 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: Set stop flag in regionserver/0:0:0:0:0:0:0:0:60020.compactor
> java.lang.OutOfMemoryError: Java heap space
>         at org.apache.hadoop.hbase.io.ImmutableBytesWritable.readFields(ImmutableBytesWritable.java:110)
>         at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1754)
>         at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1882)
>         at org.apache.hadoop.io.MapFile$Reader.next(MapFile.java:516)
>         at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1003)
>         at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:893)
> 
> 
> results in a dead region because, upon reassignment, any
> scanner opened against it gets this:
> 
> Caused by: java.io.FileNotFoundException: File does not exist:
> hdfs://sjdc-atr-dc-1.atr.trendmicro.com:50000/data/hbase/content/1828513599/info/mapfiles/3497759206614039025/data
>         at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
>         at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:695)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1420)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1415)
>         at org.apache.hadoop.io.MapFile$Reader.createDataFileReader(MapFile.java:301)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.createDataFileReader(HStoreFile.java:650)
>         at org.apache.hadoop.io.MapFile$Reader.open(MapFile.java:283)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:632)
> 
> Manual intervention required here. Will try to restart and
> see what happens.


      
