Hi guys
We've had HBase (0.18.0, r695089) and Hadoop (0.18.0, r686010) running
for a while. Apart from the occasional regionserver stopping
without notice (and without any explanation that we can see in the
logs), a problem that we solve easily just by restarting it, we have now
run into a more serious problem that I think is data loss.
We use HBase as a links and documents database (similar to Nutch) on
a 3-node cluster (4 GB of memory on each node). The links database has 4
regions and the document database now has 200 regions, for a total of
216 (with meta and root).
After the crawl task, which went OK (we now have 60 GB of 300 GB used in
HDFS), we proceeded to do a full table scan to create the indexes, and
that's where things started to fail.
We are seeing a problem in the logs (at the end of this email). It
repeats until there is a RetriesExhaustedException and the task fails
in the map phase. The Hadoop fsck tool tells us that HDFS is OK. I have
yet to explore the rest of the logs for other errors; I
will post a new mail if I find anything.
Any help would be greatly appreciated.
Regards
David Alves
2008-11-19 19:47:52,664 DEBUG org.apache.hadoop.dfs.DFSClient: DataStreamer block blk_-4521866854383825816_55401 wrote packet seqno:0 size:38 offsetInBlock:0 lastPacketInBlock:true
2008-11-19 19:47:52,676 DEBUG org.apache.hadoop.dfs.DFSClient: DFSClient received ack for seqno 0
2008-11-19 19:47:52,676 DEBUG org.apache.hadoop.dfs.DFSClient: Closing old block blk_-4521866854383825816_55401
2008-11-19 19:47:52,769 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Added /hbase/links/1617869663/docDatum/mapfiles/7718188406431341070 with 20622 entries, sequence id 5289673, data size 5.6m, file size 6.0m
2008-11-19 19:47:52,770 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region links,ext://myrepo/mypath/MYDOC.pdf,1227122254743 in 3015ms, sequence id=5289673, compaction requested=false
2008-11-19 19:53:17,524 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Error opening scanner (fsOk: true)
java.io.IOException: HStoreScanner failed construction
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:70)
    at org.apache.hadoop.hbase.regionserver.HStoreScanner.<init>(HStoreScanner.java:68)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1916)
    at org.apache.hadoop.hbase.regionserver.HRegion$HScanner.<init>(HRegion.java:1954)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1345)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1170)
    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://cyclops-prod-1:9000/hbase/document/153945136/docDatum/mapfiles/5163556575658593611/data
    at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:394)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:695)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1414)
    at org.apache.hadoop.io.MapFile$Reader.createDataFileReader(MapFile.java:301)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.createDataFileReader(HStoreFile.java:650)
    at org.apache.hadoop.io.MapFile$Reader.open(MapFile.java:283)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:632)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:714)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$HalfMapFileReader.<init>(HStoreFile.java:908)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.getReader(HStoreFile.java:408)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.openReaders(StoreFileScanner.java:96)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.<init>(StoreFileScanner.java:67)
    ... 10 more
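
To find out how many distinct store files are reported missing, something like the script below could be run over the regionserver log. It is only a rough sketch: the function name and the sample text are made up for illustration, and it assumes each "File does not exist" message sits on a single line of the real log file, as in the FileNotFoundException in the trace above.

```python
import re

# Matches the message format seen in the trace above:
# "java.io.FileNotFoundException: File does not exist: <path>"
MISSING_FILE = re.compile(r"FileNotFoundException: File does not exist: (\S+)")


def missing_store_files(log_text):
    """Return the distinct missing paths, in order of first appearance."""
    seen = []
    for match in MISSING_FILE.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


# Illustrative input only; in practice, read the regionserver log file instead.
sample_log = (
    "Caused by: java.io.FileNotFoundException: File does not exist: "
    "hdfs://cyclops-prod-1:9000/hbase/document/153945136/docDatum/"
    "mapfiles/5163556575658593611/data\n"
)

for path in missing_store_files(sample_log):
    print(path)
```

If the same path shows up on every retry, that would point at a single missing mapfile rather than a wider problem.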