Try the 0.1.3 release candidate 5:
http://people.apache.org/~stack/hbase-0.1.3-candidate-5/
It has fixes for corrupted log files left over after a
regionserver crash (HBASE-646, HBASE-648).
The 'corruption' most likely happened because the regionserver went down
without closing its open log files in HDFS, so a few zero-length log
files were left behind. The edits those Write-Ahead Logs were carrying
were lost. Before this release candidate, HBase did not handle these
empty files well when it came across them.
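As a rough illustration of the failure mode, the Java sketch below creates a zero-length file like the ones a crashed regionserver leaves behind. It shows that reading a fixed-size header from that file with DataInputStream.readFully throws java.io.EOFException, the same exception as in the trace quoted below. It then filters such files out before replay instead of failing on them. The sketch works on the local filesystem with java.nio.file rather than the HDFS or HBase APIs. The class EmptyLogFilter, its methods, and the 4-byte HEADER_BYTES are hypothetical, and this is not the actual HBASE-646/648 patch.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EmptyLogFilter {

    // Hypothetical fixed header size; stands in for the magic bytes a log reader checks first.
    static final int HEADER_BYTES = 4;

    /** Reads the header; throws EOFException if the file is shorter, e.g. a zero-length log. */
    static byte[] readHeader(Path file) throws IOException {
        byte[] header = new byte[HEADER_BYTES];
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            in.readFully(header); // EOFException if fewer than HEADER_BYTES bytes exist
        }
        return header;
    }

    /** Returns the regular files in dir worth replaying, skipping zero-length ones instead of failing. */
    static List<Path> replayableLogs(Path dir) throws IOException {
        List<Path> keep = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) {
                    continue;
                }
                if (Files.size(p) == 0) {
                    // Nothing was flushed before the crash; those edits are already lost.
                    System.err.println("Skipping empty log " + p.getFileName());
                    continue;
                }
                keep.add(p);
            }
        }
        keep.sort(null); // natural Path ordering, so replay order is deterministic
        return keep;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hlogs");
        Files.write(dir.resolve("log.1"), new byte[] {1, 2, 3, 4, 5});
        Files.createFile(dir.resolve("log.2")); // zero length, as left by a crashed regionserver
        try {
            readHeader(dir.resolve("log.2"));
        } catch (EOFException e) {
            System.out.println("empty file -> EOFException");
        }
        System.out.println(replayableLogs(dir).size() + " replayable log(s)");
    }
}
```

Skipping the empty file lets recovery continue, but it does not bring back the edits it should have held. That is why appends in HDFS are still needed.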
Until we have appends in HDFS (HADOOP-1700; a subset will be available
in hadoop-0.18 and may be sufficient for our needs), data loss
remains a fact of HBase life.
Yours,
St.Ack
Preston Price wrote:
One of the servers that acts as a hadoop and hbase node in our cluster
went down. After the machine was brought back up I restarted hbase but
could not interact with it. After checking the logs on all 3 of our
machines I found a ton of stack traces like the following:
2008-06-26 23:07:56,683 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region -ROOT-,,0
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1411)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1400)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1395)
	at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:254)
	at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
	at org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:554)
	at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:609)
	at org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382)
	at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849)
	at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431)
	at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1258)
	at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1204)
	at java.lang.Thread.run(Thread.java:595)
The machine logging all these errors is not the machine that went down
and I'm not sure what the recovery procedure is for this error.
I appreciate any assistance.
Thanks in advance
Preston Price