After experiencing a region server that would not exit (HBASE-617), I tried to bring HBase back up (after first shutting down and restarting DFS).
There are around 370 regions. The first 250 were assigned to region servers within 5 minutes of startup. The rest took the better part of the day to be assigned. A quick inspection of the regionserver logs showed messages like the following:

2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : categories,2864153,1211005494348 from 10.254.26.31:60020

After waiting for all the regions to be assigned (and for the above message to stop appearing in the log), I started a MapReduce job that iterates over all regions. Immediately, the above-mentioned region began to show up in the logs again with the same message, and the job failed with an IOException because it couldn't locate blocks.

I ran fsck on /hbase and, sure enough, blocks are missing from the following file (although it reports the missing size as 0; I presume it just doesn't know):

/hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000

What's the recovery procedure here? Is there one?
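For anyone wanting to check which regions keep reappearing in these messages, here is a minimal sketch that tallies MSG_REPORT_PROCESS_OPEN reports per region and server. It assumes the log line format matches the single sample quoted above; the function name `count_process_open` and the regular expression are my own illustration, not part of HBase.

```python
import re
from collections import Counter

# Assumed format, inferred from the one sample message in this report:
#   ... Received MSG_REPORT_PROCESS_OPEN : <region name> from <host:port>
PROCESS_OPEN = re.compile(
    r"Received MSG_REPORT_PROCESS_OPEN : (?P<region>\S+) from (?P<server>\S+)"
)


def count_process_open(lines):
    """Count MSG_REPORT_PROCESS_OPEN reports per (region, server) pair."""
    counts = Counter()
    for line in lines:
        match = PROCESS_OPEN.search(line)
        if match:
            counts[(match.group("region"), match.group("server"))] += 1
    return counts


# Sample input: the message from this report, repeated, plus an unrelated line.
sample = [
    "2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: "
    "Received MSG_REPORT_PROCESS_OPEN : categories,2864153,1211005494348 "
    "from 10.254.26.31:60020",
    "2008-05-20 18:33:49,964 DEBUG org.apache.hadoop.hbase.HMaster: "
    "Received MSG_REPORT_PROCESS_OPEN : categories,2864153,1211005494348 "
    "from 10.254.26.31:60020",
    "2008-05-20 18:33:50,001 INFO some unrelated log line",
]

result = count_process_open(sample)
for (region, server), n in result.most_common():
    print(f"{n}\t{region}\t{server}")
```

To run it against a real log, pass an open file object instead of `sample`. A region that keeps accumulating reports long after startup is a candidate for the stuck-open behaviour described here.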
