After deleting the file in question (and confirming that fsck now reports the filesystem as healthy), I restarted HBase.
Ten minutes later, only 20 of 370 regions were being served, and hundreds of MSG_REPORT_PROCESS_OPEN messages per second were being written to the log files.

New corruption has also arisen, this time in this file:

/hbase/categories/compaction.dir/297165731/parent_categories/mapfiles/4043400584087646542/data: MISSING 1 blocks of total size 0 B.

On Tue, May 20, 2008 at 3:39 PM, Daniel Leffel <[EMAIL PROTECTED]> wrote:
> After experiencing a region server that would not exit (HBASE-617), I tried
> to bring back up hbase (after first having shut down and bringing back up
> DFS).
>
> There are around 370 regions. The first 250 were assigned to region servers
> within 5 minutes of startup. The rest of the regions took the better part of
> the day to become assigned to a region server. A quick inspection of the
> regionserver logs were showing messages like the following:
>
> 2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: Received
> MSG_REPORT_PROCESS_OPEN : categories,2864153,1211005494348 from
> 10.254.26.31:60020
>
> After waiting for all the regions to be assigned (and an absence of the
> above message appearing in the log), I started a MapReduce job that iterates
> over all regions. Immediately, the above mentioned region began to show up
> in the logs again with the above message and the job failed with an
> IOException because it couldn't locate blocks.
>
> I ran fsck on /hbase and sure enough, blocks are missing from the following
> file (although it reports a size of 0 as what's missing - I presume it just
> doesn't know):
>
> /hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000
>
> What's the recovery procedure here? Is there one?
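
In case it helps anyone tracking the same symptom, here is a rough sketch of how the affected paths could be pulled out of saved fsck output. It assumes every problem line has the same "<path>: MISSING <n> blocks of total size <s> B" shape as the line quoted in this thread; the function name and the regular expression are my own illustration, not part of Hadoop or HBase.

```python
import re

# Assumed line shape, taken from the fsck line quoted in this thread:
#   <path>: MISSING <n> blocks of total size <s> B.
MISSING_RE = re.compile(
    r"^(?P<path>/\S+): MISSING (?P<count>\d+) blocks of total size (?P<size>\d+) B"
)


def missing_block_files(fsck_output):
    """Return (path, missing_block_count) for each MISSING line in the text."""
    results = []
    for line in fsck_output.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            results.append((match.group("path"), int(match.group("count"))))
    return results


# Sample input: the one corrupt file reported above.
sample = (
    "/hbase/categories/compaction.dir/297165731/parent_categories/"
    "mapfiles/4043400584087646542/data: MISSING 1 blocks of total size 0 B.\n"
)

for path, count in missing_block_files(sample):
    print(path, count)
```

Running this after each restart and comparing the lists would at least show whether the set of damaged files is growing or just moving around.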