After deleting the file in question (and confirming that fsck now reports the
filesystem as healthy), I restarted HBase.

Ten minutes later, only 20 of 370 regions were being served, and hundreds of
MSG_REPORT_PROCESS_OPEN messages per second were being written to the log
files.
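
To measure "hundreds of messages per second" more precisely, you could bucket the log entries by timestamp. The following is a minimal Python sketch. It assumes that each log entry sits on a single line beginning with a timestamp in the format shown in the quoted excerpt below (e.g. `2008-05-20 18:33:46,964`). The function name `open_reports_per_second` is hypothetical and is not part of HBase.

```python
import re
from collections import Counter

# Assumed log line shape (taken from the quoted HMaster excerpt):
#   "2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: Received ..."
# Group 1 captures the timestamp truncated to whole seconds (before the comma).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s")


def open_reports_per_second(lines, marker="MSG_REPORT_PROCESS_OPEN"):
    """Hypothetical helper: count lines containing `marker`, keyed by second."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue  # ignore unrelated log traffic
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `open_reports_per_second(open("hbase-master.log"))` returns a `Counter` keyed by second (the file name is hypothetical). Calling `.most_common(5)` on the result shows the busiest seconds.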

Also, new corruption arose, this time in this file:

/hbase/categories/compaction.dir/297165731/parent_categories/mapfiles/4043400584087646542/data:
MISSING 1 blocks of total size 0 B.
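
To collect every affected path from a longer fsck report, a small filter is enough. This is only a sketch. The line shape is assumed from the fsck excerpt above: a path, a colon, then `MISSING <n> blocks of total size <bytes> B.`, possibly on the next line. `files_with_missing_blocks` is a hypothetical helper, not a Hadoop tool.

```python
import re

# Assumed shape, copied from the fsck excerpt:
#   <path>: MISSING <n> blocks of total size <bytes> B.
# The path and the MISSING clause may be separated by a space or a line break.
MISSING = re.compile(
    r"^(?P<path>/\S+):\s+MISSING (?P<blocks>\d+) blocks of total size (?P<size>\d+) B\.",
    re.MULTILINE,
)


def files_with_missing_blocks(fsck_output):
    """Hypothetical helper: return (path, missing_blocks, reported_bytes) tuples."""
    return [
        (m.group("path"), int(m.group("blocks")), int(m.group("size")))
        for m in MISSING.finditer(fsck_output)
    ]
```

You could pass it the captured text of `hadoop fsck /hbase` to get the list of damaged files in one place. Healthy entries do not match the pattern and are skipped.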



On Tue, May 20, 2008 at 3:39 PM, Daniel Leffel wrote:

> After experiencing a region server that would not exit (HBASE-617), I tried
> to bring HBase back up (after first shutting down DFS and bringing it back
> up).
>
> There are around 370 regions. The first 250 were assigned to region servers
> within 5 minutes of startup. The rest took the better part of the day to be
> assigned to a region server. A quick inspection of the regionserver logs
> showed messages like the following:
>
> 2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: Received
> MSG_REPORT_PROCESS_OPEN : categories,2864153,1211005494348 from
> 10.254.26.31:60020
>
> After waiting for all the regions to be assigned (and for the above message
> to stop appearing in the log), I started a MapReduce job that iterates over
> all regions. Immediately, the region mentioned above began to show up in the
> logs again with the same message, and the job failed with an IOException
> because it couldn't locate blocks.
>
> I ran fsck on /hbase and, sure enough, blocks are missing from the following
> file (although fsck reports the missing size as 0 B; I presume it just
> doesn't know):
>
> /hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000
>
> What's the recovery procedure here? Is there one?
>
