Y, EOFException should return file name. I'll file a Jira with Hadoop and do a patch.
--- Jim Kellerman, Senior Engineer; Powerset > -----Original Message----- > From: stack [mailto:[EMAIL PROTECTED] > Sent: Wednesday, May 21, 2008 10:53 AM > To: hbase-user@hadoop.apache.org > Subject: Re: Problem caused by HBASE-617 - Blocks missing > from hlog.dat > > What does a listing of /hbase/ .META./1028785192/info/info/ > show Daniel? You are still getting EOFException trying to open .META. > region? (Its kinda dumb that HDFS doesn't say which file the > EOF is coming out of). > > St.Ack > > > > > Daniel Leffel wrote: > > Thanks for the response, Jim. > > > > I've tried this and I'm getting the same behavior. Anything else I > > should be looking for? > > > > I also found empty files in the compaction directory which > I deleted - > > hope that was ok. > > > > > > On Wed, May 21, 2008 at 9:29 AM, Jim Kellerman > <[EMAIL PROTECTED]> wrote: > > > > > >> What is happening is that the region server is trying to open the > >> meta, and failing because there is an empty file somewhere in the > >> region or in the recovery log. I would advise the following: > >> > >> 1. shut down HBase (as cleanly as possible) 2. find and delete any > >> zero length files 3. there may be empty, but not zero length, > >> MapFiles. MapFiles are > >> stored in a directory that looks like: > >> <hbase-root>/<tablename>/<a number>/<columnname>/mapfiles/<a long > >> number>/ > >> > >> In this directory there are two files: 'data' and > 'index'. An empty > >> MapFile (one that has been opened and closed with no > data written to > >> it) 'data' will have size 110 (bytes) and 'info' will > have size 137. > >> > >> If you find one of these, you should delete the > directory that contains > >> them (in the example above <a long number>). > >> > >> > >> --- > >> Jim Kellerman, Senior Engineer; Powerset > >> > >> > >> > >>> -----Original Message----- > >>> From: Daniel Leffel [mailto:[EMAIL PROTECTED] > >>> Sent: Wednesday, May 21, 2008 8:13 AM > >>> To: hbase-user@hadoop.apache.org > >>> Subject: Re: Problem caused by HBASE-617 - Blocks missing from > >>> hlog.dat > >>> > >>> After letting the hbase attempt to assign regions all > night long, I > >>> awoke to an unassigned meta region. Here is the master log: > >>> > >>> Desc: {name: .META., families: {info:={name: info, max > versions: 1, > >>> compression: NONE, in memory: false, max length: > 2147483647, bloom > >>> filter: > >>> none}}} > >>> 2008-05-21 11:10:29,560 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Main processing loop: > >>> ProcessRegionClose of .META.,,1, true > >>> 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: > >>> region closed: > >>> .META.,,1 > >>> 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: > >>> reassign > >>> region: .META.,,1 > >>> 2008-05-21 11:10:29,729 INFO org.apache.hadoop.hbase.HMaster: > >>> assigning region .META.,,1 to server 10.252.242.159:60020 > >>> 2008-05-21 11:10:32,740 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Received > MSG_REPORT_PROCESS_OPEN : > >>> .META.,,1 from 10.252.242.159:60020 > >>> 2008-05-21 11:10:32,740 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE : > >>> .META.,,1 from 10.252.242.159:60020 > >>> 2008-05-21 11:10:32,740 INFO org.apache.hadoop.hbase.HMaster: > >>> 10.252.242.159:60020 no longer serving regionname: .META.,,1, > >>> startKey: <>, > >>> endKey: <>, encodedName: 1028785192, tableDesc: {name: > >>> .META., families: > >>> {info:={name: info, max versions: 1, compression: NONE, in > >>> memory: false, max length: 2147483647, bloom filter: none}}} > >>> 2008-05-21 11:10:32,740 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Main processing loop: > >>> ProcessRegionClose of .META.,,1, true > >>> 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: > >>> region closed: > >>> .META.,,1 > >>> 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: > >>> reassign > >>> region: .META.,,1 > >>> 2008-05-21 11:10:34,585 INFO org.apache.hadoop.hbase.HMaster: > >>> assigning region .META.,,1 to server 10.254.30.79:60020 > >>> 2008-05-21 11:10:37,595 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Received > MSG_REPORT_PROCESS_OPEN : > >>> .META.,,1 from 10.254.30.79:60020 > >>> 2008-05-21 11:10:37,596 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE : > >>> .META.,,1 from 10.254.30.79:60020 > >>> 2008-05-21 11:10:37,596 INFO org.apache.hadoop.hbase.HMaster: > >>> 10.254.30.79:60020 no longer serving regionname: .META.,,1, > >>> startKey: <>, > >>> endKey: <>, encodedName: 1028785192, tableDesc: {name: > >>> .META., families: > >>> {info:={name: info, max versions: 1, compression: NONE, in > >>> memory: false, max length: 2147483647, bloom filter: none}}} > >>> 2008-05-21 11:10:37,596 DEBUG > >>> org.apache.hadoop.hbase.HMaster: Main processing loop: > >>> ProcessRegionClose of .META.,,1, true > >>> > >>> > >>> and here is the regin server side: > >>> 2008-05-21 11:12:38,027 INFO > org.apache.hadoop.hbase.HRegionServer: > >>> MSG_REGION_OPEN : .META.,,1 > >>> 2008-05-21 11:12:38,027 DEBUG > >>> org.apache.hadoop.hbase.HRegion: Opening region > .META.,,1/1028785192 > >>> 2008-05-21 11:12:38,035 DEBUG org.apache.hadoop.hbase.HStore: > >>> loaded > >>> /hbase/.META./1028785192/info/info/2509658022189995817, > >>> isReference=false > >>> 2008-05-21 11:12:38,036 DEBUG org.apache.hadoop.hbase.HStore: > >>> loaded > >>> /hbase/.META./1028785192/info/info/8183182393002383771, > >>> isReference=false > >>> 2008-05-21 11:12:38,049 DEBUG > org.apache.hadoop.hbase.HStore: Loaded > >>> 2 > >>> file(s) in hstore 1028785192/info, max sequence id 917793587 > >>> 2008-05-21 11:12:38,116 ERROR > >>> org.apache.hadoop.hbase.HRegionServer: error opening region > >>> .META.,,1 java.io.EOFException > >>> at > java.io.DataInputStream.readFully(DataInputStream.java:180) > >>> at > java.io.DataInputStream.readFully(DataInputStream.java:152) > >>> at > >>> > org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434) > >>> at > >>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j > >>> ava:1411) > >>> at > >>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j > >>> ava:1400) > >>> at > >>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j > >>> ava:1395) > >>> at > >>> org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263) > >>> at > >>> org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242) > >>> at > >>> org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<i > >>> > >> nit>(HStoreFile.java:554) > >> > >>> at > >>> org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.< > >>> > >> init>(HStoreFile.java:609) > >> > >>> at > >>> org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382) > >>> at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849) > >>> at > org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431) > >>> at > >>> org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer > >>> .java:1258) > >>> at > >>> org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer > >>> .java:1204) > >>> at java.lang.Thread.run(Thread.java:619) > >>> > >>> > >>> > >>> On Tue, May 20, 2008 at 4:19 PM, stack <[EMAIL PROTECTED]> wrote: > >>> > >>> > >>>> Daniel Leffel wrote: > >>>> > >>>> > >>>>> After experiencing a region server that would not exit > >>>>> > >>> (HBASE-617), I > >>> > >>>>> tried to bring back up hbase (after first having shut down and > >>>>> bringing back up DFS). > >>>>> > >>>>> There are around 370 regions. The first 250 were assigned > >>>>> > >>> to region > >>> > >>>>> servers within 5 minutes of startup. The rest of the > >>>>> > >>> regions took the > >>> > >>>>> better part of the day to become assigned to a region > >>>>> > >>> server. A quick > >>> > >>>>> inspection of the regionserver logs were showing > messages like the > >>>>> following: > >>>>> > >>>>> 2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: > >>>>> Received MSG_REPORT_PROCESS_OPEN : > >>>>> > >>> categories,2864153,1211005494348 > >>> > >>>>> from 10.254.26.31:60020 > >>>>> > >>>>> > >>>>> > >>>> These messages are sent over to the master by the > regionserver as a > >>>> kind of ping saying "I'm still alive and working on > >>>> > >>> whatever it was you gave me". > >>> > >>>> Can you tell what was happening by looking in regionserver logs? > >>>> > >>>> Was it that all regions had been given to a single > >>>> > >>> regionserver and it > >>> > >>>> was busy replaying edits before bringing the regions online > >>>> > >>> (There is > >>> > >>>> a single worker thread per regionserver. If lots of edits > >>>> > >>> to replay, > >>> > >>>> can take seconds to minutes to bring on a region). > >>>> > >>>> Did the regions come online gradually or all in a lump? > >>>> > >>>> > >>>>> After waiting for all the regions to be assigned (and an > >>>>> > >>> absence of > >>> > >>>>> the above message appearing in the log), I started a > MapReduce job > >>>>> that iterates over all regions. Immediately, the above > mentioned > >>>>> region began to show up in the logs again with the above > >>>>> > >>> message and > >>> > >>>>> the job failed with an IOException because it couldn't > >>>>> > >>> locate blocks. > >>> > >>>>> I ran fsck on /hbase and sure enough, blocks are > missing from the > >>>>> following file (although it reports a size of 0 as what's > >>>>> > >>> missing - I > >>> > >>>>> presume it just doesn't know): > >>>>> > >>>>> /hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000 > >>>>> > >>>>> > >>>>> > >>>> The above looks like the innocuous messages described in > >>>> https://issues.apache.org/jira/browse/HBASE-509. > >>>> > >>>> St.Ack > >>>> > >>>> > >>>> > >>> No virus found in this incoming message. > >>> Checked by AVG. > >>> Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release > >>> Date: 5/20/2008 6:45 AM > >>> > >>> > >> No virus found in this outgoing message. > >> Checked by AVG. > >> Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release Date: > >> 5/20/2008 > >> 6:45 AM > >> > >> > > > > > > No virus found in this outgoing message. Checked by AVG. Version: 8.0.100 / Virus Database: 269.23.21/1458 - Release Date: 5/21/2008 7:21 AM