Yes, the EOFException should report the file name. I'll file a Jira with Hadoop
and write a patch.
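
Roughly the idea, as a sketch only: catch the EOFException where the file is
read and rethrow it with the path in the message. The class and method names
here (NamedEof, readHeader) are made up for illustration and are not Hadoop
code; the actual change would presumably belong near the reader that the
trace below goes through.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper, not part of Hadoop: illustrates the idea only.
class NamedEof {
    // Reads the first `len` bytes of `file`. If the file is shorter than
    // that, the EOFException is rethrown with the file path in its message.
    static byte[] readHeader(File file, int len) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        try {
            byte[] header = new byte[len];
            in.readFully(header);
            return header;
        } catch (EOFException e) {
            EOFException named = new EOFException(
                "Unexpected end of file while reading " + file.getPath());
            named.initCause(e); // keep the original stack trace as the cause
            throw named;
        } finally {
            in.close();
        }
    }
}
```

With something like this, an empty store file would show up in the region
server log by name instead of as a bare java.io.EOFException.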

---
Jim Kellerman, Senior Engineer; Powerset


> -----Original Message-----
> From: stack [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, May 21, 2008 10:53 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Problem caused by HBASE-617 - Blocks missing
> from hlog.dat
>
> What does a listing of /hbase/.META./1028785192/info/info/ show, Daniel?
> Are you still getting the EOFException when trying to open the .META.
> region?  (It's kinda dumb that HDFS doesn't say which file the EOF is
> coming out of.)
>
> St.Ack
>
>
>
>
> Daniel Leffel wrote:
> > Thanks for the response, Jim.
> >
> > I've tried this and I'm getting the same behavior. Anything else I
> > should be looking for?
> >
> > I also found empty files in the compaction directory, which I deleted -
> > hope that was ok.
> >
> >
> > On Wed, May 21, 2008 at 9:29 AM, Jim Kellerman <[EMAIL PROTECTED]> wrote:
> >
> >
> >> What is happening is that the region server is trying to open the
> >> meta, and failing because there is an empty file somewhere in the
> >> region or in the recovery log. I would advise the following:
> >>
> >> 1. Shut down HBase (as cleanly as possible).
> >> 2. Find and delete any zero-length files.
> >> 3. There may be empty, but not zero-length, MapFiles. MapFiles are
> >>    stored in a directory that looks like:
> >>    <hbase-root>/<tablename>/<a number>/<columnname>/mapfiles/<a long number>/
> >>
> >>    In this directory there are two files: 'data' and 'index'. For an
> >>    empty MapFile (one that has been opened and closed with no data
> >>    written to it), 'data' will have size 110 (bytes) and 'index' will
> >>    have size 137.
> >>
> >>    If you find one of these, you should delete the directory that
> >>    contains them (in the example above, <a long number>).
> >>
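
In case it helps with steps 2 and 3, here is a rough sketch of the scan. It
assumes the HBase directory tree is reachable as a local path (for example a
copy pulled out of DFS); it does not talk to DFS itself, where you would go
through the Hadoop FileSystem API instead. The class name EmptyFileScan is
made up, the 110 and 137 byte sizes are just the ones quoted above and may
differ between versions, and the 137-byte file is taken to be 'index', the
second of the two files in a MapFile directory. It only reports candidates
and deletes nothing.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: lists candidates, deletes nothing.
class EmptyFileScan {
    // Sizes of an empty MapFile's two files, as quoted in this thread.
    static final long EMPTY_DATA_SIZE = 110;
    static final long EMPTY_INDEX_SIZE = 137;

    // Walks `dir` recursively, collecting zero-length files and directories
    // that look like an empty MapFile ('data' of 110 bytes, 'index' of 137).
    static void scan(File dir, List<File> zeroLength, List<File> emptyMapFiles) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        File data = new File(dir, "data");
        File index = new File(dir, "index");
        if (data.isFile() && index.isFile()
                && data.length() == EMPTY_DATA_SIZE
                && index.length() == EMPTY_INDEX_SIZE) {
            emptyMapFiles.add(dir);
        }
        for (File child : children) {
            if (child.isDirectory()) {
                scan(child, zeroLength, emptyMapFiles);
            } else if (child.isFile() && child.length() == 0) {
                zeroLength.add(child);
            }
        }
    }

    public static void main(String[] args) {
        List<File> zeroLength = new ArrayList<File>();
        List<File> emptyMapFiles = new ArrayList<File>();
        scan(new File(args[0]), zeroLength, emptyMapFiles);
        for (File f : zeroLength) {
            System.out.println("zero-length file: " + f);
        }
        for (File d : emptyMapFiles) {
            System.out.println("empty MapFile directory: " + d);
        }
    }
}
```

Run it with the root of the copied tree as the only argument and review the
list by hand before deleting anything.
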
> >>
> >> ---
> >> Jim Kellerman, Senior Engineer; Powerset
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Daniel Leffel [mailto:[EMAIL PROTECTED]
> >>> Sent: Wednesday, May 21, 2008 8:13 AM
> >>> To: hbase-user@hadoop.apache.org
> >>> Subject: Re: Problem caused by HBASE-617 - Blocks missing from
> >>> hlog.dat
> >>>
> >>> After letting HBase attempt to assign regions all night long, I
> >>> awoke to an unassigned meta region. Here is the master log:
> >>>
> >>> Desc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> >>> 2008-05-21 11:10:29,560 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessRegionClose of .META.,,1, true
> >>> 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: region closed: .META.,,1
> >>> 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: reassign region: .META.,,1
> >>> 2008-05-21 11:10:29,729 INFO org.apache.hadoop.hbase.HMaster: assigning region .META.,,1 to server 10.252.242.159:60020
> >>> 2008-05-21 11:10:32,740 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.252.242.159:60020
> >>> 2008-05-21 11:10:32,740 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE : .META.,,1 from 10.252.242.159:60020
> >>> 2008-05-21 11:10:32,740 INFO org.apache.hadoop.hbase.HMaster: 10.252.242.159:60020 no longer serving regionname: .META.,,1, startKey: <>, endKey: <>, encodedName: 1028785192, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> >>> 2008-05-21 11:10:32,740 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessRegionClose of .META.,,1, true
> >>> 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: region closed: .META.,,1
> >>> 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: reassign region: .META.,,1
> >>> 2008-05-21 11:10:34,585 INFO org.apache.hadoop.hbase.HMaster: assigning region .META.,,1 to server 10.254.30.79:60020
> >>> 2008-05-21 11:10:37,595 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.254.30.79:60020
> >>> 2008-05-21 11:10:37,596 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE : .META.,,1 from 10.254.30.79:60020
> >>> 2008-05-21 11:10:37,596 INFO org.apache.hadoop.hbase.HMaster: 10.254.30.79:60020 no longer serving regionname: .META.,,1, startKey: <>, endKey: <>, encodedName: 1028785192, tableDesc: {name: .META., families: {info:={name: info, max versions: 1, compression: NONE, in memory: false, max length: 2147483647, bloom filter: none}}}
> >>> 2008-05-21 11:10:37,596 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessRegionClose of .META.,,1, true
> >>>
> >>>
> >>> And here is the region server side:
> >>> 2008-05-21 11:12:38,027 INFO org.apache.hadoop.hbase.HRegionServer: MSG_REGION_OPEN : .META.,,1
> >>> 2008-05-21 11:12:38,027 DEBUG org.apache.hadoop.hbase.HRegion: Opening region .META.,,1/1028785192
> >>> 2008-05-21 11:12:38,035 DEBUG org.apache.hadoop.hbase.HStore: loaded /hbase/.META./1028785192/info/info/2509658022189995817, isReference=false
> >>> 2008-05-21 11:12:38,036 DEBUG org.apache.hadoop.hbase.HStore: loaded /hbase/.META./1028785192/info/info/8183182393002383771, isReference=false
> >>> 2008-05-21 11:12:38,049 DEBUG org.apache.hadoop.hbase.HStore: Loaded 2 file(s) in hstore 1028785192/info, max sequence id 917793587
> >>> 2008-05-21 11:12:38,116 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region .META.,,1
> >>> java.io.EOFException
> >>>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
> >>>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >>>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434)
> >>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1411)
> >>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1400)
> >>>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1395)
> >>>         at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
> >>>         at org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
> >>>         at org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<init>(HStoreFile.java:554)
> >>>         at org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<init>(HStoreFile.java:609)
> >>>         at org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382)
> >>>         at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849)
> >>>         at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431)
> >>>         at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1258)
> >>>         at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1204)
> >>>         at java.lang.Thread.run(Thread.java:619)
> >>>
> >>>
> >>>
> >>> On Tue, May 20, 2008 at 4:19 PM, stack <[EMAIL PROTECTED]> wrote:
> >>>
> >>>
> >>>> Daniel Leffel wrote:
> >>>>
> >>>>
> >>>>> After experiencing a region server that would not exit (HBASE-617), I
> >>>>> tried to bring HBase back up (after first shutting down and
> >>>>> bringing back up DFS).
> >>>>>
> >>>>> There are around 370 regions. The first 250 were assigned to region
> >>>>> servers within 5 minutes of startup. The rest of the regions took the
> >>>>> better part of the day to become assigned to a region server. A quick
> >>>>> inspection of the regionserver logs showed messages like the
> >>>>> following:
> >>>>>
> >>>>> 2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN : categories,2864153,1211005494348 from 10.254.26.31:60020
> >>>>>
> >>>>>
> >>>>>
> >>>> These messages are sent over to the master by the regionserver as a
> >>>> kind of ping saying "I'm still alive and working on whatever it was
> >>>> you gave me".
> >>>>
> >>>> Can you tell what was happening by looking in regionserver logs?
> >>>>
> >>>> Was it that all regions had been given to a single regionserver and it
> >>>> was busy replaying edits before bringing the regions online?  (There is
> >>>> a single worker thread per regionserver.  If there are lots of edits to
> >>>> replay, it can take seconds to minutes to bring a region online.)
> >>>>
> >>>> Did the regions come online gradually or all in a lump?
> >>>>
> >>>>
> >>>>> After waiting for all the regions to be assigned (and an absence of
> >>>>> the above message appearing in the log), I started a MapReduce job
> >>>>> that iterates over all regions. Immediately, the above-mentioned
> >>>>> region began to show up in the logs again with the above message, and
> >>>>> the job failed with an IOException because it couldn't locate blocks.
> >>>>>
> >>>>> I ran fsck on /hbase and, sure enough, blocks are missing from the
> >>>>> following file (although it reports a size of 0 as what's missing - I
> >>>>> presume it just doesn't know):
> >>>>>
> >>>>> /hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000
> >>>>>
> >>>>>
> >>>>>
> >>>> The above looks like the innocuous messages described in
> >>>> https://issues.apache.org/jira/browse/HBASE-509.
> >>>>
> >>>> St.Ack
> >>>>
> >>>>
> >>>>
> >>
> >
> >
>
>
