Thanks for the response, Jim.

I've tried this and I'm getting the same behavior. Anything else I should be
looking for?

I also found empty files in the compaction directory which I deleted - hope
that was ok.


On Wed, May 21, 2008 at 9:29 AM, Jim Kellerman <[EMAIL PROTECTED]> wrote:

> What is happening is that the region server is trying to open the meta, and
> failing because there is an empty file somewhere in the region or in the
> recovery log. I would advise the following:
>
> 1. shut down HBase (as cleanly as possible)
> 2. find and delete any zero length files
> 3. there may be empty, but not zero length, MapFiles. MapFiles are
>   stored in a directory that looks like:
>   <hbase-root>/<tablename>/<a number>/<columnname>/mapfiles/<a long
> number>/
>
>   In this directory there are two files: 'data' and 'index'. An empty
>   MapFile (one that has been opened and closed with no data written to
>   it) 'data' will have size 110 (bytes) and 'info' will have size 137.
>
>   If you find one of these, you should delete the directory that contains
>   them (in the example above <a long number>).
>
>
> ---
> Jim Kellerman, Senior Engineer; Powerset
>
>
> > -----Original Message-----
> > From: Daniel Leffel [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, May 21, 2008 8:13 AM
> > To: [email protected]
> > Subject: Re: Problem caused by HBASE-617 - Blocks missing
> > from hlog.dat
> >
> > After letting the hbase attempt to assign regions all night
> > long, I awoke to an unassigned meta region. Here is the master log:
> >
> > Desc: {name: .META., families: {info:={name: info, max versions: 1,
> > compression: NONE, in memory: false, max length: 2147483647,
> > bloom filter:
> > none}}}
> > 2008-05-21 11:10:29,560 DEBUG
> > org.apache.hadoop.hbase.HMaster: Main processing loop:
> > ProcessRegionClose of .META.,,1, true
> > 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster:
> > region closed:
> > .META.,,1
> > 2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: reassign
> > region: .META.,,1
> > 2008-05-21 11:10:29,729 INFO org.apache.hadoop.hbase.HMaster:
> > assigning region .META.,,1 to server 10.252.242.159:60020
> > 2008-05-21 11:10:32,740 DEBUG
> > org.apache.hadoop.hbase.HMaster: Received
> > MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.252.242.159:60020
> > 2008-05-21 11:10:32,740 DEBUG
> > org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
> > .META.,,1 from 10.252.242.159:60020
> > 2008-05-21 11:10:32,740 INFO org.apache.hadoop.hbase.HMaster:
> > 10.252.242.159:60020 no longer serving regionname: .META.,,1,
> > startKey: <>,
> > endKey: <>, encodedName: 1028785192, tableDesc: {name:
> > .META., families:
> > {info:={name: info, max versions: 1, compression: NONE, in
> > memory: false, max length: 2147483647, bloom filter: none}}}
> > 2008-05-21 11:10:32,740 DEBUG
> > org.apache.hadoop.hbase.HMaster: Main processing loop:
> > ProcessRegionClose of .META.,,1, true
> > 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster:
> > region closed:
> > .META.,,1
> > 2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: reassign
> > region: .META.,,1
> > 2008-05-21 11:10:34,585 INFO org.apache.hadoop.hbase.HMaster:
> > assigning region .META.,,1 to server 10.254.30.79:60020
> > 2008-05-21 11:10:37,595 DEBUG
> > org.apache.hadoop.hbase.HMaster: Received
> > MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.254.30.79:60020
> > 2008-05-21 11:10:37,596 DEBUG
> > org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
> > .META.,,1 from 10.254.30.79:60020
> > 2008-05-21 11:10:37,596 INFO org.apache.hadoop.hbase.HMaster:
> > 10.254.30.79:60020 no longer serving regionname: .META.,,1,
> > startKey: <>,
> > endKey: <>, encodedName: 1028785192, tableDesc: {name:
> > .META., families:
> > {info:={name: info, max versions: 1, compression: NONE, in
> > memory: false, max length: 2147483647, bloom filter: none}}}
> > 2008-05-21 11:10:37,596 DEBUG
> > org.apache.hadoop.hbase.HMaster: Main processing loop:
> > ProcessRegionClose of .META.,,1, true
> >
> >
> > and here is the regin server side:
> > 2008-05-21 11:12:38,027 INFO org.apache.hadoop.hbase.HRegionServer:
> > MSG_REGION_OPEN : .META.,,1
> > 2008-05-21 11:12:38,027 DEBUG
> > org.apache.hadoop.hbase.HRegion: Opening region .META.,,1/1028785192
> > 2008-05-21 11:12:38,035 DEBUG org.apache.hadoop.hbase.HStore:
> > loaded
> > /hbase/.META./1028785192/info/info/2509658022189995817,
> > isReference=false
> > 2008-05-21 11:12:38,036 DEBUG org.apache.hadoop.hbase.HStore:
> > loaded
> > /hbase/.META./1028785192/info/info/8183182393002383771,
> > isReference=false
> > 2008-05-21 11:12:38,049 DEBUG org.apache.hadoop.hbase.HStore: Loaded 2
> > file(s) in hstore 1028785192/info, max sequence id 917793587
> > 2008-05-21 11:12:38,116 ERROR
> > org.apache.hadoop.hbase.HRegionServer: error opening region
> > .META.,,1 java.io.EOFException
> >         at java.io.DataInputStream.readFully(DataInputStream.java:180)
> >         at java.io.DataInputStream.readFully(DataInputStream.java:152)
> >         at
> > org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434)
> >         at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
> > ava:1411)
> >         at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
> > ava:1400)
> >         at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
> > ava:1395)
> >         at
> > org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
> >         at
> > org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
> >         at
> > org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<i
> nit>(HStoreFile.java:554)
> >         at
> > org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<
> init>(HStoreFile.java:609)
> >         at
> > org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382)
> >         at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849)
> >         at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431)
> >         at
> > org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer
> > .java:1258)
> >         at
> > org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer
> > .java:1204)
> >         at java.lang.Thread.run(Thread.java:619)
> >
> >
> >
> > On Tue, May 20, 2008 at 4:19 PM, stack <[EMAIL PROTECTED]> wrote:
> >
> > > Daniel Leffel wrote:
> > >
> > >> After experiencing a region server that would not exit
> > (HBASE-617), I
> > >> tried to bring back up hbase (after first having shut down and
> > >> bringing back up DFS).
> > >>
> > >> There are around 370 regions. The first 250 were assigned
> > to region
> > >> servers within 5 minutes of startup. The rest of the
> > regions took the
> > >> better part of the day to become assigned to a region
> > server. A quick
> > >> inspection of the regionserver logs were showing messages like the
> > >> following:
> > >>
> > >> 2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster:
> > >> Received MSG_REPORT_PROCESS_OPEN :
> > categories,2864153,1211005494348
> > >> from 10.254.26.31:60020
> > >>
> > >>
> > >
> > > These messages are sent over to the master by the regionserver as a
> > > kind of ping saying "I'm still alive and working on
> > whatever it was you gave me".
> > >
> > > Can you tell what was happening by looking in regionserver logs?
> > >
> > > Was it that all regions had been given to a single
> > regionserver and it
> > > was busy replaying edits before bringing the regions online
> > (There is
> > > a single worker thread per regionserver.  If lots of edits
> > to replay,
> > > can take seconds to minutes to bring on a region).
> > >
> > > Did the regions come online gradually or all in a lump?
> > >
> > >> After waiting for all the regions to be assigned (and an
> > absence of
> > >> the above message appearing in the log), I started a MapReduce job
> > >> that iterates over all regions. Immediately, the above mentioned
> > >> region began to show up in the logs again with the above
> > message and
> > >> the job failed with an IOException because it couldn't
> > locate blocks.
> > >>
> > >> I ran fsck on /hbase and sure enough, blocks are missing from the
> > >> following file (although it reports a size of 0 as what's
> > missing - I
> > >> presume it just doesn't know):
> > >>
> > >> /hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000
> > >>
> > >>
> > > The above looks like the innocuous messages described in
> > > https://issues.apache.org/jira/browse/HBASE-509.
> > >
> > > St.Ack
> > >
> > >
> >
> > No virus found in this incoming message.
> > Checked by AVG.
> > Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release
> > Date: 5/20/2008 6:45 AM
> >
> No virus found in this outgoing message.
> Checked by AVG.
> Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release Date: 5/20/2008
> 6:45 AM
>

Reply via email to