What does a listing of /hbase/ .META./1028785192/info/info/ show Daniel? You are still getting EOFException trying to open .META. region? (Its kinda dumb that HDFS doesn't say which file the EOF is coming out of).

St.Ack




Daniel Leffel wrote:
Thanks for the response, Jim.

I've tried this and I'm getting the same behavior. Anything else I should be
looking for?

I also found empty files in the compaction directory which I deleted - hope
that was ok.


On Wed, May 21, 2008 at 9:29 AM, Jim Kellerman <[EMAIL PROTECTED]> wrote:

What is happening is that the region server is trying to open the meta, and
failing because there is an empty file somewhere in the region or in the
recovery log. I would advise the following:

1. shut down HBase (as cleanly as possible)
2. find and delete any zero length files
3. there may be empty, but not zero length, MapFiles. MapFiles are
  stored in a directory that looks like:
  <hbase-root>/<tablename>/<a number>/<columnname>/mapfiles/<a long
number>/

  In this directory there are two files: 'data' and 'index'. An empty
  MapFile (one that has been opened and closed with no data written to
  it) 'data' will have size 110 (bytes) and 'info' will have size 137.

  If you find one of these, you should delete the directory that contains
  them (in the example above <a long number>).


---
Jim Kellerman, Senior Engineer; Powerset


-----Original Message-----
From: Daniel Leffel [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 21, 2008 8:13 AM
To: [email protected]
Subject: Re: Problem caused by HBASE-617 - Blocks missing
from hlog.dat

After letting the hbase attempt to assign regions all night
long, I awoke to an unassigned meta region. Here is the master log:

Desc: {name: .META., families: {info:={name: info, max versions: 1,
compression: NONE, in memory: false, max length: 2147483647,
bloom filter:
none}}}
2008-05-21 11:10:29,560 DEBUG
org.apache.hadoop.hbase.HMaster: Main processing loop:
ProcessRegionClose of .META.,,1, true
2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster:
region closed:
.META.,,1
2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: reassign
region: .META.,,1
2008-05-21 11:10:29,729 INFO org.apache.hadoop.hbase.HMaster:
assigning region .META.,,1 to server 10.252.242.159:60020
2008-05-21 11:10:32,740 DEBUG
org.apache.hadoop.hbase.HMaster: Received
MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.252.242.159:60020
2008-05-21 11:10:32,740 DEBUG
org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
.META.,,1 from 10.252.242.159:60020
2008-05-21 11:10:32,740 INFO org.apache.hadoop.hbase.HMaster:
10.252.242.159:60020 no longer serving regionname: .META.,,1,
startKey: <>,
endKey: <>, encodedName: 1028785192, tableDesc: {name:
.META., families:
{info:={name: info, max versions: 1, compression: NONE, in
memory: false, max length: 2147483647, bloom filter: none}}}
2008-05-21 11:10:32,740 DEBUG
org.apache.hadoop.hbase.HMaster: Main processing loop:
ProcessRegionClose of .META.,,1, true
2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster:
region closed:
.META.,,1
2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: reassign
region: .META.,,1
2008-05-21 11:10:34,585 INFO org.apache.hadoop.hbase.HMaster:
assigning region .META.,,1 to server 10.254.30.79:60020
2008-05-21 11:10:37,595 DEBUG
org.apache.hadoop.hbase.HMaster: Received
MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.254.30.79:60020
2008-05-21 11:10:37,596 DEBUG
org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
.META.,,1 from 10.254.30.79:60020
2008-05-21 11:10:37,596 INFO org.apache.hadoop.hbase.HMaster:
10.254.30.79:60020 no longer serving regionname: .META.,,1,
startKey: <>,
endKey: <>, encodedName: 1028785192, tableDesc: {name:
.META., families:
{info:={name: info, max versions: 1, compression: NONE, in
memory: false, max length: 2147483647, bloom filter: none}}}
2008-05-21 11:10:37,596 DEBUG
org.apache.hadoop.hbase.HMaster: Main processing loop:
ProcessRegionClose of .META.,,1, true


and here is the regin server side:
2008-05-21 11:12:38,027 INFO org.apache.hadoop.hbase.HRegionServer:
MSG_REGION_OPEN : .META.,,1
2008-05-21 11:12:38,027 DEBUG
org.apache.hadoop.hbase.HRegion: Opening region .META.,,1/1028785192
2008-05-21 11:12:38,035 DEBUG org.apache.hadoop.hbase.HStore:
loaded
/hbase/.META./1028785192/info/info/2509658022189995817,
isReference=false
2008-05-21 11:12:38,036 DEBUG org.apache.hadoop.hbase.HStore:
loaded
/hbase/.META./1028785192/info/info/8183182393002383771,
isReference=false
2008-05-21 11:12:38,049 DEBUG org.apache.hadoop.hbase.HStore: Loaded 2
file(s) in hstore 1028785192/info, max sequence id 917793587
2008-05-21 11:12:38,116 ERROR
org.apache.hadoop.hbase.HRegionServer: error opening region
.META.,,1 java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434)
        at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
ava:1411)
        at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
ava:1400)
        at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
ava:1395)
        at
org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
        at
org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
        at
org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<i
nit>(HStoreFile.java:554)
        at
org.apache.hadoop.hbase.HStoreFile$BloomFilterMapFile$Reader.<
init>(HStoreFile.java:609)
        at
org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382)
        at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849)
        at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431)
        at
org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer
.java:1258)
        at
org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer
.java:1204)
        at java.lang.Thread.run(Thread.java:619)



On Tue, May 20, 2008 at 4:19 PM, stack <[EMAIL PROTECTED]> wrote:

Daniel Leffel wrote:

After experiencing a region server that would not exit
(HBASE-617), I
tried to bring back up hbase (after first having shut down and
bringing back up DFS).

There are around 370 regions. The first 250 were assigned
to region
servers within 5 minutes of startup. The rest of the
regions took the
better part of the day to become assigned to a region
server. A quick
inspection of the regionserver logs were showing messages like the
following:

2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster:
Received MSG_REPORT_PROCESS_OPEN :
categories,2864153,1211005494348
from 10.254.26.31:60020


These messages are sent over to the master by the regionserver as a
kind of ping saying "I'm still alive and working on
whatever it was you gave me".
Can you tell what was happening by looking in regionserver logs?

Was it that all regions had been given to a single
regionserver and it
was busy replaying edits before bringing the regions online
(There is
a single worker thread per regionserver.  If lots of edits
to replay,
can take seconds to minutes to bring on a region).

Did the regions come online gradually or all in a lump?

After waiting for all the regions to be assigned (and an
absence of
the above message appearing in the log), I started a MapReduce job
that iterates over all regions. Immediately, the above mentioned
region began to show up in the logs again with the above
message and
the job failed with an IOException because it couldn't
locate blocks.
I ran fsck on /hbase and sure enough, blocks are missing from the
following file (although it reports a size of 0 as what's
missing - I
presume it just doesn't know):

/hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000


The above looks like the innocuous messages described in
https://issues.apache.org/jira/browse/HBASE-509.

St.Ack


No virus found in this incoming message.
Checked by AVG.
Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release
Date: 5/20/2008 6:45 AM

No virus found in this outgoing message.
Checked by AVG.
Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release Date: 5/20/2008
6:45 AM



Reply via email to