-----Original Message-----
From: Daniel Leffel [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 21, 2008 8:13 AM
To: [email protected]
Subject: Re: Problem caused by HBASE-617 - Blocks missing
from hlog.dat
After letting the hbase attempt to assign regions all night
long, I awoke to an unassigned meta region. Here is the master log:
Desc: {name: .META., families: {info:={name: info, max versions: 1,
compression: NONE, in memory: false, max length: 2147483647,
bloom filter:
none}}}
2008-05-21 11:10:29,560 DEBUG
org.apache.hadoop.hbase.HMaster: Main processing loop:
ProcessRegionClose of .META.,,1, true
2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster:
region closed:
.META.,,1
2008-05-21 11:10:29,560 INFO org.apache.hadoop.hbase.HMaster: reassign
region: .META.,,1
2008-05-21 11:10:29,729 INFO org.apache.hadoop.hbase.HMaster:
assigning region .META.,,1 to server 10.252.242.159:60020
2008-05-21 11:10:32,740 DEBUG
org.apache.hadoop.hbase.HMaster: Received
MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.252.242.159:60020
2008-05-21 11:10:32,740 DEBUG
org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
.META.,,1 from 10.252.242.159:60020
2008-05-21 11:10:32,740 INFO org.apache.hadoop.hbase.HMaster:
10.252.242.159:60020 no longer serving regionname: .META.,,1,
startKey: <>,
endKey: <>, encodedName: 1028785192, tableDesc: {name:
.META., families:
{info:={name: info, max versions: 1, compression: NONE, in
memory: false, max length: 2147483647, bloom filter: none}}}
2008-05-21 11:10:32,740 DEBUG
org.apache.hadoop.hbase.HMaster: Main processing loop:
ProcessRegionClose of .META.,,1, true
2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster:
region closed:
.META.,,1
2008-05-21 11:10:32,741 INFO org.apache.hadoop.hbase.HMaster: reassign
region: .META.,,1
2008-05-21 11:10:34,585 INFO org.apache.hadoop.hbase.HMaster:
assigning region .META.,,1 to server 10.254.30.79:60020
2008-05-21 11:10:37,595 DEBUG
org.apache.hadoop.hbase.HMaster: Received
MSG_REPORT_PROCESS_OPEN : .META.,,1 from 10.254.30.79:60020
2008-05-21 11:10:37,596 DEBUG
org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_CLOSE :
.META.,,1 from 10.254.30.79:60020
2008-05-21 11:10:37,596 INFO org.apache.hadoop.hbase.HMaster:
10.254.30.79:60020 no longer serving regionname: .META.,,1,
startKey: <>,
endKey: <>, encodedName: 1028785192, tableDesc: {name:
.META., families:
{info:={name: info, max versions: 1, compression: NONE, in
memory: false, max length: 2147483647, bloom filter: none}}}
2008-05-21 11:10:37,596 DEBUG
org.apache.hadoop.hbase.HMaster: Main processing loop:
ProcessRegionClose of .META.,,1, true
and here is the regin server side:
2008-05-21 11:12:38,027 INFO org.apache.hadoop.hbase.HRegionServer:
MSG_REGION_OPEN : .META.,,1
2008-05-21 11:12:38,027 DEBUG
org.apache.hadoop.hbase.HRegion: Opening region .META.,,1/1028785192
2008-05-21 11:12:38,035 DEBUG org.apache.hadoop.hbase.HStore:
loaded
/hbase/.META./1028785192/info/info/2509658022189995817,
isReference=false
2008-05-21 11:12:38,036 DEBUG org.apache.hadoop.hbase.HStore:
loaded
/hbase/.META./1028785192/info/info/8183182393002383771,
isReference=false
2008-05-21 11:12:38,049 DEBUG org.apache.hadoop.hbase.HStore: Loaded 2
file(s) in hstore 1028785192/info, max sequence id 917793587
2008-05-21 11:12:38,116 ERROR
org.apache.hadoop.hbase.HRegionServer: error opening region
.META.,,1 java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at java.io.DataInputStream.readFully(DataInputStream.java:152)
at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1434)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
ava:1411)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
ava:1400)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.j
ava:1395)
at
org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:263)
at
org.apache.hadoop.io.MapFile$Reader.<init>(MapFile.java:242)
at
org.apache.hadoop.hbase.HStoreFile$HbaseMapFile$HbaseReader.<i
at
org.apache.hadoop.hbase.HStoreFile.getReader(HStoreFile.java:382)
at org.apache.hadoop.hbase.HStore.<init>(HStore.java:849)
at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:431)
at
org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer
.java:1258)
at
org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer
.java:1204)
at java.lang.Thread.run(Thread.java:619)
On Tue, May 20, 2008 at 4:19 PM, stack <[EMAIL PROTECTED]> wrote:
Daniel Leffel wrote:
After experiencing a region server that would not exit
(HBASE-617), I
tried to bring back up hbase (after first having shut down and
bringing back up DFS).
There are around 370 regions. The first 250 were assigned
to region
servers within 5 minutes of startup. The rest of the
regions took the
better part of the day to become assigned to a region
server. A quick
inspection of the regionserver logs were showing messages like the
following:
2008-05-20 18:33:46,964 DEBUG org.apache.hadoop.hbase.HMaster:
Received MSG_REPORT_PROCESS_OPEN :
categories,2864153,1211005494348
from 10.254.26.31:60020
These messages are sent over to the master by the regionserver as a
kind of ping saying "I'm still alive and working on
whatever it was you gave me".
Can you tell what was happening by looking in regionserver logs?
Was it that all regions had been given to a single
regionserver and it
was busy replaying edits before bringing the regions online
(There is
a single worker thread per regionserver. If lots of edits
to replay,
can take seconds to minutes to bring on a region).
Did the regions come online gradually or all in a lump?
After waiting for all the regions to be assigned (and an
absence of
the above message appearing in the log), I started a MapReduce job
that iterates over all regions. Immediately, the above mentioned
region began to show up in the logs again with the above
message and
the job failed with an IOException because it couldn't
locate blocks.
I ran fsck on /hbase and sure enough, blocks are missing from the
following file (although it reports a size of 0 as what's
missing - I
presume it just doesn't know):
/hbase/log_10.254.30.79_1211300015031_60020/hlog.dat.000
The above looks like the innocuous messages described in
https://issues.apache.org/jira/browse/HBASE-509.
St.Ack
No virus found in this incoming message.
Checked by AVG.
Version: 8.0.100 / Virus Database: 269.23.21/1456 - Release
Date: 5/20/2008 6:45 AM