First, see Jon Gray's response. His postulate that the root of your issues is machines swapping seems likely to me.
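
If you want to confirm whether the boxes are swapping, watching free and vmstat on the nodes while the load runs is usually enough (plain Linux tools, nothing hbase-specific):

    $ free -m       # check how much swap is in use
    $ vmstat 5      # non-zero 'si'/'so' columns mean the box is actively paging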

See below for some particular answers to your queries (thanks for the detail).

Jean-Adrien wrote:
The attempts above can be:
1.
java.io.IOException: java.io.IOException: Premeture EOF from inputStream
        at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)

Did you say your disks had filled? If so, that is the likely cause of the above (though on our cluster here we've also been seeing this error and are looking at HADOOP-3831).
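
To check for filled disks from the hdfs side, the dfsadmin report shows per-datanode capacity and remaining space (stock hadoop command), and df on each datanode shows the partitions backing dfs.data.dir:

    $ bin/hadoop dfsadmin -report   # per-datanode 'DFS Remaining' and '% used'
    $ df -h                         # on each datanode, check the dfs.data.dir partitions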

2-10
java.io.IOException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)

Was there more stack trace on this error? May I see it? The above should never happen (smile).

...

Another 10-attempt scenario I have seen:
1-10:
IPC Server handler 3 on 60020, call getRow([EMAIL PROTECTED], [EMAIL 
PROTECTED], null,
1224105427910, -1) from 192.168.1.11:55371: error: java.io.IOException:
Cannot open filename
/hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
java.io.IOException: Cannot open filename
/hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
        at
org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)

Preceded, in the concerned regionserver's log, by the line:

2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
obtain block blk_-3759213227484579481_226277 from any node: java.io.IOException: No live nodes contain current block

hdfs is hosed; it lost a block from the named file. If hdfs is hosed, so is hbase.


If I look for this block in the hadoop master log I can find

2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask
192.168.1.13:50010 to delete  [...] blk_-3759213227484579481_226277 [...]
(many more blocks)

about 16 min before.

This is interesting. I wonder why hdfs is deleting a block that a regionserver subsequently tries to use? Can you correlate the block's story with hbase actions? (That's probably an unfair question to ask, since it would require deep detective work in the hbase logs, tracing the file whose block is missing and its hosting region as it moved around the cluster.)
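
If you do want to try, plain grep for the block id (and for the mapfile name from the error) across the logs is where I would start; the log file names below are just the usual defaults and will differ on your setup:

    $ grep 'blk_-3759213227484579481' hadoop-*-namenode-*.log    # creation, replication, delete requests
    $ grep 'blk_-3759213227484579481' hadoop-*-datanode-*.log    # which datanodes held/served it
    $ grep '4558585535524295446' hbase-*-regionserver-*.log      # where that mapfile's region was being served
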
In both cases the regionserver fails to serve the concerned region until I
restart hbase (not hadoop).

Not hadoop? And if you run an fsck on the filesystem, is it healthy?
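
For reference, something like the following is what I'd run; hadoop fsck is stock hadoop, and the second path is just the file named in your error:

    $ bin/hadoop fsck / -files -blocks -locations
    $ bin/hadoop fsck /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data -files -blocks -locations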

One last question by the way:
Why is the replication factor of my hbase files in dfs 3, when my hadoop
cluster is configured to keep only 2 copies?
See http://wiki.apache.org/hadoop/Hbase/FAQ#12.

Is it because the default (hadoop-default.xml) config file of the hadoop
client, which is embedded in the hbase distribution, overrides the cluster
configuration for the mapfiles created?
Yes.
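
The replication factor is set by the client that creates the file, so whatever dfs.replication hbase sees on its classpath is what its mapfiles get. One way to make it 2 (a sketch, assuming the usual config layout) is to copy your cluster's hadoop-site.xml into ${HBASE_HOME}/conf, or set the property there yourself:

    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>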

Thanks for the questions, J-A.
St.Ack
