Hi. I have a similar problem. My configuration is 8 machines with 4 GB RAM each, running the default heap size for hbase.
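In case it helps others reading along: the regionserver heap is set in conf/hbase-env.sh. The value below is only an illustration for a 4 GB box that also runs a datanode and tasktracker, not a recommendation; check the defaults shipped with your distrib.

```shell
# conf/hbase-env.sh -- illustrative sizing only; tune for your workload.
# Leave headroom for the datanode/tasktracker JVMs and the OS page cache,
# otherwise the machine starts swapping (the problem suspected in this thread).
export HBASE_HEAPSIZE=1000   # megabytes; I believe 1000 is the shipped default
```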
On Mon, Oct 20, 2008 at 11:38 AM, Jean-Adrien <[EMAIL PROTECTED]> wrote:
>
> stack-3 wrote:
> >
> > First, see the Jon Gray response. His postulate that the root of your
> > issues is machines swapping seems likely to me.
> >
> > See below for some particular answers to your queries (thanks for the
> > detail).
> >
> > Jean-Adrien wrote:
> >> The attempts above can be:
> >> 1.
> >> java.io.IOException: java.io.IOException: Premeture EOF from inputStream
> >>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> >>
> > Did you say your disks had filled? If so, this is the likely cause of the
> > above (but on our cluster here, we've also been seeing it and are
> > looking at HADOOP-3831).
> >
> Yes, one is.
>
> stack-3 wrote:
> >
> >> 2-10
> >> java.io.IOException: java.io.IOException: java.lang.NullPointerException
> >>         at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
> >>
> > Was there more stack trace on this error? May I see it? The above should
> > never happen (smile).
> >
> Sure. Enjoy. Take into account that it happens after the above Premeture
> EOF.
> 2008-10-14 14:23:55,705 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 60020, call getRow([EMAIL PROTECTED], [EMAIL PROTECTED], null,
> 9223372036854775807, -1) from 192.168.1.10:49676: error:
> java.io.IOException: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>         at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
>         at org.apache.hadoop.hbase.HStoreKey$HStoreKeyWritableComparator.compare(HStoreKey.java:593)
>         at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:436)
>         at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
>         at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
>         at org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
>         at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
>         at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>
> stack-3 wrote:
> >
> >> Another 10-attempts scenario I have seen:
> >> 1-10:
> >> IPC Server handler 3 on 60020, call getRow([EMAIL PROTECTED], [EMAIL
> >> PROTECTED], null, 1224105427910, -1) from 192.168.1.11:55371: error:
> >> java.io.IOException: Cannot open filename
> >> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
> >> java.io.IOException: Cannot open filename
> >> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
> >>         at
> >> org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)
> >>
> >> Preceded, in the concerned regionserver log, by the line:
> >>
> >> 2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
> >> obtain block blk_-3759213227484579481_226277 from any node:
> >> java.io.IOException: No live nodes contain current block
> >>
> > hdfs is hosed; it lost a block from the named file. If hdfs is hosed,
> > so is hbase.
> >
> >> If I look for this block in the hadoop master log I can find
> >>
> >> 2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask
> >> 192.168.1.13:50010 to delete [...] blk_-3759213227484579481_226277 [...]
> >> (many more blocks)
> >>
> > This is interesting. I wonder why hdfs is deleting a block that a
> > regionserver subsequently tries to use? Can you correlate the
> > block's story with hbase actions? (That's probably an unfair question to
> > ask, since it would require deep detective work on the hbase logs, trying
> > to trace the file whose block is missing and its hosting region as it
> > moved around the cluster.)
> >
> I have noticed no correlation for now. I'll try to play the detective a bit.
> If I notice something, I'll post it here.
>
> stack-3 wrote:
> >
> >> about 16 min before.
> >> In both cases the regionserver fails to serve the concerned region until
> >> I restart hbase (not hadoop).
> >>
> > Not hadoop? And if you ran an fsck on the filesystem, is it healthy?
> >
> Not hadoop. Fsck says it's healthy.
>
> stack-3 wrote:
> >
> >> One last question, by the way:
> >> Why is the replication factor of my hbase files in dfs 3, when my hadoop
> >> cluster is configured to keep only 2 copies?
> >>
> > See http://wiki.apache.org/hadoop/Hbase/FAQ#12.
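With the FAQ answer above in hand: if you want hbase-created files to follow the cluster's factor of 2, overriding dfs.replication on the client side should do it. A sketch for hbase-site.xml (property name is the standard hadoop dfs.replication; the value of 2 is just this thread's cluster setting):

```xml
<!-- hbase-site.xml (sketch): make files written by the hbase client use
     the cluster's replication factor instead of the embedded default of 3. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```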
> >> Is it because the default (hadoop-default.xml) config file of the hadoop
> >> client, which is embedded in the hbase distrib, overrides the cluster
> >> configuration for the mapfiles created?
> >>
> > Yes.
> >
> > Thanks for the questions J-A.
> > St.Ack
> >
> Thank you too.
>
> --
> View this message in context:
> http://www.nabble.com/Regionserver-fails-to-serve-region-tp20028553p20066104.html
> Sent from the HBase User mailing list archive at Nabble.com.
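On the detective work discussed above: the first correlation I would look for is whether the namenode asked a datanode to delete the block *before* the regionserver's failed read. A sketch using a stand-in log file (real namenode logs live under the hadoop logs directory; the path and the single log line here are only stand-ins for illustration):

```shell
# Stand-in namenode log; substitute your actual namenode log file.
cat > /tmp/namenode.log <<'EOF'
2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask 192.168.1.13:50010 to delete blk_-3759213227484579481_226277
EOF

# The block id taken from the "Could not obtain block" regionserver error:
BLOCK=blk_-3759213227484579481_226277

# A delete request timestamped before the failed read (23:03 vs 23:19 in the
# thread above) is the ordering to chase back through the hbase logs.
grep "$BLOCK" /tmp/namenode.log
```

The same grep against the regionserver and master logs, plus comparing the timestamps, is the whole technique; fsck confirms overall health but will not show a block that was deliberately deleted.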
