stack-3 wrote:
>
> First, see the Jon Gray response. His postulate that the root of your
> issues is machines swapping seems likely to me.
>
>
> See below for some particular answers to your queries (thanks for the
> detail).
>
> Jean-Adrien wrote:
>> The attempts mentioned above can be:
>> 1.
>> java.io.IOException: java.io.IOException: Premeture EOF from inputStream
>> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
>>
>
> Did you say your disks had filled? If so, that is the likely cause of the
> above (but on our cluster here, we've also been seeing the above and are
> looking at HADOOP-3831).
>
>
Yes, one is.
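
For context on that error: "Premeture EOF from inputStream" is what
IOUtils.readFully throws when the stream runs out before the requested
length is satisfied, which is how a block truncated by a full disk looks
to the reader. Roughly, the loop is the following (a minimal sketch, not
necessarily the exact Hadoop source):

    import java.io.IOException;
    import java.io.InputStream;

    public class ReadFullySketch {
        // Reads exactly len bytes or throws; a short read surfaces as
        // the "Premeture EOF" message seen in the log above.
        public static void readFully(InputStream in, byte[] buf, int off, int len)
                throws IOException {
            int toRead = len;
            while (toRead > 0) {
                int ret = in.read(buf, off, toRead);
                if (ret < 0) {
                    // Message copied from the error above (typo and all)
                    throw new IOException("Premeture EOF from inputStream");
                }
                toRead -= ret;
                off += ret;
            }
        }
    }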
stack-3 wrote:
>
>> 2-10
>> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>> at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
>>
>>
> Was there more stack trace on this error? May I see it? The above should
> never happen (smile).
>
Sure. Enjoy. Take into account that it happens after the "Premeture EOF"
error above.
2008-10-14 14:23:55,705 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 60020, call getRow([EMAIL PROTECTED], [EMAIL PROTECTED], null, 9223372036854775807, -1) from 192.168.1.10:49676: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
    at org.apache.hadoop.hbase.HStoreKey$HStoreKeyWritableComparator.compare(HStoreKey.java:593)
    at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:436)
    at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
    at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
    at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
    at org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
    at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
    at org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
    at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
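
An NPE at that point suggests one of the keys being compared carried a
null field, presumably a key deserialized from a damaged mapfile (e.g. one
truncated by the earlier EOF). A hypothetical reconstruction of the
failure shape follows; this is NOT the actual HStoreKey source, and the
class and field names are made up for illustration:

    public class KeySketch implements Comparable<KeySketch> {
        private byte[] row;      // could come back null from a corrupt read
        private long timestamp;

        @Override
        public int compareTo(KeySketch other) {
            // NullPointerException right here if this.row or other.row is
            // null: the same shape as the trace above, where compareTo is
            // invoked from MapFile$Reader.seekInternal during the binary
            // search for a row.
            int result = compareBytes(this.row, other.row);
            if (result != 0) {
                return result;
            }
            // Newer timestamps sort first
            return other.timestamp < this.timestamp ? -1
                 : other.timestamp > this.timestamp ? 1 : 0;
        }

        private static int compareBytes(byte[] left, byte[] right) {
            // Dereferences both arrays unconditionally: no null guard
            int n = Math.min(left.length, right.length);
            for (int i = 0; i < n; i++) {
                int diff = (left[i] & 0xff) - (right[i] & 0xff);
                if (diff != 0) {
                    return diff;
                }
            }
            return left.length - right.length;
        }
    }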
stack-3 wrote:
>
>
>> Another 10 attempts scenario I have seen:
>> 1-10:
>> IPC Server handler 3 on 60020, call getRow([EMAIL PROTECTED], [EMAIL PROTECTED], null, 1224105427910, -1) from 192.168.1.11:55371: error: java.io.IOException: Cannot open filename /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
>> java.io.IOException: Cannot open filename /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
>>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)
>>
>> Preceded, in the concerned regionserver's log, by this line:
>>
>> 2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not obtain block blk_-3759213227484579481_226277 from any node: java.io.IOException: No live nodes contain current block
>>
>>
> hdfs is hosed; it lost a block from the named file. If hdfs is hosed,
> so is hbase.
>
>
>> If I look for this block in the hadoop master log I can find
>>
>> 2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask 192.168.1.13:50010 to delete [...] blk_-3759213227484579481_226277 [...]
>> (many more blocks)
>>
>
> This is interesting. I wonder why hdfs is deleting a block that a
> regionserver subsequently tries to use? Can you correlate the
> block's story with hbase actions? (That's probably an unfair question to
> ask since it would require deep detective work on the hbase logs, trying
> to trace the file whose block is missing and its hosting region as it
> moved around the cluster.)
>
>
I have noticed no correlation so far. I'll try to play detective a bit.
If I notice something, I'll post it here.
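
If anyone wants to do the same digging, a throwaway helper along these
lines (hypothetical, nothing HBase-specific) pulls every mention of a
block id out of a set of logs, so the block's allocation, replication,
reads and deletion can be lined up against region moves:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class BlockStory {
        // Usage: java BlockStory blk_-3759213227484579481 namenode.log rs.log ...
        public static void main(String[] args) throws IOException {
            if (args.length < 2) {
                System.err.println("Usage: java BlockStory <block-id> <log-file>...");
                return;
            }
            String blockId = args[0];
            for (int i = 1; i < args.length; i++) {
                try (BufferedReader reader = new BufferedReader(new FileReader(args[i]))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Print every log line mentioning the block, with its source file
                        if (line.contains(blockId)) {
                            System.out.println(args[i] + ": " + line);
                        }
                    }
                }
            }
        }
    }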
stack-3 wrote:
>
>
>
>> about 16 min before.
>> In both cases the regionserver fails to serve the concerned region
>> until I restart hbase (not hadoop).
>>
>>
> Not hadoop? And if you ran an fsck on the filesystem, it's healthy?
>
>
Not hadoop. Fsck says it's healthy.
stack-3 wrote:
>
>
>> One last question by the way:
>> Why is the replication factor of my hbase files in dfs 3, when my hadoop
>> cluster is configured to keep only 2 copies?
>>
> See http://wiki.apache.org/hadoop/Hbase/FAQ#12.
>
>> Is it because the default config file (hadoop-default.xml) of the hadoop
>> client embedded in the hbase distribution overrides the cluster
>> configuration for the mapfiles created?
> Yes.
>
> Thanks for the questions J-A.
> St.Ack
>
>
Thank you too.
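
A footnote on the replication answer, for anyone else hitting the same
mismatch: since the DFS client embedded in hbase resolves dfs.replication
from its own classpath (where hadoop-default.xml says 3), setting
dfs.replication to 2 in a site file on hbase's classpath should bring new
hbase files in line with the cluster. A quick sketch to check what the
client actually resolves:

    import org.apache.hadoop.conf.Configuration;

    public class ReplicationCheck {
        public static void main(String[] args) {
            // Loads hadoop-default.xml plus any hadoop-site.xml found on the
            // classpath, the same way the 0.18-era DFS client shipped with
            // hbase builds its configuration.
            Configuration conf = new Configuration();
            // Prints "3" (the hadoop-default.xml value) unless overridden
            System.out.println("dfs.replication = " + conf.get("dfs.replication"));
        }
    }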