Hi. I have a similar problem. My configuration is 8 machines with 4 GB RAM each, running the default heap size for hbase.
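In case it helps others reading along: the regionserver heap is set in conf/hbase-env.sh. The value below is only an illustration for a 4 GB box that also runs a datanode and tasktracker, not a recommendation; check the defaults shipped with your distrib.

```shell
# conf/hbase-env.sh -- illustrative sizing only; tune for your workload.
# Leave headroom for the datanode/tasktracker JVMs and the OS page cache,
# otherwise the machine starts swapping (the problem suspected in this thread).
export HBASE_HEAPSIZE=1000   # megabytes; I believe 1000 is the shipped default
```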
On Mon, Oct 20, 2008 at 11:38 AM, Jean-Adrien <[EMAIL PROTECTED]> wrote:
>
> stack-3 wrote:
> >
> > First, see the Jon Gray response. His postulate that the root of your
> > issues is machines swapping seems likely to me.
> >
> > See below for some particular answers to your queries (thanks for the
> > detail).
> >
> > Jean-Adrien wrote:
> >> The attempts above can be:
> >> 1.
> >> java.io.IOException: java.io.IOException: Premeture EOF from inputStream
> >>         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> >>
> > Did you say your disks had filled? If so, this is the likely cause of the
> > above (but on our cluster here, we've also been seeing it and are
> > looking at HADOOP-3831).
> >
> Yes, one is.
>
> stack-3 wrote:
> >
> >> 2-10
> >> java.io.IOException: java.io.IOException: java.lang.NullPointerException
> >>         at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
> >>
> > Was there more stack trace on this error? May I see it? The above should
> > never happen (smile).
> >
> Sure. Enjoy. Take into account that it happens after the above Premeture
> EOF.
> 2008-10-14 14:23:55,705 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 7 on 60020, call getRow([EMAIL PROTECTED], [EMAIL PROTECTED], null,
> 9223372036854775807, -1) from 192.168.1.10:49676: error:
> java.io.IOException: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>         at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
>         at org.apache.hadoop.hbase.HStoreKey$HStoreKeyWritableComparator.compare(HStoreKey.java:593)
>         at org.apache.hadoop.io.MapFile$Reader.seekInternal(MapFile.java:436)
>         at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:558)
>         at org.apache.hadoop.io.MapFile$Reader.getClosest(MapFile.java:541)
>         at org.apache.hadoop.hbase.regionserver.HStoreFile$BloomFilterMapFile$Reader.getClosest(HStoreFile.java:761)
>         at org.apache.hadoop.hbase.regionserver.HStore.getFullFromMapFile(HStore.java:1179)
>         at org.apache.hadoop.hbase.regionserver.HStore.getFull(HStore.java:1160)
>         at org.apache.hadoop.hbase.regionserver.HRegion.getFull(HRegion.java:1221)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRow(HRegionServer.java:1036)
>         at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:554)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>
> stack-3 wrote:
> >
> >> Another 10-attempts scenario I have seen:
> >> 1-10:
> >> IPC Server handler 3 on 60020, call getRow([EMAIL PROTECTED], [EMAIL
> >> PROTECTED], null, 1224105427910, -1) from 192.168.1.11:55371: error:
> >> java.io.IOException: Cannot open filename
> >> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
> >> java.io.IOException: Cannot open filename
> >> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
> >>         at
> >> org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)
> >>
> >> Preceded, in the concerned regionserver log, by the line:
> >>
> >> 2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
> >> obtain block blk_-3759213227484579481_226277 from any node:
> >> java.io.IOException: No live nodes contain current block
> >>
> > hdfs is hosed; it lost a block from the named file. If hdfs is hosed,
> > so is hbase.
> >
> >> If I look for this block in the hadoop master log I can find
> >>
> >> 2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask
> >> 192.168.1.13:50010 to delete [...] blk_-3759213227484579481_226277 [...]
> >> (many more blocks)
> >>
> > This is interesting. I wonder why hdfs is deleting a block that a
> > regionserver subsequently tries to use? Can you correlate the
> > block's story with hbase actions? (That's probably an unfair question to
> > ask, since it would require deep detective work on the hbase logs, trying
> > to trace the file whose block is missing and its hosting region as it
> > moved around the cluster.)
> >
> I have noticed no correlation for now. I'll try to play the detective a bit.
> If I notice something, I'll post it here.
>
> stack-3 wrote:
> >
> >> about 16 min before.
> >> In both cases the regionserver fails to serve the concerned region until
> >> I restart hbase (not hadoop).
> >>
> > Not hadoop? And if you ran an fsck on the filesystem, is it healthy?
> >
> Not hadoop. Fsck says it's healthy.
>
> stack-3 wrote:
> >
> >> One last question, by the way:
> >> Why is the replication factor of my hbase files in dfs 3, when my hadoop
> >> cluster is configured to keep only 2 copies?
> >>
> > See http://wiki.apache.org/hadoop/Hbase/FAQ#12.
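With the FAQ answer above in hand: if you want hbase-created files to follow the cluster's factor of 2, overriding dfs.replication on the client side should do it. A sketch for hbase-site.xml (property name is the standard hadoop dfs.replication; the value of 2 is just this thread's cluster setting):

```xml
<!-- hbase-site.xml (sketch): make files written by the hbase client use
     the cluster's replication factor instead of the embedded default of 3. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```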
> >> Is it because the default (hadoop-default.xml) config file of the hadoop
> >> client, which is embedded in the hbase distrib, overrides the cluster
> >> configuration for the mapfiles created?
> >>
> > Yes.
> >
> > Thanks for the questions J-A.
> > St.Ack
> >
> Thank you too.
>
> --
> View this message in context:
> http://www.nabble.com/Regionserver-fails-to-serve-region-tp20028553p20066104.html
> Sent from the HBase User mailing list archive at Nabble.com.
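On the detective work discussed above: the first correlation I would look for is whether the namenode asked a datanode to delete the block *before* the regionserver's failed read. A sketch using a stand-in log file (real namenode logs live under the hadoop logs directory; the path and the single log line here are only stand-ins for illustration):

```shell
# Stand-in namenode log; substitute your actual namenode log file.
cat > /tmp/namenode.log <<'EOF'
2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK* ask 192.168.1.13:50010 to delete blk_-3759213227484579481_226277
EOF

# The block id taken from the "Could not obtain block" regionserver error:
BLOCK=blk_-3759213227484579481_226277

# A delete request timestamped before the failed read (23:03 vs 23:19 in the
# thread above) is the ordering to chase back through the hbase logs.
grep "$BLOCK" /tmp/namenode.log
```

The same grep against the regionserver and master logs, plus comparing the timestamps, is the whole technique; fsck confirms overall health but will not show a block that was deliberately deleted.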
