Sure, I saw it. Excuse me: I wrote this message a day before I posted it, and you answered my first mail in the meantime. Anyway, I'll fix my cluster setup to get a better memory allocation, and make sure there is no swapping that overloads the I/O.
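For the record, here is roughly what I plan to change. This is only a sketch: the 600 MB heap is my guess at what leaves room for the datanode within 1 GB of RAM, and the swappiness value is a common suggestion, not something I have measured.

In conf/hbase-env.sh on each regionserver:

  # Cap the regionserver heap (default is 1000 MB) so that HBase
  # plus the datanode JVM fit within the 1 GB of physical RAM.
  export HBASE_HEAPSIZE=600

And on the OS side, to discourage the kernel from swapping the JVMs out:

  # /etc/sysctl.conf: prefer dropping page cache over swapping
  # (assumed value; range is 0-100, lower means less eager to swap)
  vm.swappiness=10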
Thanks. Have a nice day.


Jonathan Gray-8 wrote:
> 
> Jean-Adrien,
> 
> Did you see my reply to your previous email?
> 
> I think your machines are underpowered for your current setup and it's
> creating all kinds of problems. If you have swapping going on in a
> regionserver/datanode, that must be addressed because it usually leads to
> odd behavior in hdfs, timeouts, starvation, etc...
> 
> Decrease your allotted heap sizes to fit within available memory, or add
> more memory.
> 
> JG
> 
> -----Original Message-----
> From: Jean-Adrien [mailto:[EMAIL PROTECTED]
> Sent: Friday, October 17, 2008 1:02 AM
> To: [email protected]
> Subject: Regionserver fails to serve region
> 
> 
> Hello again.
> This is my last message for today.
> 
> I often get an exception in my HBase client: a regionserver fails to
> serve a region when the client gets a row from the HBase cluster.
> 
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to
> contact region server 192.168.1.15:60020 for region
> table-0.3,:testrow79063200,1223872616091, row ':testrow22102600', but
> failed after 10 attempts.
> 
> The attempts above can be:
> 1:
> java.io.IOException: java.io.IOException: Premeture EOF from inputStream
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102)
> 2-10:
> java.io.IOException: java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.HStoreKey.compareTo(HStoreKey.java:354)
> 
> After that, every time the client tries to reach the same region, all 10
> attempts fail with the NullPointerException above.
> 
> Another 10-attempt scenario I have seen:
> 1-10:
> IPC Server handler 3 on 60020, call getRow([EMAIL PROTECTED], [EMAIL
> PROTECTED], null, 1224105427910, -1) from 192.168.1.11:55371: error:
> java.io.IOException: Cannot open filename
> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
> java.io.IOException: Cannot open filename
> /hbase/table-0.3/1739432898/header/mapfiles/4558585535524295446/data
>     at
> org.apache.hadoop.dfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1171)
> 
> Preceded, in the concerned regionserver's log, by the line:
> 
> 2008-10-15 23:19:30,461 INFO org.apache.hadoop.dfs.DFSClient: Could not
> obtain block blk_-3759213227484579481_226277 from any node:
> java.io.IOException: No live nodes contain current block
> 
> If I look for this block in the hadoop master log, I find, about 16
> minutes earlier:
> 
> 2008-10-15 23:03:45,276 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> ask 192.168.1.13:50010 to delete [...] blk_-3759213227484579481_226277
> [...] (many more blocks)
> 
> In both cases the regionserver fails to serve the concerned region until
> I restart hbase (not hadoop).
> 
> I have no clue whether such a failure is temporary (and for how long) or
> whether I really need to restart, but I noticed that the failure does not
> recover within the next 3-4 hours.
> 
> One last question, by the way:
> Why is the replication factor of my hbase files in dfs 3, when my hadoop
> cluster is configured to keep only 2 copies?
> Is it because the default config file (hadoop-default.xml) of the hadoop
> client embedded in the hbase distribution overrides the cluster
> configuration for the mapfiles created?
> Is that a good configuration scheme, or is it preferable to let the
> hbase hadoop client load the hadoop-site.xml file I have set for the
> running hadoop server instance, by adding the hadoop conf directory to
> the hbase classpath, and therefore have the same configuration in the
> client as in the server?
> 
> Have a nice day.
> Thank you for your advice.
> 
> --
> Jean-Adrien
> 
> Cluster setup:
> 4 regionservers / datanodes
> 1 is master / namenode as well
> java-6-sun
> Total size of hdfs: 81.98 GB (replication factor 3)
> fsck -> healthy
> hadoop: 0.18.1
> hbase: 0.18.0 (jar of hadoop replaced with 0.18.1)
> 1 GB ram per node
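P.S. Regarding the replication question quoted above, I see two possible fixes; both are sketches I have not tried yet, and the path below is from my own install. Either pin the factor in hbase's conf/hbase-site.xml, so the embedded hadoop client stops using the hadoop-default.xml value of 3:

  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Match the cluster's replication factor so the
    mapfiles hbase writes keep 2 copies instead of 3.</description>
  </property>

or point the hbase classpath at the running cluster's conf directory in conf/hbase-env.sh, so the client reads the same hadoop-site.xml as the servers:

  # Let the hadoop client embedded in hbase pick up the cluster's
  # hadoop-site.xml (adjust the path to your hadoop conf directory).
  export HBASE_CLASSPATH=/opt/hadoop/conf

The second option keeps a single source of truth for the hdfs settings, which seems less error-prone than duplicating values in two places.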
