Trunk is a work in progress and the shell was recently redone. This configuration was set tentatively by the author of that change but, as you can see, it doesn't work very well! The JIRA issue is here: https://issues.apache.org/jira/browse/HBASE-2352
J-D

On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz <mkur...@gmail.com> wrote:
> Hi!
>
> I'm using a fresh version of trunk.
> I'm experiencing a problem where the invalid region locations are not
> removed from the cache of HCM.
> I'm only using scanners on the table and I receive the following errors:
>
> 2010-05-03 23:42:52,574 DEBUG
> org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing
> internal scanner to startKey at
> 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg'
> 2010-05-03 23:42:52,574 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache
> hit for row <http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg>
> in tableName Test5: location server 10.1.3.111:60020, location region
> name
> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136
> SEVERE: Trying to contact region server 10.1.3.111:60020 for region
> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136,
> row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg',
> but failed after 1 attempts.
> Exceptions:
> java.net.ConnectException: Connection refused
>
> Which is expected, as the 10.1.3.111:60020 regionserver was offline for
> hours at that time.
> The cause of this problem is that I set hbase.client.retries.number to
> 1, as I don't like the current retry options.
> In this case the following code at HConnectionManager.java:1061
> callable.instantiateServer(tries != 0);
> will make scanners always use the cache.
> This makes hbase.client.retries.number = 1 an unusable option.
>
> This is not intentional, am I correct?
> Am I forced to use the retries, or is there another option?
>
> Also I would like to ask, when is it a good thing to retry an operation?
> In my experience there exist two kinds of failures:
> 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline
> This can be due to a compaction, in which case we probably need to
> wait for a few seconds.
> Or it can be due to a split, in which case we might need to wait for minutes.
> In either case I would not want my client to wait that long
> when I could schedule other things to do in that time.
> It is also possible that the region has been transferred to another
> regionserver, but that is rare compared to the other cases.
>
> 2) java.net.ConnectException : regionserver is offline
> This is solved as soon as the master can reopen the regions on another
> regionserver, but it can still take minutes.
> Anyway, this exception is also rare (usually).
>
> Best regards,
> Miklos
>
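
To illustrate the behaviour described in the quoted message, here is a minimal Java sketch of a retry loop that computes the reload flag as `tries != 0`, the same condition as in the quoted line from HConnectionManager.java. Only that condition comes from the message; the loop bounds, the class name RetryReloadSketch and the method reloadFlags are assumptions made for illustration and are not HBase code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HBase code: it only models the reload decision.
public class RetryReloadSketch {

    /**
     * For each attempt, records whether the cached region location would be
     * reloaded, using the condition quoted in the message: reload = (tries != 0).
     * The loop shape (tries counted from 0 up to maxTries) is an assumption.
     */
    public static List<Boolean> reloadFlags(int maxTries) {
        List<Boolean> flags = new ArrayList<>();
        for (int tries = 0; tries < maxTries; tries++) {
            boolean reload = tries != 0; // the first attempt always trusts the cache
            flags.add(reload);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println("retries=1: " + reloadFlags(1)); // prints "retries=1: [false]"
        System.out.println("retries=3: " + reloadFlags(3)); // prints "retries=3: [false, true, true]"
    }
}
```

Under these assumptions, a single attempt never sets the reload flag, so a stale cached location would not be refreshed within that call. That would be consistent with the "failed after 1 attempts" error above, but it is not a confirmed description of the actual client code path.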