Yeah I understand that retries are unusable at that level, but you
still want retries in order to be able to recalibrate the .META. cache
right?

So the semantic here is that 1 retry is in fact 1 try, using the
cached information. https://issues.apache.org/jira/browse/HBASE-2445
is about reviewing those semantics in order to offer something more
tangible to the users rather than a mix of number of retries and
timeouts. Feel free to take a look and even a stab at this issue ;)

J-D

On Mon, May 3, 2010 at 3:25 PM, Miklós Kurucz <mkur...@gmail.com> wrote:
> This problem is not related to the shell.
> I checked 0.20.3 has the same code HConnectionManager.java:1034, I
> expect that to be broken too.
>
> Miklos
>
> 2010/5/4 Jean-Daniel Cryans <jdcry...@apache.org>:
>> Trunk is a work in progress and the shell was recently redone. This
>> configuration was set tentatively by the author of that change but, as
>> you can see, it doesn't work very well! The jira is here
>> https://issues.apache.org/jira/browse/HBASE-2352
>>
>> J-D
>>
>> On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz <mkur...@gmail.com> wrote:
>>> Hi!
>>>
>>> I'm using a fresh version of trunk.
>>> I'm experiencing a problem where the invalid region locations are not
>>> removed from the cache of HCM.
>>> I'm only using scanners on the table and I receive the following errors:
>>>
>>> 2010-05-03 23:42:52,574 DEBUG
>>> org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing
>>> internal scanner to startKey at
>>> 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg'
>>> 2010-05-03 23:42:52,574 DEBUG
>>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache
>>> hit for row <http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg>
>>> in tableName Test5: location server 10.1.3.111:60020, location region
>>> name
>>> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136
>>> SEVERE: Trying to contact region server 10.1.3.111:60020 for region
>>> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136,
>>> row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg',
>>> but failed after 1 attempts.
>>> Exceptions:
>>> java.net.ConnectException: Connection refused
>>>
>>> Which is expected as the 10.1.3.111:60020 regionserver was offline for
>>> hours at that time.
>>> The cause of this problem is that I set hbase.client.retries.number to
>>> 1 as I don't like the current retry options.
>>> In this case the following code at HConnectionManager.java:1061
>>> callable.instantiateServer(tries != 0);
>>> will make scanners always use the cache.
>>> This makes hbase.client.retries.number = 1 an unusable option.
>>>
>>> This is not intentional, am I correct?
>>> Am I forced to use the retries, or is there another option?
>>>
>>> Also I would like to ask, when is it a good thing to retry an operation?
>>> In my experience there exist two kinds of failures
>>> 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline
>>> This can be due to a compaction, in which case we probably need to
>>> wait for a few seconds.
>>> Or it can be due to a split, in which case we might need to wait for
>>> minutes.
>>> In either case I would not want my client to wait for such long times
>>> when I could reschedule other things to do in that time.
>>> It is also possible that the region has been transferred to another
>>> regionserver but that is rare compared to the other cases.
>>>
>>> 2) java.net.ConnectException : regionserver is offline
>>> This is solved as soon as the master can reopen regions on another
>>> regionserver, but still can take minutes.
>>> Anyway this exception is also rare (usually)
>>>
>>> Best regards,
>>> Miklos
>>>
>>
>
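
For readers following the thread, here is a minimal sketch of the retry
semantics being discussed. It is a simplified model, not the actual
HConnectionManager code: the class RetrySketch, its fields and methods,
and the replacement server address 10.1.3.112:60020 are all made up for
illustration. Only the reload condition (tries != 0) and the offline
server 10.1.3.111:60020 come from the messages above. The model assumes
that the cached location is refreshed only when a reload is requested.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of the retry loop from this thread; NOT the real HBase code. */
class RetrySketch {
    // Hypothetical stand-in for .META.: the authoritative region -> server mapping.
    final Map<String, String> meta = new HashMap<>();
    // Hypothetical stand-in for the client-side region location cache.
    final Map<String, String> cache = new HashMap<>();
    // The only server that accepts connections in this model.
    String liveServer;

    /** Same shape as callable.instantiateServer(tries != 0): reload only on a retry. */
    String locate(String region, boolean reload) {
        if (reload || !cache.containsKey(region)) {
            cache.put(region, meta.get(region)); // refresh from the authoritative mapping
        }
        return cache.get(region);
    }

    /** Returns true if some attempt reaches the live server within numRetries attempts. */
    boolean call(String region, int numRetries) {
        for (int tries = 0; tries < numRetries; tries++) {
            String server = locate(region, tries != 0);
            if (server.equals(liveServer)) {
                return true;
            }
            // "Connection refused": move on to the next attempt, if there is one.
        }
        return false;
    }

    /** Cache still points at the offline server; the region now lives elsewhere. */
    static RetrySketch staleCacheScenario() {
        RetrySketch s = new RetrySketch();
        s.meta.put("Test5", "10.1.3.112:60020");  // hypothetical new location
        s.cache.put("Test5", "10.1.3.111:60020"); // stale entry from the log above
        s.liveServer = "10.1.3.112:60020";
        return s;
    }

    public static void main(String[] args) {
        RetrySketch one = staleCacheScenario();
        System.out.println(one.call("Test5", 1)); // single attempt, cache only
        System.out.println(one.call("Test5", 1)); // same stale entry is used again
        System.out.println(staleCacheScenario().call("Test5", 2)); // second attempt reloads
    }
}
```

In this model a limit of 1 never reaches the reload branch, so the stale
entry survives every call, while a limit of 2 refreshes the location on
the second attempt. This matches the behaviour reported above only under
the assumptions stated.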