Yeah, I understand that retries are unusable at that level, but you
still want retries in order to be able to recalibrate the .META. cache,
right?

So the semantics here are that 1 retry is in fact 1 try, made using the
cached information. https://issues.apache.org/jira/browse/HBASE-2445
is about reviewing those semantics in order to offer something more
tangible to the users rather than a mix of number of retries and
timeouts. Feel free to take a look and even a stab at this issue ;)
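
To make the "1 retry is in fact 1 try" point concrete, here is a minimal
Java sketch. It is a toy model, not the actual HConnectionManager code:
the class, the exception type and the live server address
(10.1.3.112:60020) are all hypothetical, and it assumes .META. already
knows the region's new location. Only the `tries != 0` reload flag is
taken from the line quoted further down in this thread.

```java
import java.net.ConnectException;

// Toy model of the client retry loop discussed in this thread.
// NOT the real HConnectionManager code; all names here are hypothetical.
class RetrySemanticsSketch {
    static final String STALE_SERVER = "10.1.3.111:60020"; // dead server from the log in this thread
    static final String LIVE_SERVER = "10.1.3.112:60020";  // hypothetical new home of the region

    /** Hypothetical unchecked stand-in for "failed after N attempts". */
    static class RetriesExhausted extends RuntimeException {
        RetriesExhausted(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private String cachedLocation = STALE_SERVER; // cache still points at the dead server
    int metaLookups = 0;                          // how often we went back to .META.

    /** Returns the region location; only a reload bypasses the cache. */
    String locate(boolean reload) {
        if (reload) {
            metaLookups++;
            cachedLocation = LIVE_SERVER; // assumption: .META. already has the new location
        }
        return cachedLocation;
    }

    /** Pretend RPC: only the live server accepts connections. */
    String call(String server) throws ConnectException {
        if (!LIVE_SERVER.equals(server)) {
            throw new ConnectException("Connection refused");
        }
        return "ok from " + server;
    }

    /** The first attempt (tries == 0) trusts the cache; later attempts reload. */
    String callWithRetries(int numRetries) {
        ConnectException last = null;
        for (int tries = 0; tries < numRetries; tries++) {
            String server = locate(tries != 0); // same flag as instantiateServer(tries != 0)
            try {
                return call(server);
            } catch (ConnectException e) {
                last = e; // remember the failure and move on to the next attempt
            }
        }
        throw new RetriesExhausted("failed after " + numRetries + " attempts", last);
    }

    public static void main(String[] args) {
        RetrySemanticsSketch client = new RetrySemanticsSketch();
        try {
            client.callWithRetries(1); // single try: cache only, never reloads
        } catch (RetriesExhausted e) {
            System.out.println(e.getMessage() + ", .META. lookups: " + client.metaLookups);
        }
        // A second attempt is what triggers the reload.
        System.out.println(client.callWithRetries(2) + ", .META. lookups: " + client.metaLookups);
    }
}
```

With the count at 1 the loop never reaches an attempt that reloads, so
every call hits the same stale entry. With 2, the second attempt goes
back to .META. and succeeds. In this toy model nothing else evicts the
bad entry, which matches what is reported in this thread but may not
hold for every HBase version.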

J-D

On Mon, May 3, 2010 at 3:25 PM, Miklós Kurucz <mkur...@gmail.com> wrote:
> This problem is not related to the shell.
> I checked: 0.20.3 has the same code at HConnectionManager.java:1034, so I
> expect that to be broken too.
>
> Miklos
>
> 2010/5/4 Jean-Daniel Cryans <jdcry...@apache.org>:
>> Trunk is a work in progress and the shell was recently redone. This
>> configuration was set tentatively by the author of that change but, as
>> you can see, it doesn't work very well! The jira is here
>> https://issues.apache.org/jira/browse/HBASE-2352
>>
>> J-D
>>
>> On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz <mkur...@gmail.com> wrote:
>>> Hi!
>>>
>>> I'm using a fresh version of trunk.
>>> I'm experiencing a problem where the invalid region locations are not
>>> removed from the cache of HCM.
>>> I'm only using scanners on the table and I receive the following errors:
>>>
>>> 2010-05-03 23:42:52,574 DEBUG
>>> org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing
>>> internal scanner to startKey at
>>> 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg'
>>> 2010-05-03 23:42:52,574 DEBUG
>>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache
>>> hit for row <http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg>
>>> in tableName Test5: location server 10.1.3.111:60020, location region
>>> name 
>>> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136
>>> SEVERE: Trying to contact region server 10.1.3.111:60020 for region
>>> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136,
>>> row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg',
>>> but failed after 1 attempts.
>>> Exceptions:
>>> java.net.ConnectException: Connection refused
>>>
>>> Which is expected as the 10.1.3.111:60020 regionserver was offline for
>>> hours at that time.
>>> The cause of this problem is that I set hbase.client.retries.number to
>>> 1 as I don't like the current retry options.
>>> In this case the following code at HConnectionManager.java:1061
>>>   callable.instantiateServer(tries != 0);
>>> will make scanners always use the cache.
>>> This makes hbase.client.retries.number = 1 an unusable option.
>>>
>>> This is not intentional, am I correct?
>>> Am I forced to use the retries, or is there another option?
>>>
>>> Also I would like to ask, when is it a good thing to retry an operation?
>>> In my experience there exist two kinds of failures:
>>> 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline
>>> This can be due to a compaction, in which case we probably need to
>>> wait for a few seconds.
>>> Or it can be due to a split, in which case we might need to wait for 
>>> minutes.
>>> In either case I would not want my client to wait that long
>>> when I could schedule other things to do in that time.
>>> It is also possible that the region has been transferred to another
>>> regionserver, but that is rare compared to the other cases.
>>>
>>> 2) java.net.ConnectException : regionserver is offline
>>> This is solved as soon as the master can reopen the regions on another
>>> regionserver, but that can still take minutes.
>>> Anyway, this exception is also rare (usually).
>>>
>>> Best regards,
>>> Miklos
>>>
>>
>
