Re: hbase.client.retries.number = 1 is bad

Miklós Kurucz Mon, 03 May 2010 15:25:30 -0700

This problem is not related to the shell.
I checked: 0.20.3 has the same code at HConnectionManager.java:1034, so I
expect it to be broken there too.
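
To make the failure mode concrete, here is a minimal, self-contained sketch of
how the reload flag behaves. It is not the HBase source: the loop shape
(tries counting from 0 up to the configured retry number) is an assumption,
and reloadFlags and RetryReloadSketch are hypothetical names. Only the
expression tries != 0 is taken from the quoted call to instantiateServer.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryReloadSketch {

    /**
     * Returns the "reload" flag that would be passed on each attempt,
     * assuming a loop of the form: for (tries = 0; tries < numRetries; tries++).
     */
    static List<Boolean> reloadFlags(int numRetries) {
        List<Boolean> flags = new ArrayList<>();
        for (int tries = 0; tries < numRetries; tries++) {
            // Same expression as in: callable.instantiateServer(tries != 0)
            // The cached location is bypassed only when the flag is true.
            flags.add(tries != 0);
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(reloadFlags(1)); // prints [false]
        System.out.println(reloadFlags(3)); // prints [false, true, true]
    }
}
```

With a retry number of 1 the flag is never true, so under this assumed loop a
stale cached location would be reused on every call.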


Miklos

2010/5/4 Jean-Daniel Cryans <jdcry...@apache.org>:
> Trunk is a work in progress and the shell was recently redone. This
> configuration was set tentatively by the author of that change but, as
> you can see, it doesn't work very well! The jira is here
> https://issues.apache.org/jira/browse/HBASE-2352
>
> J-D
>
> On Mon, May 3, 2010 at 3:12 PM, Miklós Kurucz <mkur...@gmail.com> wrote:
>> Hi!
>>
>> I'm using a fresh version of trunk.
>> I'm experiencing a problem where the invalid region locations are not
>> removed from the cache of HCM.
>> I'm only using scanners on the table and I receive the following errors:
>>
>> 2010-05-03 23:42:52,574 DEBUG
>> org.apache.hadoop.hbase.client.HTable$ClientScanner: Advancing
>> internal scanner to startKey at
>> 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg'
>> 2010-05-03 23:42:52,574 DEBUG
>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Cache
>> hit for row <http://hu.gaabi.www/jordania/(041022)_jord-155_petra.jpg>
>> in tableName Test5: location server 10.1.3.111:60020, location region
>> name 
>> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136
>> SEVERE: Trying to contact region server 10.1.3.111:60020 for region
>> Test5,http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg,1272896369136,
>> row 'http://hu.gaabi.www/jordania/\x28041022\x29_jord-155_petra.jpg',
>> but failed after 1 attempts.
>> Exceptions:
>> java.net.ConnectException: Connection refused
>>
>> Which is expected as the 10.1.3.111:60020 regionserver was offline for
>> hours at that time.
>> The cause of this problem is that I set hbase.client.retries.number to
>> 1 as I don't like the current retry options.
>> In this case the following code at HConnectionManager.java:1061
>>   callable.instantiateServer(tries != 0);
>> makes scanners always use the cache, because with a single attempt
>> tries is always 0 and the location is never reloaded.
>> This makes hbase.client.retries.number = 1 an unusable option.
>>
>> This is not intentional, am I correct?
>> Am I forced to use the retries, or is there another option?
>>
>> Also I would like to ask, when is it a good thing to retry an operation?
>> In my experience there exist two kinds of failures:
>> 1) org.apache.hadoop.hbase.NotServingRegionException : region is offline
>> This can be due to a compaction, in which case we probably need to
>> wait for a few seconds.
>> Or it can be due to a split, in which case we might need to wait for minutes.
>> In either case I would not want my client to wait that long
>> when I could schedule other work in the meantime.
>> It is also possible that the region has been transferred to another
>> regionserver, but that is rare compared to the other cases.
>>
>> 2) java.net.ConnectException : regionserver is offline
>> This is solved as soon as the master can reopen the regions on another
>> regionserver, but that can still take minutes.
>> Anyway, this exception is also rare (usually).
>>
>> Best regards,
>> Miklos
>>
>
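
The two failure kinds described in the original message could be handled on
the application side by failing fast and deferring the work item instead of
blocking in client retries. The sketch below only illustrates that idea and is
not part of HBase: FailFastSketch, Action and classify are hypothetical names,
and the nested NotServingRegionException is a local stand-in for
org.apache.hadoop.hbase.NotServingRegionException so that the example compiles
without HBase on the classpath.

```java
import java.io.IOException;
import java.net.ConnectException;

public class FailFastSketch {

    /** Local stand-in for org.apache.hadoop.hbase.NotServingRegionException. */
    static class NotServingRegionException extends IOException {
        NotServingRegionException(String message) {
            super(message);
        }
    }

    /** What the caller does with a failed operation (hypothetical policy). */
    enum Action {
        DEFER_REGION_OFFLINE,  // case 1: region is compacting, splitting or moving
        DEFER_SERVER_OFFLINE,  // case 2: regionserver is down, regions not yet reopened
        PROPAGATE              // anything else: report the error to the caller
    }

    /** Maps an exception to one of the two failure kinds from the message. */
    static Action classify(Exception e) {
        if (e instanceof NotServingRegionException) {
            return Action.DEFER_REGION_OFFLINE;
        }
        if (e instanceof ConnectException) {
            return Action.DEFER_SERVER_OFFLINE;
        }
        return Action.PROPAGATE;
    }

    public static void main(String[] args) {
        System.out.println(classify(new ConnectException("Connection refused")));
        System.out.println(classify(new NotServingRegionException("region offline")));
    }
}
```

A caller could put deferred items back on its own work queue and do other work
in the meantime; how long to wait before retrying each kind is left open here.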
