[ 
https://issues.apache.org/jira/browse/HBASE-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621307#comment-13621307
 ] 

Nicolas Liochon commented on HBASE-7659:
----------------------------------------

Each time I look at this JIRA, I think it's a good idea, but I can't see all 
the implications.

So here are some random thoughts:
1) I think we can have two settings: a retry.count.min and a timeout. This way:
 - people who want the system to behave as it used to can set the retry min to 
the current retry limit and the timeout to +inf 
 - people who fancy new stuff can use the timeout
 - the recommended setting would be at least 2 or 3 retries
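
To make the interplay of the two settings concrete, here is a minimal Java
sketch. It assumes one possible reading: the timeout is the main bound, and
the minimum retry count is a floor that still applies once the deadline has
passed. RetryPolicy, shouldRetry and call are hypothetical names, not existing
HBase client code, and the settings could be defined differently.

```java
import java.util.concurrent.Callable;

/** Hypothetical policy object; a sketch, not HBase's actual client code. */
final class RetryPolicy {
    final int minAttempts;     // assumed meaning of "retry.count.min"
    final long timeoutMillis;  // assumed meaning of the timeout setting

    RetryPolicy(int minAttempts, long timeoutMillis) {
        this.minAttempts = minAttempts;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Another attempt is allowed while the floor has not been reached
     * or the deadline has not passed yet.
     */
    boolean shouldRetry(int attemptsDone, long elapsedMillis) {
        return attemptsDone < minAttempts || elapsedMillis < timeoutMillis;
    }

    /** Runs op until it succeeds or the policy refuses another attempt. */
    <T> T call(Callable<T> op, long pauseMillis) throws Exception {
        long start = System.nanoTime();
        int attempts = 0;
        while (true) {
            try {
                return op.call();
            } catch (Exception e) {
                attempts++;
                long elapsed = (System.nanoTime() - start) / 1_000_000L;
                if (!shouldRetry(attempts, elapsed)) {
                    throw e; // give up and surface the last failure
                }
                Thread.sleep(pauseMillis);
            }
        }
    }
}
```

With new RetryPolicy(2, 60000), a first attempt that burns the whole minute on
a socket timeout still gets a second try, because the floor of 2 attempts has
not been reached yet.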

2) Such scenarios should still be supported imho:
 - client calls server S1. S1 is dead, so client waits until the socket timeout 
is raised (1 minute by default, sometimes more)
 - actually, while the client was waiting, the regions on S1 were moved to S2.
 - client goes to .meta., .meta. says go to S2. Client is happy (whereas a 
single 1 minute timeout would have made it fail, since the client would never 
have made the second try)
 
My fear here is the subcases of this one, when there is a major crash: maybe 
.meta. moved as well, so the second try may fail while the third would succeed.

3) If we use a "Future" approach on top of the existing code, we may leave 
running attempts in the system (i.e. we're still trying and holding resources, 
but nobody cares about the final result).
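
A minimal Java sketch of that concern, using only java.util.concurrent.
BoundedCall and callWithTimeout are hypothetical names for illustration, not
part of the HBase client. When the caller's wait times out, the submitted
attempt keeps running unless it is cancelled explicitly, and even then
cancel(true) only interrupts the worker thread: the attempt stops early only
if its code reacts to interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; a sketch of a Future-based timeout wrapper. */
final class BoundedCall {
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> attempt,
                                 long timeoutMillis) throws Exception {
        Future<T> future = pool.submit(attempt);
        try {
            // The caller stops waiting after timeoutMillis...
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // ...but the attempt itself is still running. Without this
            // cancel, it keeps holding its thread and connections.
            future.cancel(true);
            throw e;
        }
    }
}
```

If the existing retry code swallows interrupts or blocks in non-interruptible
I/O, the cancel above does not free anything, which is exactly the leftover
work described in this point.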

4) When the system goes crazy, it takes ~15 minutes to recover (it could be even 
more). The client code may not like it, but there is not much we can do. 
Typically, I think we could have two settings:
 - insert it whatever the time it takes (critical data)
 - insert it if possible (lower value data like 'i like' increment).
For this kind of functional segregation, the retry count could be enough: with 
20 retries for the first category and 3 retries for the second one?
                
> add an option for timeout, rather than retry limit, in HCM
> ----------------------------------------------------------
>
>                 Key: HBASE-7659
>                 URL: https://issues.apache.org/jira/browse/HBASE-7659
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Retry count limit is not the most useful user-facing measure for failing the 
> request, especially multi-requests that currently fail if any one sub-action 
> reaches the retry count. 
> Given the current deterministic implementation of retry count limit and 
> deterministic (+-jitter) sleep time between retries, the user is already 
> giving us an upper time bound for an operation to expire (with default 10 
> retries, around a minute).
> We can make this explicit.
> That will also make it easier to make retries smarter (e.g. retrying faster 
> on certain errors).
> In future these things can be set per request, which can be usable for people 
> using HBase directly from their code.
