[ 
https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486102#comment-15486102
 ] 

Phil Yang commented on HBASE-16388:
-----------------------------------

We limit the number of concurrent requests. The suitable conf depends on the 
number of RSs and the number of threads accessing the Table API. But we only 
know the number of RSs; we don't know the number of threads. So this should be 
set by users according to their own environment.

SBE is thrown from the client side; we don't send the request at all. So this 
is a client-only fix. Even after we have AsyncTable and users use it in an 
async way, which means users' threads will not be blocked, this conf is still 
useful to prevent too many pending requests from causing an OOM. And of course, 
for AsyncTable users this limit can be much larger than for blocking Table 
users.
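
For illustration, client-side usage might look like the sketch below. The 
property name hbase.client.perserver.requests.threshold is an assumption about 
what this patch introduces (check the final patch for the exact key), and the 
table name and limit value are made up for the example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class PerServerLimitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed property name from this patch: cap concurrent requests to any
    // single region server at 20 for this client process.
    conf.setInt("hbase.client.perserver.requests.threshold", 20);
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"))) {
      // Requests beyond the per-server limit fail fast with the new exception
      // instead of blocking this handler thread on a slow region server.
    }
  }
}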

> Prevent client threads being blocked by only one slow region server
> -------------------------------------------------------------------
>
>                 Key: HBASE-16388
>                 URL: https://issues.apache.org/jira/browse/HBASE-16388
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: HBASE-16388-branch-1-v1.patch, HBASE-16388-v1.patch, 
> HBASE-16388-v2.patch, HBASE-16388-v2.patch, HBASE-16388-v2.patch, 
> HBASE-16388-v2.patch
>
>
> It is a general use case for HBase users to have several threads/handlers 
> in their service, with each handler having its own Table/HTable instance. 
> Generally users assume the handlers are independent and won't interact with 
> each other.
> However, in an extreme case where a region server is very slow, every 
> request to this RS times out, and the handlers of the users' service may be 
> occupied by these long-waiting requests, so even requests belonging to other 
> RSs will also time out.
> For example: 
> Suppose we have 100 handlers in a client service (timeout is 1000ms) and 
> HBase has 10 region servers whose average response time is 50ms. If no 
> region server is slow, we can handle 2000 requests per second 
> (100 handlers / 0.05s per request).
> Now suppose this service's QPS is 1000 and one region server is so slow 
> that all requests to it time out. Users hope that only 10% of requests fail 
> and the other 90% still respond in 50ms, because only 10% of requests are 
> located on the slow RS. However, each second we get 100 long-waiting 
> requests (10% of 1000 QPS, each held for the full 1000ms timeout), which 
> exactly occupies all 100 handlers. So all handlers are blocked and the 
> availability of this service is almost zero.
> To prevent this case, we can limit the max concurrent requests to one RS at 
> the process level. Requests exceeding the limit throw a 
> ServerBusyException (extends DoNotRetryIOE) to users immediately. In the 
> above case, if we set this limit to 20, only 20 handlers will be occupied 
> and the other 80 handlers can still handle requests to other RSs. The 
> availability of this service is 90%, as expected.
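
As a client-side illustration (not from the patch itself), a handler could 
fail fast when this limit is hit. The exception is described above as 
ServerBusyException extending DoNotRetryIOE; the sketch below catches the 
DoNotRetryIOException superclass since the final class name and package may 
differ, and the rethrown message is purely illustrative.

import java.io.IOException;
import org.apache.hadoop.hbase.DoNotRetryIOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class FailFastHandler {
  // One handler's read path: if the per-server limit is exceeded, the client
  // throws immediately instead of letting this thread wait out the timeout.
  static Result readRow(Table table, byte[] row) throws IOException {
    try {
      return table.get(new Get(row));
    } catch (DoNotRetryIOException e) {
      // ServerBusyException (as described in this issue) is a DoNotRetryIOE,
      // so it surfaces here at once and the handler stays free for other RSs.
      // A real service would match the concrete class and map it to its own
      // error or retry policy rather than rethrowing like this.
      throw new IOException("Region server busy, request rejected client-side", e);
    }
  }
}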



