[ https://issues.apache.org/jira/browse/HBASE-16388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425911#comment-15425911 ]
Duo Zhang commented on HBASE-16388:
-----------------------------------

How do you deal with the async call? Although it is not used right now and I plan to reimplement it...

> Prevent client threads being blocked by only one slow region server
> -------------------------------------------------------------------
>
>                 Key: HBASE-16388
>                 URL: https://issues.apache.org/jira/browse/HBASE-16388
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>         Attachments: HBASE-16388-v1.patch, HBASE-16388-v2.patch, 
> HBASE-16388-v2.patch
>
>
> It is a common use case for HBase users to have several threads/handlers
> in their service, each with its own Table/HTable instance. Users generally
> assume the handlers are independent and won't interfere with each other.
> However, in an extreme case, if a region server is very slow, every request
> to this RS will time out, and the handlers of the user's service may all be
> occupied by these long-waiting requests, so that even requests to other RSs
> time out.
> For example:
> If we have 100 handlers in a client service (timeout is 1000ms) and HBase
> has 10 region servers whose average response time is 50ms, then if no region
> server is slow we can handle 2000 requests per second.
> Now suppose this service's QPS is 1000. If one region server is very slow,
> all requests to it will time out. Users hope that only 10% of requests fail
> and the other 90% still respond in 50ms, because only 10% of requests go to
> the slow RS. However, each second we have 100 long-waiting requests (10% of
> 1000 QPS, each holding a handler for the full 1000ms timeout), which occupy
> exactly all 100 handlers. So all handlers are blocked, and the availability
> of this service is almost zero.
> To prevent this, we can limit the maximum number of concurrent requests to
> one RS at the process level. Requests exceeding the limit will immediately
> throw ServerBusyException (extends DoNotRetryIOE) to users. In the above
> case, if we set this limit to 20, only 20 handlers will be occupied and the
> other 80 handlers can still handle requests to other RSs. The availability
> of this service is 90%, as expected.
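
For illustration only, the per-RS limit described in the issue could be sketched roughly as below. This is not the code in the attached patches: PerServerLimiter, its call method, and the string server key are hypothetical names, and ServerBusyException here is a local stand-in that extends IOException instead of the HBase DoNotRetryIOException. The sketch assumes blocking calls only.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Local stand-in for the exception named in the issue; NOT the HBase class.
class ServerBusyException extends IOException {
    ServerBusyException(String message) {
        super(message);
    }
}

// Hypothetical process-wide limiter: at most `limit` in-flight calls per server.
class PerServerLimiter {
    private final int limit;
    private final ConcurrentHashMap<String, AtomicInteger> counters =
        new ConcurrentHashMap<>();

    PerServerLimiter(int limit) {
        this.limit = limit;
    }

    // Runs the blocking call if the server is under its limit; otherwise
    // fails fast so the calling handler is not tied up until the timeout.
    <T> T call(String server, Callable<T> rpc) throws Exception {
        AtomicInteger counter =
            counters.computeIfAbsent(server, s -> new AtomicInteger());
        if (counter.incrementAndGet() > limit) {
            counter.decrementAndGet(); // undo our own increment before rejecting
            throw new ServerBusyException(
                "Too many concurrent requests to " + server);
        }
        try {
            return rpc.call();
        } finally {
            counter.decrementAndGet(); // release the slot on success or failure
        }
    }

    // Current number of in-flight calls to the given server.
    int inFlightCount(String server) {
        AtomicInteger counter = counters.get(server);
        return counter == null ? 0 : counter.get();
    }
}
```

With a limit of 20 shared by all handlers in the process, a slow RS can hold at most 20 handlers at once and the rest fail fast. Regarding the async question above: the finally block only works for blocking calls. An asynchronous call would need the counter decremented when the call completes (or times out) rather than when the submitting method returns.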