[jira] [Commented] (HBASE-18390) Sleep too long when finding region location failed

Chia-Ping Tsai (JIRA) Tue, 18 Jul 2017 06:38:31 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16091554#comment-16091554
 ]


Chia-Ping Tsai commented on HBASE-18390:
----------------------------------------

{noformat}
There is an interesting side effect: the client is informed immediately that 
the regionserver died, so immediately goes to .meta. As the recovery is not 
done, .meta. contains the same (dead) location, so the client fails again and 
comes back immediately to .meta. => We're hammering .meta. now. The easy fix is 
to add a ~10s sleep on the client. A possibly better fix from a mttr point of 
view would be to have the master sending messages to say that a server recovery 
is finished. I will go for the former first.
{noformat}
What do you think about the above comment from HBASE-7590? Does that side 
effect come back after this patch is merged? If not, +1 from me.

> Sleep too long when finding region location failed
> --------------------------------------------------
>
>                 Key: HBASE-18390
>                 URL: https://issues.apache.org/jira/browse/HBASE-18390
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.1, 1.2.6, 1.1.11, 2.0.0-alpha-1
>            Reporter: Phil Yang
>            Assignee: Phil Yang
>             Fix For: 3.0.0, 1.4.0, 1.3.2, 1.2.7, 2.0.0-alpha-2, 1.1.12
>
>         Attachments: HBASE-18390.v01.patch, HBASE-18390.v02.patch, 
> HBASE-18390.v03.patch
>
>
> If RegionServerCallable#prepare fails in getRegionLocation, the location in 
> this callable object is null. Before we retry, we sleep. However, when the 
> location is null we sleep at least 10 seconds, so the request fails outright 
> if the operation timeout is less than 10 seconds. I think there is no need to 
> keep the MIN_WAIT_DEAD_SERVER logic. Using the backoff sleep logic is fine 
> for most cases.
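
To make the timing argument in the description concrete, here is a minimal 
sketch of the two sleep policies. It is not the HBase implementation: the class 
SleepSketch, its method names, the multiplier table, and the 100 ms pause and 
5 s operation timeout are all assumptions chosen for illustration. Only the 
10 second floor and the idea of a backoff sleep come from the issue text.

```java
// Illustrative sketch only; names and numbers are hypothetical, not HBase code.
class SleepSketch {
    // Floor described in the issue: sleep at least 10 seconds when location is null.
    static final long MIN_WAIT_DEAD_SERVER_MS = 10_000L;

    // Assumed backoff multipliers for this example (not copied from HBase).
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100};

    // Plain backoff: base pause times a multiplier picked by the retry count.
    static long backoffSleep(long pauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), BACKOFF.length - 1);
        return pauseMs * BACKOFF[idx];
    }

    // Backoff with the floor applied when the region location is unknown.
    static long sleepWithFloor(long pauseMs, int tries, boolean locationKnown) {
        long sleep = backoffSleep(pauseMs, tries);
        if (!locationKnown && sleep < MIN_WAIT_DEAD_SERVER_MS) {
            sleep = MIN_WAIT_DEAD_SERVER_MS;
        }
        return sleep;
    }

    // A retry is only possible if the sleep ends before the operation timeout.
    static boolean fitsInTimeout(long sleepMs, long operationTimeoutMs) {
        return sleepMs < operationTimeoutMs;
    }

    public static void main(String[] args) {
        long pauseMs = 100L;              // assumed base pause
        long operationTimeoutMs = 5_000L; // assumed timeout, below the 10 s floor

        long withFloor = sleepWithFloor(pauseMs, 0, false);
        long backoffOnly = backoffSleep(pauseMs, 0);

        System.out.println("floor:   sleep " + withFloor + " ms, retry possible: "
                + fitsInTimeout(withFloor, operationTimeoutMs));
        System.out.println("backoff: sleep " + backoffOnly + " ms, retry possible: "
                + fitsInTimeout(backoffOnly, operationTimeoutMs));
    }
}
```

With a timeout below the floor, the first policy never gets a second attempt, 
while the backoff-only policy retries quickly. Whether quick retries bring back 
the .meta. hammering described in the HBASE-7590 comment is the open question 
raised above.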



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
