[jira] [Commented] (KUDU-1395) Scanner KeepAlive requests can get starved on an overloaded server

Jean-Daniel Cryans (JIRA) Fri, 08 Apr 2016 09:31:08 -0700

    [ https://issues.apache.org/jira/browse/KUDU-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232445#comment-15232445 ]


Jean-Daniel Cryans commented on KUDU-1395:
------------------------------------------

bq. Well, in the case that the server is overloaded, it's likely not fast - you 
may have to get bounced and retry several times before you get serviced 
(multiple seconds).

Isn't that good enough? We just want to make sure the scanner doesn't die on 
the server side.

bq. Another option: we could have the KeepAlive request sent with a much 
shorter timeout, rather than using the scanner timeout. This increases its 
priority over other calls. That in combination with the retries would probably 
be a good mix of getting quick response and also getting better likelihood of 
"getting through".

Sounds good, although I hope it won't be too messy to handle in the client.
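
A rough sketch of what the client side of that proposal might look like, 
assuming each attempt is sent with a short timeout and rejections are retried 
until an overall budget is used up. The names keep_alive_with_retries, 
send_keep_alive and ServerTooBusy are hypothetical and are not the Kudu 
client API; the timeout values are placeholders.

```python
import time


class ServerTooBusy(Exception):
    """Hypothetical stand-in for a SERVER_TOO_BUSY rejection."""


def keep_alive_with_retries(send_keep_alive, overall_timeout_s,
                            attempt_timeout_s, backoff_s=0.01):
    """Retry a KeepAlive until it is serviced or the overall budget runs out.

    send_keep_alive(timeout_s) is a hypothetical callable that raises
    ServerTooBusy when the server rejects the call.
    Returns the number of attempts that were made.
    """
    deadline = time.monotonic() + overall_timeout_s
    attempts = 0
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                f"KeepAlive not serviced after {attempts} attempts")
        attempts += 1
        try:
            # A short per-attempt timeout gives the call an early deadline,
            # which should rank it ahead of calls with later deadlines.
            send_keep_alive(min(attempt_timeout_s, remaining))
            return attempts
        except ServerTooBusy:
            # Bounced by the server: wait briefly, then try again.
            pause = min(backoff_s, max(0.0, deadline - time.monotonic()))
            time.sleep(pause)
```

The overall budget would presumably be tied to the scanner's server-side 
expiry, so that the client stops retrying once the scanner can no longer be 
saved.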

> Scanner KeepAlive requests can get starved on an overloaded server
> ------------------------------------------------------------------
>
>                 Key: KUDU-1395
>                 URL: https://issues.apache.org/jira/browse/KUDU-1395
>             Project: Kudu
>          Issue Type: Bug
>          Components: impala, rpc, tserver
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> As of 0.8.0, the RPC system schedules RPCs on an earliest-deadline-first 
> basis, rejecting those with later deadlines. This works well for RPCs which 
> are retried on SERVER_TOO_BUSY errors, since the retries maintain the 
> original deadline and thus get higher and higher priority as they get closer 
> to timing out.
> We don't, however, do any retries on scanner KeepAlive RPCs. So, if a 
> KeepAlive RPC arrives at a heavily overloaded tserver, it will likely be 
> rejected and will not be retried. This means that Impala queries or other 
> long scans that rely on KeepAlives will likely fail on overloaded clusters, 
> since the KeepAlive never gets through.
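
To illustrate the starvation described in the issue, here is a toy model of a 
bounded earliest-deadline-first queue. It is only a sketch: the EdfQueue 
class, the capacity of 2 and the deadline values are made up for the example, 
and the real RPC service queue may behave differently in its details.

```python
class EdfQueue:
    """Toy bounded queue: serves the earliest deadline first and, when
    full, rejects whichever call has the latest deadline."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._calls = []  # list of (deadline, name) tuples

    def offer(self, deadline, name):
        """Add a call. Returns the name of the rejected call, or None."""
        self._calls.append((deadline, name))
        if len(self._calls) <= self.capacity:
            return None
        victim = max(self._calls)  # latest deadline loses
        self._calls.remove(victim)
        return victim[1]

    def take(self):
        """Remove and return the name of the call with the earliest deadline."""
        nxt = min(self._calls)
        self._calls.remove(nxt)
        return nxt[1]


q = EdfQueue(capacity=2)
q.offer(1.0, "write-a")
q.offer(2.0, "write-b")
# A KeepAlive carrying a long (scanner-sized) deadline is the one bounced.
rejected_first = q.offer(30.0, "keepalive")
# A retried call keeps its original, now-near deadline and displaces write-b.
rejected_second = q.offer(1.5, "write-c-retry")
```

In this model the KeepAlive is rejected as soon as the queue is full, and 
because it is never retried it never gets another chance, which matches the 
failure mode described above.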



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)