Todd Lipcon created KUDU-1395:
---------------------------------

             Summary: Scanner KeepAlive requests can get starved on an overloaded server
                 Key: KUDU-1395
                 URL: https://issues.apache.org/jira/browse/KUDU-1395
             Project: Kudu
          Issue Type: Bug
          Components: impala, rpc, tserver
    Affects Versions: 0.8.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


As of 0.8.0, the RPC system schedules queued RPCs on an earliest-deadline-first
basis and, when the queue is full, rejects the ones with the latest deadlines.
This works well for RPCs that are retried on SERVER_TOO_BUSY errors: each retry
keeps the original deadline, so it gets higher and higher priority as it gets
closer to timing out.
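
The scheduling policy described above can be modeled as a bounded priority
queue. The Python sketch below is a hypothetical model, not Kudu's actual C++
service-pool code: the names `EdfServiceQueue`, `put` and `get` are
illustrative. It serves the earliest deadline first and, when full, rejects
whichever RPC has the latest deadline. That is why a retried RPC, whose
deadline stays fixed, eventually wins admission, while a one-shot RPC with a
fresh, far-off deadline loses.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedRpc:
    deadline: float                   # absolute deadline; earlier = higher priority
    seq: int                          # tie-breaker: earlier arrival wins on equal deadlines
    name: str = field(compare=False)  # label only, not part of the ordering


class EdfServiceQueue:
    """Hypothetical bounded EDF queue: when full, reject the latest deadline."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._heap = []               # min-heap ordered by (deadline, seq)
        self._seq = itertools.count()

    def put(self, name, deadline):
        """Enqueue an RPC; return the name of the rejected RPC, or None."""
        rpc = QueuedRpc(deadline, next(self._seq), name)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, rpc)
            return None
        latest = max(self._heap)      # O(n) scan, acceptable for a sketch
        if latest < rpc:              # the newcomer is the latest: reject it
            return rpc.name
        self._heap.remove(latest)     # otherwise evict the queued latest RPC
        heapq.heapify(self._heap)
        heapq.heappush(self._heap, rpc)
        return latest.name

    def get(self):
        """Dequeue the RPC with the earliest deadline."""
        return heapq.heappop(self._heap).name


if __name__ == "__main__":
    q = EdfServiceQueue(capacity=2)
    q.put("scan-a", deadline=10.0)
    q.put("scan-b", deadline=12.0)
    # A KeepAlive sent with a fresh timeout has the latest deadline.
    print(q.put("keepalive", deadline=30.0))
```

Running the demo prints `keepalive`: with the queue full, the KeepAlive's
far-off deadline makes it the RPC that gets rejected.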

We don't, however, retry scanner KeepAlive RPCs. So if a KeepAlive RPC arrives
at a heavily overloaded tserver, it will likely be rejected and never retried.
This means that Impala queries and other long scans that rely on KeepAlives are
likely to fail on overloaded clusters, because the KeepAlive never gets through.
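
One possible direction is sketched below under assumptions. It is not
necessarily the fix adopted for this issue. The client would retry KeepAlive on
SERVER_TOO_BUSY while reusing a single deadline, the same way other retried RPCs
do, so that each retry climbs in priority. `send_keep_alive`, `ServerTooBusy`
and `keep_alive_with_retry` are hypothetical names. The clock and sleep
functions are injectable so the logic can be exercised without a real server.

```python
import time


class ServerTooBusy(Exception):
    """Hypothetical stand-in for a SERVER_TOO_BUSY rejection."""


def keep_alive_with_retry(send_keep_alive, timeout_s, clock=time.monotonic,
                          sleep=time.sleep, initial_backoff_s=0.01):
    """Retry a scanner KeepAlive on SERVER_TOO_BUSY, reusing ONE deadline.

    `send_keep_alive(deadline)` is a hypothetical callable that sends the RPC
    and raises ServerTooBusy if the server rejects it. Returns the number of
    attempts made; raises TimeoutError once the deadline has passed.
    """
    deadline = clock() + timeout_s    # fixed once, so priority rises on each retry
    backoff = initial_backoff_s
    attempts = 0
    while True:
        attempts += 1
        try:
            send_keep_alive(deadline)
            return attempts
        except ServerTooBusy:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(
                    "KeepAlive timed out after %d attempts" % attempts)
            sleep(min(backoff, remaining))  # never sleep past the deadline
            backoff *= 2                    # exponential backoff between retries
```

Passing the same `deadline` on every attempt is the key design choice. A client
that recomputed the deadline per attempt would keep landing at the back of the
server's earliest-deadline-first queue.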



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
