[jira] [Commented] (KUDU-1409) Make krpc call timeouts more resistant to process pauses

Todd Lipcon (JIRA) Sun, 10 Apr 2016 19:52:59 -0700

    [ https://issues.apache.org/jira/browse/KUDU-1409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15234436#comment-15234436 ]


Todd Lipcon commented on KUDU-1409:
-----------------------------------

I'm thinking of the following strategy:

- given a 5-second timeout, we set the libev timer for a slightly shorter 
value, like 4.8 seconds
- when that timer fires, we re-arm it for the remaining 200ms
- only when the second timer fires do we actually consider the call failed

The idea here is that, if the process got paused, then the first "pre-timeout" 
timer will get arbitrarily delayed. Then, when we wake up, we'll give it an 
extra 200ms to try to read the call response off the wire if it is in fact 
already waiting. In the case that there was no process pause, we pay the "cost" 
of an extra libev wakeup, but timeouts are rare so this shouldn't really 
matter. We might also be giving up a slight amount of accuracy on timeouts, but 
for long timeouts that shouldn't be important (they're usually chosen rather 
arbitrarily).
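A small simulation shows why the extra wakeup helps. This is a hypothetical Python sketch, not krpc or libev code: `deliver` and `call_succeeds` are illustrative names, times are in seconds, and it assumes the pessimistic case where, if a timer and the socket both become ready at the same wake-up (e.g. right after a pause), the timer callback runs first.

```python
"""Hypothetical model of a single vs. two-phase ("pre-timeout") RPC timeout.

Not Kudu code. Assumption: when a timer and the socket become ready at the
same instant, the timer callback is processed first (the bad case).
"""
from typing import Optional, Tuple

Pause = Optional[Tuple[float, float]]  # (pause_start, pause_end) in seconds


def deliver(t: float, pause: Pause) -> float:
    """Return when an event that becomes due at time `t` is actually handled.

    While the process is paused (pause_start <= t < pause_end) no callbacks
    run, so anything that becomes due during the pause is handled at pause_end.
    """
    if pause is not None and pause[0] <= t < pause[1]:
        return pause[1]
    return t


def call_succeeds(timeout: float, response_arrival: float,
                  pause: Pause = None, grace: float = 0.0) -> bool:
    """Return True if the response is read before the call is failed.

    grace == 0.0 models a single timer at `timeout`. grace > 0.0 models the
    two-phase scheme: a pre-timeout at (timeout - grace) that only re-arms the
    timer for `grace` more seconds, measured from when it actually ran.
    """
    read_at = deliver(response_arrival, pause)
    if grace <= 0.0:
        fail_at = deliver(timeout, pause)
    else:
        pre_at = deliver(timeout - grace, pause)  # first timer only re-arms
        fail_at = deliver(pre_at + grace, pause)  # second timer fails the call
    # Strict '<': on a tie the timer runs first, so the call fails.
    return read_at < fail_at
```

For example, with a 5-second timeout and a response that lands on the wire at t=3.0 during a pause from t=2.0 to t=8.0, `call_succeeds(5.0, 3.0, pause=(2.0, 8.0))` returns False, while the same call with `grace=0.2` returns True because the delayed pre-timeout only re-arms and the read happens first. Without a pause, a response at t=6.0 still fails either way. Under this model, a pause that begins inside the final 200ms window can still produce a false timeout.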

Any other good ideas here?



> Make krpc call timeouts more resistant to process pauses
> --------------------------------------------------------
>
>                 Key: KUDU-1409
>                 URL: https://issues.apache.org/jira/browse/KUDU-1409
>             Project: Kudu
>          Issue Type: Improvement
>          Components: rpc
>    Affects Versions: 0.8.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> In stress testing Impala on Kudu I've seen various RPC timeouts that turn out 
> to be due to pauses on the client side. In particular, scenarios like 
> https://issues.cloudera.org/browse/IMPALA-2800 can cause the memory allocator 
> inside Impala to block for several seconds, and that might cause us to think 
> we missed a timeout.
> We should be more resilient to this sort of "false" timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
