[ 
https://issues.apache.org/jira/browse/CASSANDRA-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462456#comment-13462456
 ] 

Peter Schuller commented on CASSANDRA-4705:
-------------------------------------------

Here's a good example of a complexity implication that I just thought of (and 
it's stuff like this I'm worried about w.r.t. complexity): How do you split 
requests into "groups" within which to do latency profiling? If you don't, 
the expensive requests will easily end up always being processed multiple 
times: they are expensive, thus slow, and so they always hit the backup path. 
You could very easily "eat up" all your intended benefit that way. Without 
knowledge of the nature of the requests, and since we cannot reliably assume 
a homogeneous request pattern, you would probably need some non-trivial way 
of classifying requests and of keeping separate latency statistics for each 
class.
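
To make the "groups" idea concrete, here is a minimal sketch of per-group 
latency profiling. Everything in it is hypothetical: the class name, the 
string group key (it could be a column family name, for instance) and the 
nearest-rank percentile are assumptions for illustration, not existing 
Cassandra code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keeps latency samples per request group so that each
// group gets its own backup-request threshold. Not thread-safe, unbounded.
final class GroupedLatencyTracker {
    private final Map<String, List<Long>> samplesByGroup = new HashMap<>();

    // Record one observed latency for the given (assumed) group key.
    void record(String group, long latencyMicros) {
        samplesByGroup.computeIfAbsent(group, g -> new ArrayList<>()).add(latencyMicros);
    }

    // Nearest-rank percentile of one group's samples.
    // Returns Long.MAX_VALUE for a group with no samples (never speculate).
    long threshold(String group, double percentile) {
        List<Long> samples = samplesByGroup.get(group);
        if (samples == null || samples.isEmpty()) {
            return Long.MAX_VALUE;
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // A backup request is warranted only if this request is slow relative
    // to its own group, not relative to all traffic.
    boolean shouldSendBackup(String group, long elapsedMicros, double percentile) {
        return elapsedMicros > threshold(group, percentile);
    }
}
```

With separate groups, a request that has been waiting 500us would trigger a 
backup in a group whose p99 is around 100us, but not in a group whose p99 is 
around 1000us. The hard part described above remains: the sketch assumes you 
can assign the group key before the request has executed.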

In some cases, having it be a per-cf setting might be enough. In other cases 
that's not feasible - for example maybe you're doing slicing on large rows, and 
maybe it's impossible to determine from an incoming request whether it's 
expensive or not (the range may be large but result in only a single column, 
for example).

What if you don't care about the latency of the "legitimately expensive" 
requests, but about the cheap ones? And what if those "legitimately expensive" 
requests consume your 1% (p99), such that none of the "cheaper" requests are 
subject to backup requests? Now you get none of the benefit, but you still take 
the brunt of the cost you'd have if you just went with full data reads.
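
A small worked example of that failure mode, with made-up numbers: assume 
980 cheap requests at 1-10 ms and 20 expensive ones at 500-519 ms, all 
profiled together. The class and method names are hypothetical and the 
percentile is a plain nearest-rank calculation over a non-empty list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of a single, global p99 threshold.
final class GlobalP99Demo {
    // Nearest-rank percentile; assumes a non-empty list.
    static long nearestRank(List<Long> latencies, int percentile) {
        List<Long> sorted = new ArrayList<>(latencies);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(percentile * sorted.size() / 100.0);
        return sorted.get(Math.max(rank, 1) - 1);
    }

    // 980 cheap requests with latencies cycling through 1..10 ms.
    static List<Long> cheapLatencies() {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < 980; i++) {
            result.add(1L + (i % 10));
        }
        return result;
    }

    // 20 expensive requests with latencies 500..519 ms.
    static List<Long> expensiveLatencies() {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            result.add(500L + i);
        }
        return result;
    }

    // Number of requests that would exceed the threshold and get a backup.
    static long countAbove(List<Long> latencies, long threshold) {
        return latencies.stream().filter(l -> l > threshold).count();
    }
}
```

Under these assumptions the expensive requests make up 2% of traffic, so the 
global p99 lands inside the expensive range. `countAbove` over the cheap 
latencies is then zero, and every backup request goes to a request that was 
expensive to begin with.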

I'm sure there are many other concerns I'm not thinking of; this was meant as 
an example of how it can be hard to make this actually work the way it's 
intended.

                
> Speculative execution for CL_ONE
> --------------------------------
>
>                 Key: CASSANDRA-4705
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4705
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>
> When read_repair is not 1.0, we send the request to one node for some of the 
> requests. When a node goes down or when a node is too busy the client has to 
> wait for the timeout before it can retry. 
> It would be nice to watch for latency and execute an additional request to a 
> different node, if the response is not received within average/99% of the 
> response times recorded in the past.
> CASSANDRA-2540 might be able to solve the variance when read_repair is set to 
> 1.0
> 1) Maybe we need to use metrics-core to record various percentiles
> 2) Modify ReadCallback.get to execute additional request speculatively.
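
For readers unfamiliar with the technique, the mechanism in the description 
above can be sketched generically as follows. This is not Cassandra's 
ReadCallback and makes no claim about how the patch would be implemented; 
the class and method names are hypothetical, and it assumes Java 9+ for 
CompletableFuture.delayedExecutor.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a speculative ("backup") request.
final class SpeculativeRead {
    // Starts the primary request immediately. If no result has arrived after
    // thresholdMillis, also starts the backup request. The returned future
    // completes with whichever outcome (value or error) arrives first.
    static <T> CompletableFuture<T> execute(Supplier<CompletableFuture<T>> primary,
                                            Supplier<CompletableFuture<T>> backup,
                                            long thresholdMillis) {
        CompletableFuture<T> result = new CompletableFuture<>();
        forward(primary.get(), result);

        Executor delayed = CompletableFuture.delayedExecutor(thresholdMillis, TimeUnit.MILLISECONDS);
        delayed.execute(() -> {
            // Only pay for the second request if the first is still pending.
            if (!result.isDone()) {
                forward(backup.get(), result);
            }
        });
        return result;
    }

    // Copies the outcome of source into target; later outcomes are ignored
    // because a CompletableFuture can only be completed once.
    private static <T> void forward(CompletableFuture<T> source, CompletableFuture<T> target) {
        source.whenComplete((value, error) -> {
            if (error == null) {
                target.complete(value);
            } else {
                target.completeExceptionally(error);
            }
        });
    }
}
```

The threshold passed in would come from the recorded latency percentiles. 
The sketch deliberately ignores cancellation of the losing request and 
treats a fast failure of the primary as the final outcome, both of which a 
real implementation would need to decide on.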

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
