[ 
https://issues.apache.org/jira/browse/CASSANDRA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023207#comment-13023207
 ] 

Peter Schuller commented on CASSANDRA-2540:
-------------------------------------------

Fair enough ;)

Just one point about the dynamic snitch: it's true, though the dynamic snitch 
also has a periodicity to it and doesn't alter endpoint selection in "real" 
time. With the default update interval of 100 ms, at a request rate of 1000 
per second, you still have 50 requests on average that will be adversely 
affected in the event of a node hiccup. And that's assuming instant feedback, 
which I don't think is the case: I believe requests have to actually time out 
before they get accounted as high-latency, so I don't expect the dynamic 
snitch to kick in until the RPC timeout has elapsed, meaning that at a request 
rate of 1k per second you'd expect thousands of requests to be affected.
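
As a back-of-the-envelope sketch of that arithmetic (not taken from the 
Cassandra code): assume the hiccup starts at a uniformly random point within a 
snitch update interval, so the mean wait until the next update is half the 
interval, and assume every request issued in that window is routed to the slow 
node, which makes this an upper bound. The function name and the 10-second 
detection delay are illustrative assumptions; substitute your configured RPC 
timeout.

```python
def expected_affected_requests(rate_per_s, update_interval_ms, detection_delay_ms=0.0):
    """Expected number of requests issued before the snitch can re-route.

    Assumes the hiccup begins at a uniformly random point in an update
    interval (mean wait = half the interval) and that every request in the
    window hits the slow node, so the result is an upper bound.
    """
    mean_wait_ms = update_interval_ms / 2.0 + detection_delay_ms
    return rate_per_s * mean_wait_ms / 1000.0


# Instant feedback: only the snitch's periodicity delays re-routing.
instant = expected_affected_requests(1000, 100)

# Feedback only after a request times out; 10_000 ms is an assumed,
# illustrative RPC timeout, not a value stated in this thread.
with_timeout = expected_affected_requests(1000, 100, detection_delay_ms=10_000)

print(instant)       # 50.0
print(with_timeout)  # 10050.0
```

Under these assumptions the periodicity alone accounts for the 50 requests, 
while the timeout term dominates as soon as feedback is not instant.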

In that sense, it would be nice to avoid extreme outliers without having to do 
something like making dynamic re-routing terribly aggressive (which has 
potential for foot-shooting if the feedback mechanism is too immediate).

So I guess my point is: the dynamic snitch is a different attack vector to a 
related but not identical problem.


> Data reads by default
> ---------------------
>
>                 Key: CASSANDRA-2540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2540
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stu Hood
>             Fix For: 0.8.0
>
>
> The intention of digest vs data reads is to save bandwidth in the read path 
> at the cost of latency, but I expect that this has been a premature 
> optimization.
> * Data requested by a read will often be within an order of magnitude of the 
> digest size, and a failed digest means extra roundtrips, more bandwidth
> * The [digest reads but not your data 
> read|https://issues.apache.org/jira/browse/CASSANDRA-2282?focusedCommentId=13004656&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13004656]
>  problem means failing QUORUM reads because a single node is unavailable, and 
> would require eagerly re-requesting at some fraction of your timeout
> * Saving bandwidth in cross-datacenter use cases comes at a huge cost to 
> latency, but since both constraints change proportionally (enough), the 
> tradeoff is not clear
> Some options:
> # Add an option to use digest reads
> # Remove digest reads entirely (and/or punt and make them a runtime 
> optimization based on data size in the future)
> # Continue to use digest reads, but send them to {{N - R}} nodes for 
> (somewhat) more predictable behavior with QUORUM
> \\
> The outcome of data-reads-by-default should be significantly improved 
> latency, with a moderate increase in bandwidth usage for large reads.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
