[ https://issues.apache.org/jira/browse/CASSANDRA-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026880#comment-13026880 ]

Terje Marthinussen commented on CASSANDRA-2540:
-----------------------------------------------

Would it make sense to have the dynamic snitch react directly to the "digest 
but not data" error to improve recovery time? And why do we need to wait for 
the 100ms interval before the dynamic snitch kicks in?
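
Roughly what I am picturing, as a purely hypothetical sketch (none of these 
class or method names exist in the actual dynamic snitch code, and the decay 
factor is made up):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: penalize a replica the moment a "digest but not data"
// failure is observed, instead of waiting for the next periodic (100ms)
// score recalculation.
public class ReactivePenalties
{
    private final Map<String, Double> penalties = new ConcurrentHashMap<>();

    // Called directly from the read path when the data read fails but
    // digest responses arrived.
    public void onDataReadFailure(String endpoint)
    {
        penalties.merge(endpoint, 1.0, Double::sum);
    }

    // The snitch adds this penalty on top of its normal latency score,
    // so the bad replica is deprioritized for the very next read.
    public double penaltyFor(String endpoint)
    {
        return penalties.getOrDefault(endpoint, 0.0);
    }

    // Decay would still happen on the usual periodic schedule.
    public void decay()
    {
        penalties.replaceAll((endpoint, penalty) -> penalty * 0.5);
    }
}
{code}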

Couldn't we add a load-balancer-like function that detects when node 1 has a 
significantly larger number of outstanding requests than nodes 2 and 3, and 
then sends the request to node 2 instead?

To be clear, I am not asking for a round-robin load balancer though... (not 
good for caching)
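
Something like the following is roughly what I mean by a load-balancer-like 
function (again a hypothetical sketch; the class name and the backlog 
threshold are invented for illustration):

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: keep sending data reads to the preferred (cache-warm)
// replica unless it has significantly more outstanding requests than one of
// the alternatives. This is deliberately not round robin.
public class OutstandingAwareSelector
{
    private static final int SIGNIFICANT_BACKLOG = 32; // made-up threshold

    private final Map<String, AtomicInteger> outstanding = new ConcurrentHashMap<>();

    public String pickDataEndpoint(List<String> rankedReplicas)
    {
        String preferred = rankedReplicas.get(0);
        int preferredLoad = loadOf(preferred);
        for (String candidate : rankedReplicas.subList(1, rankedReplicas.size()))
        {
            // Only switch away from the preferred replica when its backlog is
            // clearly larger, so caches stay warm in the common case.
            if (preferredLoad - loadOf(candidate) > SIGNIFICANT_BACKLOG)
                return candidate;
        }
        return preferred;
    }

    public void onRequestSent(String endpoint)
    {
        outstanding.computeIfAbsent(endpoint, e -> new AtomicInteger()).incrementAndGet();
    }

    public void onResponse(String endpoint)
    {
        outstanding.computeIfAbsent(endpoint, e -> new AtomicInteger()).decrementAndGet();
    }

    private int loadOf(String endpoint)
    {
        AtomicInteger count = outstanding.get(endpoint);
        return count == null ? 0 : count.get();
    }
}
{code}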

The digest decision could also, in some cases, be made dynamic based on the 
size of the data being read. That is, just send the data for small rows and 
use digests only for large responses?
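
As a rough sketch of the size-based idea (the 4KB threshold is just a guess, 
not a measured number):

{code:java}
// Hypothetical sketch: decide per read whether a digest is worth it, based
// on an estimate of the row size.
public class DigestPolicy
{
    // Below this size a digest saves almost nothing over the data itself,
    // so just ask every contacted replica for the data.
    private static final long SMALL_ROW_BYTES = 4 * 1024;

    public static boolean useDigest(long estimatedRowSizeBytes)
    {
        return estimatedRowSizeBytes >= SMALL_ROW_BYTES;
    }
}
{code}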

No, I don't know all the details of how this part of the code works, so my 
suggestions may be totally wrong :)

I do wonder, however, whether we get enough of these timeouts for them to 
actually be a problem. If latencies spike for a few seconds 2-3 times a week 
or month, that is no problem.

However, if there are so many of these errors that people see them many times 
a day, it would seem like there is a performance problem somewhere in 
Cassandra that should be fixed, rather than papered over with a workaround 
that hides it...

> Data reads by default
> ---------------------
>
>                 Key: CASSANDRA-2540
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2540
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Stu Hood
>            Priority: Minor
>
> The intention of digest vs data reads is to save bandwidth in the read path 
> at the cost of latency, but I expect that this has been a premature 
> optimization.
> * Data requested by a read will often be within an order of magnitude of the 
> digest size, and a failed digest means extra roundtrips and more bandwidth
> * The [digest reads but not your data 
> read|https://issues.apache.org/jira/browse/CASSANDRA-2282?focusedCommentId=13004656&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13004656]
>  problem means failing QUORUM reads because a single node is unavailable, and 
> would require eagerly re-requesting at some fraction of your timeout
> * Saving bandwidth in cross datacenter usecases comes at huge cost to 
> latency, but since both constraints change proportionally (enough), the 
> tradeoff is not clear
> Some options:
> # Add an option to use digest reads
> # Remove digest reads entirely (and/or punt and make them a runtime 
> optimization based on data size in the future)
> # Continue to use digest reads, but send them to {{N - R}} nodes for 
> (somewhat) more predictable behavior with QUORUM
> \\
> The outcome of data-reads-by-default should be significantly improved 
> latency, with a moderate increase in bandwidth usage for large reads.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
