[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027151#comment-13027151
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
-------------------------------------------

bq. add a scheduled task

This is the wrong approach, as we found out when we tried something similar for 
read repair, which we fixed in CASSANDRA-2069.

A better approach would be to add a hook to MessagingService (MS) callback 
expiration, and fire hint recording from there if MS expires the callback 
before all acks are received.  (We could refactor the dynamic snitch latency 
update into a similar hook for reads.)
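
To make the idea concrete, here is a rough sketch of what such an expiration 
hook could look like. All names here (WriteCallback, onMissingAck, 
HintOnExpiryDemo) are hypothetical and chosen for illustration; this is not the 
existing MessagingService API, and replicas are plain strings rather than real 
endpoints. The point is only that the hint is recorded when the callback 
expires, not from a separately scheduled task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical write callback: tracks which replicas have not acked yet.
class WriteCallback {
    private final Set<String> pending;           // replicas still owing an ack
    private final Consumer<String> onMissingAck; // hypothetical hook, e.g. "record a hint"
    private boolean expired = false;

    WriteCallback(Set<String> targets, Consumer<String> onMissingAck) {
        this.pending = new HashSet<>(targets);
        this.onMissingAck = onMissingAck;
    }

    // Called when a replica acknowledges the write.
    synchronized void ack(String replica) {
        pending.remove(replica);
    }

    // Called by the messaging layer when it expires this callback.
    // Fires the hook once for every replica that never acked.
    synchronized void expire() {
        if (expired) {
            return; // expiring twice must not record duplicate hints
        }
        expired = true;
        for (String replica : pending) {
            onMissingAck.accept(replica);
        }
    }
}

class HintOnExpiryDemo {
    // Returns the replicas for which a hint would have been recorded.
    static List<String> run() {
        List<String> hinted = new ArrayList<>();
        Set<String> targets = new HashSet<>(Arrays.asList("a", "b", "c"));
        WriteCallback callback = new WriteCallback(targets, hinted::add);

        callback.ack("a");
        callback.ack("c");
        callback.expire(); // "b" never acked, so the hook fires for "b"
        callback.expire(); // second expiration is a no-op
        return hinted;
    }
}
```

In this sketch the client-facing path is untouched: the write returns as soon as 
ConsistencyLevel is met, and the hook only runs at the point where the callback 
would have been expired anyway.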

bq. This would need a separate executor for local writes that doesn't drop 
writes when it's behind

I'm more worried about this; there is the potential to take us back to the Bad 
Old Days when HH could cause cascading failure. (Of course the right answer is, 
"Don't run your cluster so close to the edge of capacity," but we still want to 
degrade gracefully when this is ignored.)

> Make Read Repair unnecessary when Hinted Handoff is enabled
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-2034
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Currently, HH is purely an optimization -- if a machine goes down, enabling 
> HH means RR/AES will have less work to do, but you can't disable RR entirely 
> in most situations since HH doesn't kick in until the FailureDetector does.
> Let's add a scheduled task to the mutate path, such that we return to the 
> client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
> check the responseHandler write acks and write local hints for any missing 
> targets.
> This would make disabling RR when HH is enabled a much more reasonable 
> option, which has a huge impact on read throughput.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
