[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084779#comment-13084779
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
-------------------------------------------

I don't think passing a list of unavailableEndpoints around everywhere is 
actually necessary.  I may be missing a use case, but the only uses I see are

- in sendToHintedEndpoints
- in assureSufficientLiveNodes implementation

Both of which can be replaced in a straightforward manner with FailureDetector 
calls.  (Note that it is not necessary for FD state to remain unchanged between 
assureSufficient and sending.)

In fact, using the same list in both places is a bug: assureSufficient only 
cares about what FD thinks, so mixing in whether hinted handoff is enabled, as 
getUnavailableEndpoints does, will cause assureSufficient to return false 
positives w/ HH off.

So I'd make assureSufficient use FD directly, and sendToHintedEndpoints use 
FD + HH state.  Bonus: no List allocation in the common case of "everything is 
healthy."
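
A rough sketch of that split, not the actual patch. The class, the method 
names, the String endpoints and the isAlive predicate are all hypothetical 
stand-ins (the predicate plays the role of the FailureDetector; the real API 
may well differ). It only shows that the sufficiency check looks at liveness 
alone, while the send path combines liveness with the HH setting and allocates 
a list only once it finds a down target.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

public class EndpointChecks {
    /**
     * Sufficiency check: depends only on what the failure detector
     * (the hypothetical isAlive predicate) reports, never on the HH setting.
     */
    static boolean hasSufficientLiveNodes(List<String> targets,
                                          Predicate<String> isAlive,
                                          int blockFor) {
        int live = 0;
        for (String target : targets) {
            if (isAlive.test(target)) {
                live++;
            }
        }
        return live >= blockFor;
    }

    /**
     * Send path: returns the targets that should get a hint instead of a
     * direct write. Uses FD state plus the HH setting. No list is allocated
     * when every target is alive.
     */
    static List<String> endpointsNeedingHints(List<String> targets,
                                              Predicate<String> isAlive,
                                              boolean hintedHandoffEnabled) {
        List<String> hinted = null;
        for (String target : targets) {
            if (isAlive.test(target)) {
                continue; // live target: written to directly, no hint
            }
            if (!hintedHandoffEnabled) {
                continue; // down target with HH off: no hint is written
            }
            if (hinted == null) {
                hinted = new ArrayList<>(); // allocate lazily, only on first down target
            }
            hinted.add(target);
        }
        return hinted == null ? Collections.<String>emptyList() : hinted;
    }
}
```

With HH off, endpointsNeedingHints returns an empty list even when a target is 
down, while hasSufficientLiveNodes gives the same answer regardless of the HH 
setting.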

> Make Read Repair unnecessary when Hinted Handoff is enabled
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-2034
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Patricio Echague
>             Fix For: 1.0
>
>         Attachments: 2034-formatting.txt, CASSANDRA-2034-trunk-v10.patch, 
> CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v11.patch, 
> CASSANDRA-2034-trunk-v12.patch, CASSANDRA-2034-trunk-v13.patch, 
> CASSANDRA-2034-trunk-v2.patch, CASSANDRA-2034-trunk-v3.patch, 
> CASSANDRA-2034-trunk-v4.patch, CASSANDRA-2034-trunk-v5.patch, 
> CASSANDRA-2034-trunk-v6.patch, CASSANDRA-2034-trunk-v7.patch, 
> CASSANDRA-2034-trunk-v8.patch, CASSANDRA-2034-trunk-v9.patch, 
> CASSANDRA-2034-trunk.patch
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Currently, HH is purely an optimization -- if a machine goes down, enabling 
> HH means RR/AES will have less work to do, but you can't disable RR entirely 
> in most situations since HH doesn't kick in until the FailureDetector does.
> Let's add a scheduled task to the mutate path, such that we return to the 
> client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
> check the responseHandler write acks and write local hints for any missing 
> targets.
> This would make disabling RR when HH is enabled a much more reasonable 
> option, which has a huge impact on read throughput.
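
The scheduled check described in the issue text above could look roughly like 
the following. This is only an illustration under assumptions: the class and 
method names are made up, endpoints are plain Strings, the acked set stands in 
for whatever the responseHandler tracks, and the task merely reports which 
targets would need local hints rather than writing them.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DelayedHintSketch {
    /** Targets that have not acknowledged the write at the time of the call. */
    static Set<String> missingAcks(Collection<String> targets, Set<String> acked) {
        Set<String> missing = new LinkedHashSet<>(targets);
        missing.removeAll(acked);
        return missing;
    }

    /**
     * Schedules the ack check to run after rpcTimeoutMillis. The caller can
     * return to the client as soon as ConsistencyLevel is met; the task runs
     * later and yields the targets that would need a local hint.
     * The acked set should be thread-safe, since acks arrive concurrently.
     */
    static ScheduledFuture<Set<String>> scheduleHintCheck(ScheduledExecutorService executor,
                                                          Collection<String> targets,
                                                          Set<String> acked,
                                                          long rpcTimeoutMillis) {
        Callable<Set<String>> check = () -> missingAcks(targets, acked);
        return executor.schedule(check, rpcTimeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```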

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
