[jira] [Commented] (CASSANDRA-3721) Staggering repair

Sylvain Lebresne (Commented) (JIRA) Mon, 23 Jan 2012 05:27:05 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13191146#comment-13191146
 ]


Sylvain Lebresne commented on CASSANDRA-3721:
---------------------------------------------

I did a quick pass on the patches. It seems to me that the refactoring of 
AntiEntropyService this patch does is largely orthogonal to the issue at hand.  
All that seems needed for this issue is to allow sending the treeRequests one 
after the other. That should be doable with 2 lines in RepairJob.addTree(), and 
maybe a few more lines to send the snapshot commands. This would have the 
advantage of making it clear that the patch isn't breaking anything.

I am not saying that the AntiEntropyService synchronization code is the 
cleanest we have, and maybe a refactoring could improve it. I'm not 
necessarily convinced such a refactoring is necessary at this point, but if you 
care enough about it, I'm not strongly against it either. I do want to point 
out, though, that doing that refactoring as part of this ticket will almost 
surely put it out of reach for 1.1 (it will make review more complicated and 
make it unreasonable to shove this in a handful of days before the freeze).

As a side note, I spotted 2 changes that seem gratuitous and don't obviously 
improve the code:
* In TreeRequestVerbHandler.doVerb, you renamed the variables. However I think 
the new name, cloneRequest, is misleading as we're not really doing a clone.
* Is there a reason to change RepairFuture to not be a Future anymore? Even if 
we don't really use it, it can be convenient to have it implement the native 
Future interface, especially given it's called RepairFuture.
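
To illustrate the second point, here is a minimal sketch of a RepairFuture that remains a java.util.concurrent.Future by extending FutureTask. The RepairSession class, its String result, and the runAndWait helper are hypothetical stand-ins for illustration only, not the actual AntiEntropyService code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

public class RepairFutureSketch {
    // Hypothetical stand-in for a repair session; not Cassandra's real class.
    public static class RepairSession implements Callable<String> {
        final String range;

        public RepairSession(String range) {
            this.range = range;
        }

        @Override
        public String call() {
            return "repaired " + range;
        }
    }

    // Extending FutureTask means RepairFuture *is* a Future, so callers can
    // use get(), isDone() and cancel() without any repair-specific API.
    public static class RepairFuture extends FutureTask<String> {
        public final RepairSession session;

        public RepairFuture(RepairSession session) {
            super(session);
            this.session = session;
        }
    }

    // Runs the session on the calling thread and waits through the plain
    // Future interface (an executor would normally run it instead).
    public static String runAndWait(String range)
            throws InterruptedException, ExecutionException {
        RepairFuture future = new RepairFuture(new RepairSession(range));
        future.run();
        Future<String> asFuture = future;
        return asFuture.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runAndWait("(0,100]"));
    }
}
```

Keeping the type a Future costs little here, and it lets generic code (executors, timeouts via get(long, TimeUnit)) work with repairs unchanged.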
                
> Staggering repair
> -----------------
>
>                 Key: CASSANDRA-3721
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3721
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: 0001-staggering-repair-with-snapshot.patch
>
>
> Currently repair runs on all the nodes at once, causing the range of data 
> to be hot (higher latency on reads).
> Sequence:
> 1) Send a repair request to all of the nodes so we can hold the references of 
> the SSTables (point at which repair was initiated)
> 2) Send a validation request to one node at a time (once completed, the node 
> will release its references).
> 3) Hold the reference of the tree in the requesting node and, once everything 
> is complete, start the diff.
> We can also serialize the streaming part so that no more than 1 node is 
> involved in the streaming at a time.
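
The sequence described in the issue can be sketched roughly as below. This is an illustration under stated assumptions, not the patch itself: FakeNode, its snapshot() and validate() methods, and the hash string standing in for a Merkle tree are all hypothetical, and everything runs synchronously in one process rather than over the messaging service.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaggeredRepairSketch {
    // Hypothetical in-memory node; a real node would receive these as messages.
    public static final class FakeNode {
        public final String name;
        final String data;
        final List<String> log;

        public FakeNode(String name, String data, List<String> log) {
            this.name = name;
            this.data = data;
            this.log = log;
        }

        // Step 1: pin the current SSTables so validation sees a fixed view.
        void snapshot() {
            log.add("snapshot " + name);
        }

        // Step 2: build the tree (here just a hash of the data) and
        // release the snapshot references.
        String validate() {
            log.add("validate " + name);
            return Integer.toHexString(data.hashCode());
        }
    }

    // Returns the pairs of nodes whose trees differ, as "a/b" strings.
    public static List<String> repair(List<FakeNode> nodes) {
        // Step 1: snapshot everywhere first, so all trees reflect the
        // same point in time even though validation is staggered.
        for (FakeNode node : nodes) {
            node.snapshot();
        }

        // Step 2: validate one node at a time; the requester keeps each tree.
        Map<String, String> trees = new LinkedHashMap<String, String>();
        for (FakeNode node : nodes) {
            trees.put(node.name, node.validate());
        }

        // Step 3: only once every tree is in, diff each pair.
        List<String> names = new ArrayList<String>(trees.keySet());
        List<String> mismatched = new ArrayList<String>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                if (!trees.get(names.get(i)).equals(trees.get(names.get(j)))) {
                    mismatched.add(names.get(i) + "/" + names.get(j));
                }
            }
        }
        return mismatched;
    }
}
```

The point of the sketch is the ordering: only one node does validation work at any moment, while the up-front snapshot keeps the trees comparable.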

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
