[ 
https://issues.apache.org/jira/browse/CASSANDRA-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889494#comment-13889494
 ] 

Marcus Eriksson commented on CASSANDRA-5351:
--------------------------------------------

A more complete version is now pushed to
https://github.com/krummas/cassandra/tree/marcuse/5351
Lots of testing is still required, but I think it is mostly 'feature-complete'.

Repair flow is now:
# Repair coordinator sends out Prepare messages to all neighbors
# All involved parties figure out which sstables should be included in the
repair: for a full repair, all sstables are included; otherwise only the ones
with repairedAt set to 0. Note that we don't lock the sstables; if they are
gone by the time we do anticompaction, that is fine - we will repair them in
the next round.
# Repair coordinator prepares itself, waits until all neighbors have
prepared, and then sends out TreeRequests.
# All nodes calculate merkle trees based on the sstables picked in step #2
# Coordinator waits for replies and then sends AnticompactionRequests to all 
nodes
# If we are doing a full repair, we simply skip anticompaction.
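
As a rough illustration of steps 2 and 6 above (this is not the code on the branch), here is a minimal Python sketch of how each node might pick sstables for validation. The names SSTable, UNREPAIRED, select_for_repair and needs_anticompaction are hypothetical and exist only for this example; the only thing taken from the description is the rule itself: a full repair includes every sstable and skips anticompaction, an incremental one includes only sstables with repairedAt set to 0.

```python
from dataclasses import dataclass

UNREPAIRED = 0  # repairedAt value meaning "never repaired" (see step 2)


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable and its repairedAt metadata."""
    name: str
    repaired_at: int = UNREPAIRED


def select_for_repair(sstables, full):
    """Return the sstables a node would build its merkle tree from."""
    if full:
        # Full repair: every sstable is included.
        return list(sstables)
    # Incremental repair: only sstables that have never been repaired.
    return [s for s in sstables if s.repaired_at == UNREPAIRED]


def needs_anticompaction(full):
    """Step 6: anticompaction is skipped for full repairs."""
    return not full


# Example: one previously repaired sstable and two new ones.
live = [SSTable("a", 1391400000000), SSTable("b"), SSTable("c")]
incremental = select_for_repair(live, full=False)
everything = select_for_repair(live, full=True)
```

No locking is modelled here, in line with step 2: the selection is just a snapshot, and anything compacted away before anticompaction is picked up by a later repair.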

Notes:
* SSTables are tagged with repairedAt timestamps, compactions keep 
min(repairedAt) of the included sstables.
* nodetool repair defaults to the old behaviour. Pass --incremental to use
the new repairs.
* Anticompaction
  - Splits an sstable into two new ones: one with all keys that were in the
repaired ranges and one with the unrepaired data.
  - If the repaired ranges cover the entire sstable, we just rewrite the
sstable metadata instead. This means that the optimal way to run incremental
repairs is to not do partitioner range repairs etc.
* Compaction
  * LCS
    - We always first check if there are any unrepaired sstables to do STCS on;
if there are, we do that. The reasoning is that new data (which needs
compaction) is unrepaired.
    - We keep all sstables in the LeveledManifest, then filter out the 
unrepaired ones when getting compaction candidates etc.
  * STCS
    - Major compaction is done by taking the biggest set of sstables (repaired
or unrepaired) - so for a total major compaction, you will need to run
nodetool compact twice.
    - Minor compactions work the same way: the biggest set of sstables will be
compacted.
* Streaming - A streamed SSTable keeps its repairedAt time.
* BulkLoader - Loaded sstables are unrepaired.
* Scrub - Sets repairedAt to UNREPAIRED; since we can drop rows during scrub,
the new sstable is not considered repaired.
* Upgradesstables - Keep repaired status
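
To make the anticompaction and min(repairedAt) notes above a bit more concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the implementation on the branch: the SSTable class, in_ranges, anticompact and compact are hypothetical names, keys are simplified to integer tokens, and ranges are (left, right] pairs with no wrap-around. What happens when an sstable has no keys at all in the repaired ranges is not covered in the notes; the sketch assumes it is simply left untouched.

```python
from dataclasses import dataclass

UNREPAIRED = 0  # repairedAt value for data that has not been repaired


@dataclass
class SSTable:
    """Hypothetical sstable: a list of tokens plus repairedAt metadata."""
    keys: list
    repaired_at: int = UNREPAIRED


def in_ranges(token, ranges):
    # Assumption: ranges are (left, right] pairs without wrap-around.
    return any(left < token <= right for left, right in ranges)


def anticompact(sstable, repaired_ranges, repaired_at):
    """Return a (repaired, unrepaired) pair; either side may be None."""
    inside = [k for k in sstable.keys if in_ranges(k, repaired_ranges)]
    outside = [k for k in sstable.keys if not in_ranges(k, repaired_ranges)]
    if not outside:
        # Ranges cover the whole sstable: only the metadata is rewritten.
        sstable.repaired_at = repaired_at
        return sstable, None
    if not inside:
        # Assumption: nothing was repaired, so leave the sstable as it is.
        return None, sstable
    # Otherwise split into one repaired and one unrepaired sstable.
    return SSTable(inside, repaired_at), SSTable(outside, UNREPAIRED)


def compact(sstables):
    """Merge sstables; the result keeps min(repairedAt) of the inputs."""
    merged = sorted(k for s in sstables for k in s.keys)
    return SSTable(merged, min(s.repaired_at for s in sstables))


# Example: repair covered tokens in (0, 5], the sstable also holds token 9.
repaired, unrepaired = anticompact(SSTable([1, 5, 9]), [(0, 5)], 100)
```

Because UNREPAIRED is 0, taking the minimum means that mixing a repaired sstable with an unrepaired one yields an unrepaired result, which is why the compaction strategies keep the two sets apart.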


> Avoid repairing already-repaired data by default
> ------------------------------------------------
>
>                 Key: CASSANDRA-5351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5351
>             Project: Cassandra
>          Issue Type: Task
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Lyuben Todorov
>              Labels: repair
>             Fix For: 2.1
>
>         Attachments: 5351_node1.log, 5351_node2.log, 5351_node3.log, 
> 5351_nodetool.log
>
>
> Repair has always built its merkle tree from all the data in a columnfamily, 
> which is guaranteed to work but is inefficient.
> We can improve this by remembering which sstables have already been 
> successfully repaired, and only repairing sstables new since the last repair. 
>  (This automatically makes CASSANDRA-3362 much less of a problem too.)
> The tricky part is, compaction will (if not taught otherwise) mix repaired 
> data together with non-repaired.  So we should segregate unrepaired sstables 
> from the repaired ones.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
