Improve the precision of the repair merkle trees
------------------------------------------------

                 Key: CASSANDRA-2541
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2541
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.6
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Minor
             Fix For: 0.8.1


Repair uses the sstable sampled keys to split the merkle tree. This means the 
'precision' of the tree will be index_interval (so 128 by default). This is 
probably fine when you have lots of skinny rows. But when you have fewer, fat 
rows, this is probably unnecessarily imprecise.

On top of that, each node will not have the same set of samples, so you may 
not always end up using the most precise range in the trees when computing 
differences, which could make the imprecision worse (to be fair, it is quite 
possible this happens very rarely).

Anyway, this ticket proposes to add an additional 'split_factor' (it can be 
fixed or configurable, either by the user or based on metrics on how fat the 
rows are) that makes us re-split each range 'split_factor' times after the 
initial sample-based split of the tree.
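
To make the proposal concrete, here is a rough sketch in Python rather than 
the actual Cassandra code. It assumes integer tokens and reads "re-split 
'split_factor' times" as split_factor rounds of midpoint bisection, so each 
sample-based range ends up as 2^split_factor sub-ranges. That reading, and the 
names sample_split and resplit, are assumptions made for illustration only.

```python
def sample_split(lo, hi, sampled_keys):
    """Split (lo, hi] at each sampled key that falls strictly inside it."""
    inner = sorted(k for k in set(sampled_keys) if lo < k < hi)
    bounds = [lo] + inner + [hi]
    return list(zip(bounds, bounds[1:]))


def resplit(ranges, split_factor):
    """Bisect every range at its midpoint, split_factor rounds in a row.

    Assumption: one 're-split' means one midpoint bisection of a range.
    """
    for _ in range(split_factor):
        halved = []
        for left, right in ranges:
            mid = left + (right - left) // 2
            if mid == left:
                # Too narrow to bisect on integer tokens; keep it as is.
                halved.append((left, right))
            else:
                halved.extend([(left, mid), (mid, right)])
        ranges = halved
    return ranges


# Hypothetical token space 0..1024 with two sampled keys.
initial = sample_split(0, 1024, [256, 512])
refined = resplit(initial, 2)
```

With these made-up inputs the initial split gives 3 ranges and the refined 
tree has 12, independent of which keys a given node happened to sample inside 
each range.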

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
