Always use even distribution for merkle tree with RandomPartitioner
-------------------------------------------------------------------

                 Key: CASSANDRA-2841
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2841
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.7.0
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
            Priority: Trivial
             Fix For: 0.7.7, 0.8.2
         Attachments: 2841.patch

When creating the initial Merkle tree, repair tries to be (too) smart and uses 
the key samples to "guide" the tree splitting. While this is a good idea for 
OPP, where there is a good chance the data distribution is uneven, you can't 
beat an even distribution for the RandomPartitioner. A quick experiment even 
shows that the sample-guided method is significantly less efficient than an 
even split of the Merkle tree's ranges (that is, an even split gives a much 
better distribution of the number of keys per range of the tree).

Thus, let's switch to an even distribution for RandomPartitioner. That 
three-line change alone amounts to a significant improvement in repair's 
precision.
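
To make the comparison concrete, here is a rough Python sketch (not the 
attached patch, and not Cassandra code) that counts keys per leaf range under 
both schemes. Everything in it is an assumption made for illustration: tokens 
are drawn uniformly from [0, 2**127), one key in 128 is used as a sample, and 
the sample-guided scheme is modelled as "pick a random sample and halve the 
range that contains it". The function names are hypothetical.

```python
import random
import statistics
from bisect import bisect_right

TOKEN_SPACE = 2 ** 127  # assumed token space: [0, 2**127)


def even_bounds(depth):
    """Exclusive upper bounds of 2**depth equal-width ranges."""
    leaves = 2 ** depth
    return [TOKEN_SPACE * (i + 1) // leaves for i in range(leaves)]


def sampled_bounds(samples, leaves, rng):
    """Assumed sample-guided scheme: pick a random sample and halve the
    range that contains it, until the tree has `leaves` ranges."""
    bounds = [TOKEN_SPACE]  # sorted exclusive upper bounds; first range starts at 0
    while len(bounds) < leaves:
        token = rng.choice(samples)
        i = bisect_right(bounds, token)  # index of the range holding the token
        lo = bounds[i - 1] if i else 0
        mid = (lo + bounds[i]) // 2
        if mid > lo:  # skip ranges too narrow to split further
            bounds.insert(i, mid)
    return bounds


def keys_per_range(bounds, tokens):
    """Number of tokens that fall into each leaf range."""
    counts = [0] * len(bounds)
    for t in tokens:
        counts[bisect_right(bounds, t)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(2841)
    tokens = [rng.randrange(TOKEN_SPACE) for _ in range(100_000)]
    samples = tokens[::128]  # assumed sampling interval
    schemes = (("even", even_bounds(8)),
               ("sampled", sampled_bounds(samples, 256, rng)))
    for name, bounds in schemes:
        counts = keys_per_range(bounds, tokens)
        print(name, "max keys/range:", max(counts),
              "stdev:", round(statistics.pstdev(counts), 1))
```

Under these assumptions the sample-guided tree ends up with leaves of 
different widths, so the number of keys per range varies far more than with 
the even split, which is the effect described above.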

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
