[ https://issues.apache.org/jira/browse/CASSANDRA-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne resolved CASSANDRA-2541.
-----------------------------------------

    Resolution: Invalid

Well, I was actually wrong: the splitting keeps reusing the samples as long as the tree is not complete (complete in the sense that its depth or size has reached the fixed limits). So I think there is no particular problem here.

There is a small bug in the 0.8 code that can make the splitting process exit early, but I'll open another ticket for that.

> Improve the precision of the repair merkle trees
> ------------------------------------------------
>
>                 Key: CASSANDRA-2541
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2541
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.6
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>              Labels: repair
>             Fix For: 0.8.1
>
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> Repair uses the sstable sampled keys to split the merkle tree. This means the
> 'precision' of the tree will be index_interval (so 128 by default). This is
> probably fine when you have lots of skinny rows. But when you have fewer,
> fatter rows, this is probably unnecessarily imprecise.
> In addition, since each node will not have the same set of samples, you may
> not always end up using the most precise range in the trees when computing
> differences, which could make the imprecision worse (to be fair, this may
> well happen only very rarely).
> Anyway, this ticket proposes adding a 'split_factor' (which could be fixed or
> configurable, either by the user or based on metrics on how fat the rows are)
> that makes us re-split each range 'split_factor' times after the initial
> sample-based split of the tree.
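To make the two phases discussed above concrete, here is a minimal sketch. It is not Cassandra's MerkleTree code: the names `Leaf`, `split_on_samples`, `resplit`, `max_depth` and `max_size` are hypothetical, tokens are small integers, ranges are half-open `(left, right]`, and it assumes that a sampled token only selects which leaf to split (the leaf is split at its midpoint). Phase 1 keeps cycling over the samples until the tree is complete (size limit reached, or no leaf containing a sample can be split further without exceeding the depth limit). Phase 2 is one possible reading of the proposed 'split_factor': each leaf is halved `split_factor` more times, still within the limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Leaf:
    left: int   # exclusive lower token bound
    right: int  # inclusive upper token bound
    depth: int  # depth of this leaf in the binary merkle tree


def halve(leaf: Leaf) -> List[Leaf]:
    # Split a leaf at the midpoint of its token range.
    mid = leaf.left + (leaf.right - leaf.left) // 2
    return [Leaf(leaf.left, mid, leaf.depth + 1), Leaf(mid, leaf.right, leaf.depth + 1)]


def can_split(leaf: Leaf, max_depth: int) -> bool:
    # A leaf can be split if it is not at max depth and still holds >= 2 tokens.
    return leaf.depth < max_depth and leaf.right - leaf.left >= 2


def split_on_samples(leaves: List[Leaf], samples: List[int],
                     max_depth: int, max_size: int) -> List[Leaf]:
    """Phase 1 (hypothetical): reuse the samples until the tree is complete."""
    leaves = list(leaves)
    progress = True
    while progress and len(leaves) < max_size:
        progress = False
        for token in samples:
            if len(leaves) >= max_size:
                break
            i = next((i for i, l in enumerate(leaves) if l.left < token <= l.right), None)
            if i is None or not can_split(leaves[i], max_depth):
                continue  # skip this sample instead of ending the whole pass
            leaves[i:i + 1] = halve(leaves[i])
            progress = True
    return leaves


def resplit(leaves: List[Leaf], split_factor: int,
            max_depth: int, max_size: int) -> List[Leaf]:
    """Phase 2 (hypothetical 'split_factor'): halve every leaf split_factor times."""
    for _ in range(split_factor):
        count = len(leaves)
        out: List[Leaf] = []
        for leaf in leaves:
            if count < max_size and can_split(leaf, max_depth):
                out.extend(halve(leaf))
                count += 1  # one leaf became two
            else:
                out.append(leaf)
        leaves = out
    return leaves


root = [Leaf(0, 1024, 0)]
coarse = split_on_samples(root, samples=[100, 900], max_depth=4, max_size=64)
fine = resplit(coarse, split_factor=2, max_depth=8, max_size=64)
print(len(coarse), len(fine))  # → 8 32
```

With only two samples, phase 1 stops at 8 leaves because the leaves holding the samples hit the depth limit; the leaf containing token 100 is `(64, 128]`. Phase 2 then refines every leaf regardless of where the samples fell, so the leaf containing token 100 shrinks to `(96, 112]`. The size budget is checked on every halving, so a small `max_size` simply leaves the later leaves coarser.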