[ https://issues.apache.org/jira/browse/CASSANDRA-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869639#comment-13869639 ]
Jonathan Ellis commented on CASSANDRA-5263:
-------------------------------------------

You can estimate rows (partitions) in a range with the index sample; SSTR.estimatedKeysForRanges will do this for you. (Until we have minhash or similar a la CASSANDRA-6474, you'll probably want to assume the worst case, i.e. no overlap among the sstables.) 100MB isn't much in an 8GB heap; I don't think we need to worry about that. Is the tree building CPU-bound or I/O-bound?

> Allow Merkle tree maximum depth to be configurable
> --------------------------------------------------
>
>                 Key: CASSANDRA-5263
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5263
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Config
>    Affects Versions: 1.1.9
>            Reporter: Ahmed Bashir
>            Assignee: Minh Do
>
> Currently, the maximum depth allowed for Merkle trees is hardcoded as 15.
> This value should be configurable, just like phi_convict_threshold and other
> properties.
> Given a cluster with nodes responsible for a large number of row keys, Merkle
> tree comparisons can result in a large number of unnecessary row keys being
> streamed.
> Empirical testing indicates that reasonable changes to this depth (18, 20,
> etc.) don't affect the Merkle tree generation and differencing timings all
> that much, and they can significantly reduce the amount of data being
> streamed during repair.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
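A rough sketch of the arithmetic behind the comment and the ticket: summing each sstable's per-range key estimate gives the worst-case (no-overlap) partition count, and a tree of depth d splits the repaired range into 2^d leaf ranges, so a mismatched leaf streams roughly total/2^d partitions. The class name, the sample estimates, and the 100M total below are illustrative assumptions, not values from the ticket; the real estimate would come from SSTableReader.estimatedKeysForRanges.

```java
// Illustrative sketch (hypothetical names and numbers): how worst-case
// key estimates and Merkle tree depth bound repair over-streaming.
public class MerkleDepthSketch {

    // Worst-case partitions in a range: sum the per-sstable index-sample
    // estimates, assuming no overlap among sstables (per the comment above).
    static long worstCaseKeys(long[] perSstableEstimates) {
        long total = 0;
        for (long est : perSstableEstimates) {
            total += est;
        }
        return total;
    }

    // A tree of the given depth has 2^depth leaves, so each leaf range
    // covers about this many partitions; a single differing leaf streams
    // all of them.
    static long partitionsPerLeaf(long totalPartitions, int depth) {
        return totalPartitions / (1L << depth);
    }

    public static void main(String[] args) {
        // Assumed per-sstable estimates totaling 100M partitions in the range.
        long total = worstCaseKeys(new long[] {40_000_000L, 35_000_000L, 25_000_000L});
        System.out.println("worst-case partitions:  " + total);
        System.out.println("per leaf at depth 15:   " + partitionsPerLeaf(total, 15));
        System.out.println("per leaf at depth 20:   " + partitionsPerLeaf(total, 20));
    }
}
```

With these assumed numbers, going from the hardcoded depth 15 to depth 20 shrinks the granularity of a leaf mismatch from roughly 3,000 partitions to under 100, which is the mechanism behind the "significantly reduce the amount of data being streamed" observation in the ticket.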