[ https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408329#comment-13408329 ]
Jonathan Ellis commented on CASSANDRA-2698:
-------------------------------------------

Following the comments at the top, you want two things here:
# A histogram of TreeRange row counts
# For each pair of merkle trees, the number of ranges that differ and the corresponding streamed size of the data

1. is easy: add an EstimatedHistogram to the MerkleTree class and, once the ranges have finished computing, iterate over each range and add its row count to the histogram.

2. is a bit more involved: you want to extend the logging done by Differencer to include the given information, which is going to involve poking into the guts of (probably) MerkleTree.difference and performStreamingRepair.

I agree though that repair is an intimidating part of the code base. If you want to start with something simpler, that's fine too.

> Instrument repair to be able to assess its efficiency (precision)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-2698
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: lhf
>         Attachments: nodetool_repair_and_cfhistogram.tar.gz
>
>
> Some reports indicate that repair sometimes transfers huge amounts of data. One
> hypothesis is that the merkle tree precision may deteriorate too much at some
> data size. To check this hypothesis, it would be reasonable to gather
> statistics during the merkle tree building on how many rows each merkle tree
> range accounts for (and the size that this represents). It is probably an
> interesting statistic to have anyway.
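
Point 1 of the comment (a histogram of per-range row counts) could be sketched roughly as follows. This is only an illustration under stated assumptions: RangeRowHistogram, its power-of-two bucket layout, and the sample row counts are hypothetical stand-ins, and the code does not use Cassandra's actual EstimatedHistogram or MerkleTree classes. In the real change, the loop would presumably run over the tree's ranges once they have finished computing.

```java
/**
 * Hypothetical stand-in for a histogram of rows per merkle tree range.
 * Not Cassandra's EstimatedHistogram; the bucket layout is an assumption.
 */
final class RangeRowHistogram {
    // Bucket 0 holds a row count of 0; bucket i (i >= 1) holds
    // row counts in the interval [2^(i-1), 2^i - 1].
    private final long[] buckets = new long[64];

    /** Maps a non-negative row count to its bucket index. */
    static int bucketFor(long rowCount) {
        if (rowCount < 0) {
            throw new IllegalArgumentException("negative row count: " + rowCount);
        }
        return 64 - Long.numberOfLeadingZeros(rowCount);
    }

    /** Records the row count of one range. */
    void add(long rowCount) {
        buckets[bucketFor(rowCount)]++;
    }

    /** Number of ranges whose row count fell into the given bucket. */
    long count(int bucket) {
        return buckets[bucket];
    }

    /** Total number of ranges recorded so far. */
    long total() {
        long sum = 0;
        for (long b : buckets) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up row counts, one per (hypothetical) tree range.
        long[] sampleRowCounts = {0, 1, 2, 3, 4, 1000};

        RangeRowHistogram histogram = new RangeRowHistogram();
        for (long rowCount : sampleRowCounts) {
            histogram.add(rowCount);
        }

        // Print only the buckets that received at least one range.
        for (int i = 0; i < 64; i++) {
            if (histogram.count(i) > 0) {
                System.out.println("bucket " + i + ": " + histogram.count(i) + " range(s)");
            }
        }
    }
}
```

A skew toward the high buckets would support the hypothesis in the issue description that each range covers too many rows at large data sizes.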