[ https://issues.apache.org/jira/browse/CASSANDRA-4482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434171#comment-13434171 ]
Nicolas Favre-Felix commented on CASSANDRA-4482: ------------------------------------------------ Indeed, I've used "range" as Range<Token>, the range of tokens owned by a node; I should have made this clearer. We are not using the MerkleTree class or its TreeRange objects, but updating a single ByteBuffer directly instead of creating the whole tree with its hundreds of internal objects. This is equivalent to updating the leaves alone, without propagating the hash upwards in the tree. Yes, that means comparing two trees is O(leaf count). bq. I xor all these together to get my initial state, S. To update row A to row A', I need to take S xor hash(A) xor hash(A'). If you've lready xor'd all these together, S does include the hash of your existing row A. Updating A to A' hashes A' and returns S' = S xor hash(A'), which is hash(A') xor hash(A). In practice, this is how it works step by step: # Load existing buffers when the ColumnFamilyStore is created: per Range<Token>, load an existing buffer or create a new one initialized with zeros. # ColumnFamilyStore.apply() is called with columns X and Y in row A. For instance, row A could have token 0x10, falling in the range (0x00, 0x20]. The incremental repair ByteBuffer for this range is 1 KB in size. # Create a new digest and run Column.updateDigest() on X and Y sucessively. We end up with H = hash(X) xor hash(Y); H is 16 bytes long. # Calculate O, the offset in the ByteBuffer that corresponds to H: in this case, it's around 512 since 0x10 is close to the middle of the range (0x00, 0x20]. # For each byte i of H, we set buffer[O+i] = buffer[O+i] xor H[i]. During the repair session, the replicas send out their existing ByteBuffers for the range being repaired and replace them with empty ones that will receive subsequent inserts. bq. And sync the BB saving with CF flushes so CL replay matches up, I imagine. Yes. If you terminate Cassandra at this stage, the ByteBuffer is written to disk and will contains [0,0.... a few bytes of hash(X) xor hash(Y) around the middle ... 0,0,0,0]. > In-memory merkle trees for repair > --------------------------------- > > Key: CASSANDRA-4482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-4482 > Project: Cassandra > Issue Type: New Feature > Reporter: Marcus Eriksson > > this sounds cool, we should reimplement it in the open source cassandra; > http://www.acunu.com/2/post/2012/07/incremental-repair.html -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira