hi everyone, I am trying to understand anti-entropy measures in general and specifically the ones used in cassandra. The more I look at it the more I get confused, so I would be glad if anyone could shed some light on the following question, or point me in the right direction:
on the one hand a search tree needs to be reasonably balanced for it to remain efficient in lookup and insertion. On the other hand, for a merkle tree we want the trees that are being compared to be as similar as possible, where the underlying data allows. If we for example assume the two sides to contain values [1,2,3] and [0,1,2,3] respectively, and assume this tree for the "left" side: A / \ B 3 / \ 1 2 a balanced tree for the rigth hand side would look like this: C / \ D E / \ / \ 0 1 2 3 this is nice for search/insert, but means that syncing the two trees requires a full sync of all nodes (to make right := left), as all internal node hashes are different. an unbalanced tree could, if unbalanced just the right way, look like this: a / \ b 3 / \ X 2 / \ 0 1 in this case the sync would quickly converge on "0" being the only different leaf. In this example the actual number of messages exchanged is not that different, because there are so few inner nodes and all of them lie on the path to "0". but if you extend the example to a larger tree then the difference becomes quite dramatic. One obvious way around this would be to not use a balanced tree, but one where each item is in a more predictable position in the tree. in the example above (integer keys), you could for example use the least significant bit at the root to distinguish whether items are "left" or "right" of the node, and the next bit on the level below: i / \ g h / \ / \ 0 2 1 3 a node with only one child would collapse: j / \ 2 h / \ 1 3 which would preserve the structure of parts of the tree that are similar between the two (h and below here). Obviously the unbalanced nature is a problem, as is the fact that you can't do range lookups anymore. But at the same time the full-sync on single insert sounds quite pathological as well... So I am wondering: is this a problem? how does cassandra deal with it? does it actually use fully balanced trees? Is there any literature on this? thanks robert -- Robert Lemmen http://www.semistable.com
signature.asc
Description: PGP signature