The "VirtualNodes/Balance" page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance

Comment:
stubbed out page

New page:
This page is for design notes and information relating to operations affecting token/range ownership.

See also:
 * [[https://issues.apache.org/jira/browse/CASSANDRA-4445|CASSANDRA-4445: balance utility for vnodes]]
 * [[https://issues.apache.org/jira/browse/CASSANDRA-4443|CASSANDRA-4443: shuffle utility for vnodes]]

<<TableOfContents>>

<<Anchor(requirements)>>
== Requirements ==
 1. Offsetting ownership ratios for [[#heterogeneous_nodes|heterogeneous nodes]]
 1. Correcting [[#imbalance|imbalances created by random token selection]]
 1. [[#shuffling|Randomizing ranges]] after a migration

<<Anchor(heterogeneous_nodes)>>
== Heterogeneous Nodes ==
When running a cluster of heterogeneous nodes (i.e. differing amounts of storage, memory, cores, etc.), it may be desirable to place a greater or lesser portion of the keyspace on one or more nodes.

<<Anchor(imbalance)>>
== Imbalance ==
By default, a node's tokens are randomly generated with the expectation that an even distribution of the namespace will result. However, variations of as much as 7% have been reported on small clusters when using the `num_tokens` default of 256. These randomly generated tokens are MD5 sums, so entropy isn't the problem here, at least not in the sense that using a better RNG would create a more even distribution of ranges. Increasing the token count (either by increasing `num_tokens` or the number of nodes) will improve this: the more tokens, the more the distribution evens out. This anecdotal worst case is probably Good Enough, especially when considering that key distribution is subject to the same properties, or that many data sets are skewed on their own (i.e. optimal ownership is not necessarily optimal anyway).
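The relationship between token count and balance described above can be explored with a small simulation. This is only a sketch under stated assumptions, not a reproduction of the reported 7% figure: the ring size of 2**127 and the helper names `random_ownership` and `max_deviation` are hypothetical and are not part of Cassandra.

```python
import random

# Assumption: a token space of 2**127, in the style of an MD5-based partitioner.
RING_SIZE = 2 ** 127


def random_ownership(num_nodes, num_tokens, rng):
    """Give every node num_tokens random tokens; return the ring width each node owns."""
    tokens = []
    for node in range(num_nodes):
        for _ in range(num_tokens):
            tokens.append((rng.randrange(RING_SIZE), node))
    tokens.sort()

    owned = [0] * num_nodes
    # Each token owns the range from the previous token (exclusive) up to itself.
    # Starting one full ring "behind" the last token makes the first range wrap.
    prev = tokens[-1][0] - RING_SIZE
    for token, node in tokens:
        owned[node] += token - prev
        prev = token
    return owned


def max_deviation(owned):
    """Largest deviation from an equal share, as a fraction of that equal share."""
    ideal = RING_SIZE / len(owned)
    return max(abs(width - ideal) for width in owned) / ideal


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so that runs are repeatable
    for num_tokens in (1, 16, 256):
        deviation = max_deviation(random_ownership(6, num_tokens, rng))
        print(f"num_tokens={num_tokens}: max deviation {deviation:.1%}")
```

Running it with different seeds and cluster sizes gives a feel for how much spread to expect at a given token count; individual runs vary, so averaging over several seeds is more informative than any single result.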
That said, our history is one where random token selection produced completely unacceptable results, and manual intervention was required. The typical (expected) result of manual token selection is near-perfect balance of ownership, and it will likely be some time before people are comfortable seeing otherwise.

<<Anchor(shuffling)>>
== Shuffling ==
When migrating a legacy cluster with one token per node to virtual nodes, each node's existing range is carved up into `num_tokens` new ranges. These new ranges are still contiguous, however, and a means of randomizing their placement is needed.
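The carve-then-randomize idea can be illustrated with a minimal sketch. It is not the shuffle utility tracked in CASSANDRA-4443; the functions `split_range`, `carve`, and `shuffle` are hypothetical names, and the sketch assumes that legacy tokens are sorted and distinct, and that every legacy range is at least `num_tokens` wide.

```python
import random


def split_range(start, end, num_tokens):
    """Split (start, end] into num_tokens contiguous sub-ranges; return their end tokens."""
    width = end - start
    return [start + width * (i + 1) // num_tokens for i in range(num_tokens)]


def carve(legacy_tokens, ring_size, num_tokens):
    """Carve each node's single legacy range into num_tokens ranges.

    legacy_tokens is a sorted list in which index i is node i's token.
    Returns a dict mapping each new token to the node that owns it.
    """
    placement = {}
    for node, end in enumerate(legacy_tokens):
        # Node i owns (previous token, own token]; node 0's range wraps around the ring.
        start = legacy_tokens[node - 1] - (ring_size if node == 0 else 0)
        for token in split_range(start, end, num_tokens):
            placement[token % ring_size] = node
    return placement


def shuffle(placement, rng):
    """Randomly permute owners across tokens, keeping each node's token count."""
    tokens = sorted(placement)
    owners = [placement[token] for token in tokens]
    rng.shuffle(owners)
    return dict(zip(tokens, owners))


if __name__ == "__main__":
    carved = carve([0, 25, 50, 75], ring_size=100, num_tokens=5)
    shuffled = shuffle(carved, random.Random(7))
    for token in sorted(carved):
        print(token, carved[token], shuffled[token])
```

After `carve`, each node still owns one contiguous stretch of the ring, just subdivided. `shuffle` changes only which node owns which range, so token counts per node are preserved. A real implementation would also have to stream the data belonging to every range that changes owner, which this sketch ignores.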