Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "VirtualNodes/Balance" page has been changed by EricEvans:
http://wiki.apache.org/cassandra/VirtualNodes/Balance

Comment:
stubbed out page

New page:
This page is for design notes and information relating to operations effecting 
token/range ownership.  See also:

 * [[https://issues.apache.org/jira/browse/CASSANDRA-4445|CASSANDRA-4445: 
balance utility for vnodes]]
 * [[https://issues.apache.org/jira/browse/CASSANDRA-4443|CASSANDRA-4443: 
shuffle utility for vnodes]]

<<TableOfContents>>

<<Anchor(requirements)>>
== Requirements ==

 1. Offsetting ownership ratios for [[#heterogeneous_nodes|heterogeneous nodes]]
 1. Correcting [[#imbalance|imbalances created by random token selection]]
 1. [[#shuffling|Randomizing ranges]] after a migration

<<Anchor(heterogeneous_nodes)>>
== Heterogeneous Nodes ==

When running a cluster of heterogeneous nodes, (i.e. differing amounts of 
storage, memory, cores, etc), it may be desirable to place a greater or less 
portion of the keyspace on one or more nodes.

<<Anchor(imbalance)>>
== Imbalance ==

By default, a nodes tokens are randomly generated with the expectation that an 
even distribution of the namespace will result.  However, variations of as much 
as 7% have been reported on small clusters when using the `num_tokens` default 
of 256.

These randomly generated tokens are MD5 sums, so entropy isn't the problem 
here, at least not in the sense that using a better RNG would create a more 
even distribution of ranges.  Increasing the token count (either by increasing 
num_tokens, or the number of nodes) will improve this, (the more tokens, the 
more the distribution will even out).

This anecdotal worst-case is probably Good Enough, especially when considering 
that key distribution is subject to the same properties, or that many data sets 
are skewed on their own, (i.e. optimal ownership is not necessary optimal 
anyway).

That said, our history is one where random token selection produced completely 
unacceptable results, and manual intervention was required.  The typical 
(expected) result of manual token selection is near perfect balance of 
ownership, and it will likely be some time before people are comfortable seeing 
otherwise.

<<Anchor(shuffling)>>
== Shuffling ==

When migrating a legacy cluster with one-token-per-node to virtual nodes, the 
existing range is carved up into `num_tokens` new ranges.  These new ranges are 
still contiguous however, and a means of randomizing their placement is needed.

Reply via email to