Great investigation, good job guys!

> Personally I would have liked to have seen even more iterations. While 14
> run iterations gives an indication, the average of randomness is not what
> is important here. What concerns me is the consequence to imbalances as the
> cluster grows when you're very unlucky with initial random tokens, for
> example when random tokens land very close together. The token allocation
> can deal with breaking up large token ranges but is unable to do anything
> about such tiny token ranges. Even a bad 1-in-a-100 experience should be a
> consideration when picking a default num_tokens.

Perhaps a simple way to avoid this is to update the random allocation
algorithm to re-generate tokens when the ranges created do not have a good
size distribution?
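
To make the idea concrete, here is a rough sketch in Python (for brevity, not a patch against the actual allocation code). It assumes the Murmur3 token range, and `generate_tokens`, `min_fraction` and `max_attempts` are hypothetical names and knobs made up for illustration. A candidate set is rejected whenever the resulting ring contains a range smaller than a chosen fraction of the ideal even range size:

```python
import random

# Signed 64-bit token space, as used by Murmur3Partitioner (illustration only).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
RING_SIZE = 2**64


def range_sizes(tokens):
    """Return the size of every range on the ring, including the wrap-around one."""
    ordered = sorted(tokens)
    sizes = [b - a for a, b in zip(ordered, ordered[1:])]
    sizes.append(RING_SIZE - (ordered[-1] - ordered[0]))  # wrap-around range
    return sizes


def acceptable(tokens, min_fraction):
    """True if no range is smaller than min_fraction of the ideal (even) size."""
    ideal = RING_SIZE / len(tokens)
    return min(range_sizes(tokens)) >= min_fraction * ideal


def generate_tokens(existing, num_tokens, min_fraction=0.1, max_attempts=100,
                    rng=random):
    """Draw random tokens, re-drawing while the ring would contain a tiny range.

    Hypothetical helper: min_fraction and max_attempts are assumed knobs.
    Falls back to the last candidate so that startup is never blocked.
    """
    candidate = []
    for _ in range(max_attempts):
        candidate = [rng.randint(MIN_TOKEN, MAX_TOKEN) for _ in range(num_tokens)]
        if acceptable(list(existing) + candidate, min_fraction):
            return candidate
    return candidate
```

The threshold would need tuning: too strict and small num_tokens values never pass, too loose and the unlucky cases still slip through.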

> But it can be worse, for example if you have RF=3 and only two racks then
> you will only get random tokens. We know of a number of production clusters
> that have been set up this way. I am unaware of any Cassandra docs or
> community recommendations that say you should avoid doing this. So, this is
> a problem regardless of the value for num_tokens.

Having the number of racks not a multiple of the replication factor is not
a good practice since it can lead to imbalance and other problems like
this, so we should not only document this but perhaps add a warning or even
hard fail when this is encountered during node startup?
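
A minimal sketch of such a startup check, again in Python and purely illustrative. The function names and the `hard_fail` switch are hypothetical, and treating a single-rack DC as acceptable is my assumption rather than something settled in this thread:

```python
import logging

logger = logging.getLogger("rack_layout_check")


def rack_layout_problem(num_racks, replication_factor):
    """Return a description of the problem, or None if the layout looks fine.

    Assumption: a single-rack DC is treated as acceptable; otherwise the
    rack count is expected to be a multiple of the replication factor.
    """
    if num_racks == 1:
        return None
    if num_racks % replication_factor != 0:
        return (
            f"{num_racks} racks with RF={replication_factor}: rack count is "
            "not a multiple of the replication factor, which can lead to "
            "imbalance"
        )
    return None


def validate_rack_layout(num_racks, replication_factor, hard_fail=False):
    """Log a warning, or raise ValueError when hard_fail is set."""
    problem = rack_layout_problem(num_racks, replication_factor)
    if problem is None:
        return
    if hard_fail:
        raise ValueError(problem)
    logger.warning(problem)
```

With RF=3, two racks would be flagged, while three or six racks would pass.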

Cheers,

Paulo

On Mon, Mar 9, 2020 at 08:25, Mick Semb Wever <m...@apache.org>
wrote:

>
> > Can we ask for some analysis and data against the risks different
> > num_tokens choices present. We shouldn't rush into a new default, and such
> > background information and data is operator value added.
>
>
> Thanks for everyone's patience on this topic.
> The following is further input on a number of fronts.
>
>
> ** Analysis of Token Distributions
>
> The following is work done by Alex Dejanovski and Anthony Grasso. It
> builds upon their previous work at The Last Pickle, which is why we recommend
> 16 as the best value to clients. (Please buy beers for these two for the
> effort they have put in here.)
>
> The following three graphs show the ranges of imbalance that occur on
> clusters growing from 4 nodes to 12 nodes, for the different values of
> num_tokens: 4, 8 and 16. The range is based on 14 run iterations (except 16
> which only got ten).
>
>
> num_tokens: 4
>
>
> num_tokens: 8
>
>
> num_tokens: 16
>
> These graphs were generated using clusters created in AWS by tlp-cluster (
> https://github.com/thelastpickle/tlp-cluster). A script was written to
> automate the testing and generate the data for each value of num_tokens.
> Each cluster was configured with one rack. Of course these interpretations
> are debatable. The data for the graphs is in
> https://docs.google.com/spreadsheets/d/1gPZpSOUm3_pSCo9y-ZJ8WIctpvXNr5hDdupJ7K_9PHY/edit?usp=sharing
>
>
> What I see from these graphs is…
>  a) token allocation is pretty good at fixing initial bad random token
> imbalances. By the time you are at 12 nodes, presuming you have set up the
> cluster correctly so that token allocation actually works, your nodes will
> be balanced with num_tokens 4 or greater.
>  b) you need to get to ~12 nodes with num_tokens 4 to have a good balance.
>  c) you need to get to ~9 nodes with num_tokens 8 to have a good balance.
>  d) you need to get to ~6 nodes with num_tokens 16 to have a good balance.
>
> Personally I would have liked to have seen even more iterations. While 14
> run iterations gives an indication, the average of randomness is not what
> is important here. What concerns me is the consequence to imbalances as the
> cluster grows when you're very unlucky with initial random tokens, for
> example when random tokens land very close together. The token allocation
> can deal with breaking up large token ranges but is unable to do anything
> about such tiny token ranges. Even a bad 1-in-a-100 experience should be a
> consideration when picking a default num_tokens.
>
>
> ** When does the Token Allocation work…
>
> This has been touched on already in this thread. There are cases where
> token allocation fails to kick in. The first node in each of up to RF racks
> generates random tokens; this typically means the first three nodes.
>
> But it can be worse, for example if you have RF=3 and only two racks then
> you will only get random tokens. We know of a number of production clusters
> that have been set up this way. I am unaware of any Cassandra docs or
> community recommendations that say you should avoid doing this. So, this is
> a problem regardless of the value for num_tokens.
>
>
> ** Algorithmic token allocation does not handle the racks = RF case well (
> CASSANDRA-15600 <https://issues.apache.org/jira/browse/CASSANDRA-15600>)
>
> This recently landed in trunk. My understanding is that this improves the
> situation the graphs cover, but not the situation just described where a DC
> has 1 < racks < RF. Ekaterina, maybe you could elaborate?
>
>
> ** Decommissioning Nodes
>
> Elasticity is a feature of Cassandra. The operational costs of Cassandra
> are a real consideration. A reduction from a 9 node cluster back to a 6
> node cluster does happen often enough. Decommissioning nodes on smaller
> clusters brings the greatest operational cost savings, yet it will suffer
> most from too low a num_tokens setting.
>
>
> ** Recommendations from Cassandra Consulting Companies
>
> My understanding is that DataStax recommends num_tokens 8, while
> Instaclustr and The Last Pickle have both recommended 16. Interestingly
> enough, those who are pushing for num_tokens 4 are today using num_tokens
> 1 (and are already sitting with a lot of in-house C* experience).
>
>
> ** Keeping it Real
>
> We have regretted using num_tokens 4 on the clusters where we tried it. This
> and past analysis work, similar to the above, has led us to use num_tokens
> 16. Cost optimisation of clusters is one of the key user concerns out there,
> and we have witnessed problems on this front with num_tokens 4.
>
> While we accept the validity and importance of the increased availability
> provided by num_tokens 4, we have never seen or used it in practice. The
> default value of num_tokens is important. The value of 256 has been good
> business for consultants, but it was a bad choice for clusters and difficult
> to change. A new default should be chosen wisely.
>
>
> regards,
> Mick, Anthony, Alex
>
>
