I think we might be bikeshedding this number a bit: it is easy to debate and there is not yet one right answer. I hope we recognize that either choice (4 or 16) is fine, in that users can always override us, we can always change our minds later, or better yet we can improve allocation so users don't have to care. Either choice is an improvement on the status quo. When we change this default, I only truly care that:

1. Users can still launch a cluster all at once. Last I checked, even with allocate_for_rf you need to bootstrap one node at a time for even allocation to work properly; please correct me if I'm wrong, and if I'm not, let's get this fixed before the beta.

2. We get good documentation about this choice into our docs. [The documentation team and I are on it!]
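For context, the settings under discussion live in cassandra.yaml. A minimal sketch of what the proposed default could look like with the replication-factor-aware allocator enabled (the option name allocate_tokens_for_local_replication_factor is the 4.0-era spelling; earlier versions use allocate_tokens_for_keyspace, so verify against your version):

```yaml
# cassandra.yaml (sketch -- verify option names against your Cassandra version)
num_tokens: 16   # or 4; the default being debated in this thread

# Use the replication-factor-aware token allocator instead of random
# allocation, so ownership stays even with a small token count:
allocate_tokens_for_local_replication_factor: 3
```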
I don't like phrasing this as a "small user" vs. "large user" discussion. Everybody using Cassandra wants it to be easy to operate with high availability and consistent performance. Optimizing for "I can oops a few nodes and not have an outage" is important regardless of scale. We have a lot of input on this thread that users frequently override this to 4 (apparently even with random allocation? I am personally surprised by this if true). Some people have indicated that they like a higher number like 16 or 32. Some (most?) of our largest users by footprint are still using 1.

The only significant advantage I'm aware of for 16 over 4 is that users can scale up and down in increments of N/16 (12-node cluster -> 1 node) instead of N/4 (12-node cluster -> 3 nodes) without further token allocation improvements in Cassandra. Practically speaking, people are often spreading nodes over RF=3 "racks" (e.g. on GCP, Azure, and AWS), so they'll want to scale in increments of 3 anyway. I agree with Jon that optimizing for scale-downs is odd; it's a pretty infrequent operation, and all the users I know doing autoscaling are doing it vertically using network-attached storage (~EBS).

Let's also remember that repairing clusters with 16 tokens per node is slower (probably about 2-4x slower) than repairing clusters with 4 tokens. With zero-copy streaming there should be no benefit to more tokens for data transfer; if there is, that is a bug in streaming performance and we should fix it. Honestly, in my opinion, if we have balancing issues with a small number of tokens, that is a bug and we should just fix it; token moves are safe, and it is definitely possible for Cassandra to self-balance.

Let's not worry about scaring off users with this choice. Choosing 4 will not scare off users any more than 256 random tokens has scared them off once they realized they can't have any combination of two nodes down in different racks.
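The scale-increment argument above can be sketched in a few lines. This assumes (as the thread does) that with T tokens per node, the smallest cluster change that keeps ownership roughly even is about N/T nodes, rounded up to at least one node; min_balanced_increment is a hypothetical helper name, not a Cassandra API:

```python
import math

def min_balanced_increment(cluster_size: int, num_tokens: int) -> int:
    """Smallest node increment that keeps token ownership roughly even,
    under the assumption that the granularity is cluster_size / num_tokens."""
    return max(1, math.ceil(cluster_size / num_tokens))

# The 12-node example from above:
print(min_balanced_increment(12, 16))  # -> 1 node at a time
print(min_balanced_increment(12, 4))   # -> 3 nodes at a time
```

As the thread notes, if nodes are spread over RF=3 racks, operators will likely scale in multiples of 3 regardless, which blunts this advantage of 16.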
-Joey

On Fri, Jan 31, 2020 at 10:16 AM Carl Mueller <carl.muel...@smartthings.com.invalid> wrote:
> edit: 4 is bad at small cluster sizes and could scare off adoption
>
> On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller <carl.muel...@smartthings.com> wrote:
> > "large/giant clusters and admins are the target audience for the value we select"
> >
> > There are reasons aside from massive scale to pick cassandra, but the primary reason cassandra is selected technically is to support vertically scaling to large clusters.
> >
> > Why pick a value that once you reach scale you need to switch token count? It's still a ticking time bomb, although 16 won't be what 256 is.
> >
> > Hmmmm. But 4 is bad and could scare off adoption.
> >
> > Ultimately a well-written article on operations and how to transition from 16 --> 4, and at what point that is a good idea (aka not when your cluster is too big), should be a critical part of this.
> >
> > On Fri, Jan 31, 2020 at 11:45 AM Michael Shuler <mich...@pbandjelly.org> wrote:
> >> On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
> >> > one corollary of the way the algorithm works (or more precisely might not work) with multiple seeds or simultaneous multi-node bootstraps or decommissions, is that a lot of dtests start failing due to deterministic token conflicts. I wasn't able to fix that by changing solely ccm and the dtests
> >> I appreciate all the detailed discussion. For a little historic context, since I brought up this topic in the contributors zoom meeting, unstable dtests was precisely the reason we moved the dtest configurations to 'num_tokens: 32'. That value has been used in CI dtest since something like 2014, when we found that this helped stabilize a large segment of flaky dtest failures. No real science there, other than "this hurts less."
> >>
> >> I have no real opinion on the suggestions of using 4 or 16, other than I believe most "default config using" new users are starting with smaller numbers of nodes. The small-but-growing users and veteran large-cluster admins should be gaining more operational knowledge and be able to adjust their own config choices according to their needs (and good comment suggestions in the yaml). Whatever default config value is chosen for num_tokens, I think it should suit the new users with smaller clusters. The suggestion Mick makes that 16 makes a better choice for small numbers of nodes, well, that would seem to be the better choice for those users we are trying to help the most with the default.
> >>
> >> I fully agree that science, maths, and support/ops experience should guide the choice, but I don't believe that large/giant clusters and admins are the target audience for the value we select.
> >>
> >> --
> >> Kind regards,
> >> Michael