I think we might be bikeshedding this number a bit: it is easy to debate and there is not yet one right answer. I hope we recognize that either choice (4 or 16) is fine, in that users can always override us, we can always change our minds later, or better yet we can improve allocation so users don't have to care. Either choice is an improvement on the status quo. When we change this default, I only truly care that:

1. Users can still launch a cluster all at once. Last I checked, even with allocate_for_rf you need to bootstrap one node at a time for even allocation to work properly; please correct me if I'm wrong, and if I'm not, let's get this fixed before the beta.

2. We get good documentation about this choice into our docs. [The documentation team and I are on it!]
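For context, the settings under discussion live in cassandra.yaml. A minimal sketch of what the proposed default could look like with the replication-factor-aware allocator enabled (the option name allocate_tokens_for_local_replication_factor is the 4.0-era spelling; earlier versions use allocate_tokens_for_keyspace, so verify against your version):

```yaml
# cassandra.yaml (sketch -- verify option names against your Cassandra version)
num_tokens: 16   # or 4; the default being debated in this thread

# Use the replication-factor-aware token allocator instead of random
# allocation, so ownership stays even with a small token count:
allocate_tokens_for_local_replication_factor: 3
```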
I don't like phrasing this as a "small user" vs. "large user" discussion. Everybody using Cassandra wants it to be easy to operate with high availability and consistent performance. Optimizing for "I can oops a few nodes and not have an outage" is important regardless of scale. We have a lot of input on this thread that users frequently override this to 4 (apparently even with random allocation? I am personally surprised by this if true). Some people have indicated that they like a higher number like 16 or 32. Some (most?) of our largest users by footprint are still using 1.

The only significant advantage I'm aware of for 16 over 4 is that users can scale up and down in increments of N/16 (12-node cluster -> 1 node) instead of N/4 (12-node cluster -> 3 nodes) without further token allocation improvements in Cassandra. Practically speaking, people are often spreading nodes over RF=3 "racks" (e.g. on GCP, Azure, and AWS), so they'll want to scale in increments of 3 anyway. I agree with Jon that optimizing for scale-downs is odd; it's a pretty infrequent operation, and all the users I know doing autoscaling are doing it vertically using network-attached storage (~EBS).

Let's also remember that repairing clusters with 16 tokens per node is slower (probably about 2-4x slower) than repairing clusters with 4 tokens. With zero-copy streaming there should be no benefit to more tokens for data transfer; if there is, that is a bug in streaming performance and we should fix it. Honestly, in my opinion, if we have balancing issues with a small number of tokens, that is a bug and we should just fix it; token moves are safe, and it is definitely possible for Cassandra to self-balance.

Let's not worry about scaring off users with this choice. Choosing 4 will not scare off users any more than 256 random tokens has scared them off once they realized they can't have any combination of two nodes down in different racks.
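The scale-increment argument above can be sketched in a few lines. This assumes (as the thread does) that with T tokens per node, the smallest cluster change that keeps ownership roughly even is about N/T nodes, rounded up to at least one node; min_balanced_increment is a hypothetical helper name, not a Cassandra API:

```python
import math

def min_balanced_increment(cluster_size: int, num_tokens: int) -> int:
    """Smallest node increment that keeps token ownership roughly even,
    under the assumption that the granularity is cluster_size / num_tokens."""
    return max(1, math.ceil(cluster_size / num_tokens))

# The 12-node example from above:
print(min_balanced_increment(12, 16))  # -> 1 node at a time
print(min_balanced_increment(12, 4))   # -> 3 nodes at a time
```

As the thread notes, if nodes are spread over RF=3 racks, operators will likely scale in multiples of 3 regardless, which blunts this advantage of 16.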
-Joey

On Fri, Jan 31, 2020 at 10:16 AM Carl Mueller <carl.muel...@smartthings.com.invalid> wrote:
> edit: 4 is bad at small cluster sizes and could scare off adoption
>
> On Fri, Jan 31, 2020 at 12:15 PM Carl Mueller <carl.muel...@smartthings.com> wrote:
> > "large/giant clusters and admins are the target audience for the value we select"
> >
> > There are reasons aside from massive scale to pick cassandra, but the primary reason cassandra is selected technically is to support vertically scaling to large clusters.
> >
> > Why pick a value that once you reach scale you need to switch token count? It's still a ticking time bomb, although 16 won't be what 256 is.
> >
> > Hmmmm. But 4 is bad and could scare off adoption.
> >
> > Ultimately a well-written article on operations and how to transition from 16 --> 4, and at what point that is a good idea (aka not when your cluster is too big), should be a critical part of this.
> >
> > On Fri, Jan 31, 2020 at 11:45 AM Michael Shuler <mich...@pbandjelly.org> wrote:
> >> On 1/31/20 9:58 AM, Dimitar Dimitrov wrote:
> >> > one corollary of the way the algorithm works (or more precisely might not work) with multiple seeds or simultaneous multi-node bootstraps or decommissions, is that a lot of dtests start failing due to deterministic token conflicts. I wasn't able to fix that by changing solely ccm and the dtests
> >> I appreciate all the detailed discussion. For a little historic context, since I brought up this topic in the contributors zoom meeting, unstable dtests was precisely the reason we moved the dtest configurations to 'num_tokens: 32'. That value has been used in CI dtest since something like 2014, when we found that this helped stabilize a large segment of flaky dtest failures. No real science there, other than "this hurts less."
> >>
> >> I have no real opinion on the suggestions of using 4 or 16, other than I believe most "default config using" new users are starting with smaller numbers of nodes. The small-but-growing users and veteran large-cluster admins should be gaining more operational knowledge and be able to adjust their own config choices according to their needs (and good comment suggestions in the yaml). Whatever default config value is chosen for num_tokens, I think it should suit the new users with smaller clusters. The suggestion Mick makes that 16 makes a better choice for small numbers of nodes, well, that would seem to be the better choice for those users we are trying to help the most with the default.
> >>
> >> I fully agree that science, maths, and support/ops experience should guide the choice, but I don't believe that large/giant clusters and admins are the target audience for the value we select.
> >>
> >> --
> >> Kind regards,
> >> Michael