Thank you, Benedict. Since there were no objections, I am closing the discussion and getting back to work on the ticket itself. Thank you all. Have a great week ahead.
On Wed, 20 Oct 2021 at 18:06, bened...@apache.org <bened...@apache.org> wrote:

> Thanks for moving this forwards Ekaterina.
>
> I think what we perhaps discovered is that there’s not really any consensus about how to best do config files. I think in this situation it’s best to defer to the one who’s actually putting in the time to _do_, so I am more than happy to defer to your decisions.
>
> I’m sure everyone is looking forward to the improved consistency of this work.
>
>
> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> Date: Wednesday, 20 October 2021 at 22:27
> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> Subject: Re: [DISCUSS] CASSANDRA-15234
> Hi everyone,
>
> I think it is time to summarize the discussion.
>
> First of all, thank you for all the valuable input, suggestions, concerns, and comments!
>
> The things that I believe we all agree on:
>
> - Simplicity for maintenance on our end - automation as much as possible so we don’t have to maintain more than one configuration file and our config is less prone to human errors while adding new features
> - Simplicity for our users - as little confusion and as much simplicity as possible, keeping the users’ toolset in mind
> - Simplicity for testing and verification of the different config file formats
>
> It seems to me that most people want to see both proposed versions committed (feel free to correct me if I am wrong), with a revision of the default values and potentially with all parameters that are not really mandatory to change commented out. Also, versions with stripped comments plus a way to maintain everything automatically, as much as possible.
>
> With that said, it seems to me the current patch in CASSANDRA-15234 can be committed after rebase and addressing any outstanding review comments. The new version of cassandra.yaml, grouping the parameters, can be added in a new ticket by me or anyone with free cycles for that.
> It will require additional work on backward compatibility and on the ability for Cassandra to operate on all of the current versions, but it will be a new, additional opportunity which doesn’t disqualify the old ones, so it seems fair game to add it at any point in the future, as it won’t be a breaking change. We won’t replace anything. We will only add more options.
>
> If someone disagrees and wants to implement all possible options and functionalities at once, I will be happy to hand over the work and try to find the time to provide feedback/reviews later.
>
> Please do not hesitate to correct me if I misunderstood something.
>
> I will leave this discussion open until Monday and if there are no objections I will continue with CASSANDRA-15234 as per my proposal.
>
> Best regards,
>
> Ekaterina
>
> On Fri, 10 Sep 2021 at 20:18, Patrick McFadin <pmcfa...@gmail.com> wrote:
>
> > Ah, I feel like cassandra.yaml discussions are such an evergreen topic.
> >
> > This was something brought up a while back, but I remember years ago we talked about emulating the config options that some other databases have done. Providing different versions of the config for different approaches. For instance, MySQL has had 'my-small.cnf' with just the bare minimum config and restricted parameters for something like a laptop. A friendly option for newcomers would be a clearly labeled 'cassandra-small.yaml' with just the bare minimum and good comments. Then people new to Cassandra wouldn't have a panic moment wondering if they have to know what concurrent compactors are and how many you actually need? (Is there a right answer even???) It's tackling the way operators approach config by the use case they are trying to satisfy. Run one node on my laptop. Run a small cluster on a budget cloud server. Run any size cluster on a ginormous server.
> >
> > Unfortunately, the cleaner solution would be how Apache HTTPD solved it back in the day with include files. It made config management much easier and the overwhelm factor much lower. Yaml doesn't support it and it would all have to be custom code in the Cassandra config loader. Not the best option really.
> >
> > Back to the original question, I think Ekaterina's sectioned version could be used for new operators because there is a lot to learn looking at the comments. Publish the following options:
> >
> > cassandra-small.yaml: Just the 'Quickstart' section
> > cassandra-medium.yaml: 'Quickstart' and 'Commonly used' with sane defaults
> > cassandra-advanced.yaml: Every section
> >
> > The addition is a similarly named JVM properties file.
> >
> > As somebody who has been using Cassandra for a while and would like to have a more verbose version (especially for config management) Benedict's grouped version is fantastic. Just one option there:
> >
> > cassandra-full.yaml
> >
> > That's my idea to satisfy the various operators that approach a new install.
> >
> > Patrick
> >
> > On Fri, Sep 10, 2021 at 3:31 PM Jeremiah D Jordan <jeremiah.jor...@gmail.com> wrote:
> >
> > > > Also, if you run the above command you will see we actually have a lot of things show (129 lines)… it would be nice to clean it up as only a small subset is required and most shown normal users won’t care
> > >
> > > +1 for this. It would be good to clean up the config code and yaml such that only “things that are required to be changed” are not commented out in the file, and everything else is commented out by default. Last I checked there were many fields that when commented out would not use a sensible value, or would result in NPE’s because they didn’t have a code level default.
> > >
> > > -Jeremiah
> > >
> > > > On Sep 10, 2021, at 1:24 PM, David Capwell <dcapw...@apple.com.INVALID> wrote:
> > > >
> > > > We can have both, but I would hope we do not have humans maintaining both. If we maintain the commented one, and did something like the below while we compile then the burden to maintain doesn’t exist
> > > >
> > > > # remove comments and empty lines
> > > > $ egrep -v '^[[:space:]]*#|^[[:space:]]*$' conf/cassandra.yaml.doc > conf/cassandra.yaml
> > > >
> > > > We do this right now with conf/hotspot_compiler so as long as our build maintains the other file +1
> > > >
> > > > Also, if you run the above command you will see we actually have a lot of things show (129 lines)… it would be nice to clean it up as only a small subset is required and most shown normal users won’t care
> > > >
> > > >> On Sep 3, 2021, at 6:45 AM, bened...@apache.org wrote:
> > > >>
> > > >>> I think as the comments were stripped only for the POC. I guess many of them will get back in the actual doc version unfortunately.
> > > >>
> > > >> Well, I think the grouped format lends itself to much briefer comments, with groups of related parameters getting an overall description. Even as a developer who understands most of the toggles I found the old file very hard to navigate.
> > > >>
> > > >> I also don’t see why we cannot have both heavily commented versions and uncommented (or lightly commented) versions.
> > > >>
> > > >> I don’t personally see why multiple different config templates would be confusing if they’re in a suitably labelled directory, even if we settle on one for the default. It might even be nice to have a pared-down config that has only those properties we expect the normal user to need, so it’s particularly easy to navigate.
> > > >>
> > > >>
> > > >> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> > > >> Date: Friday, 3 September 2021 at 14:40
> > > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > >> Subject: Re: [DISCUSS] CASSANDRA-15234
> > > >>>
> > > >>> It’s worth noting that the two don’t have to be in conflict: we could offer two template yaml with the parameters grouped differently, for users to decide for themselves.
> > > >>
> > > >> Sure, my only concern is that three versions of the yaml could bring confusion (we will have backward compatibility to the current one for some time). But it might be only me. I am open for feedback
> > > >>
> > > >>
> > > >>> If we can document this, it would be great as stuff like “enabled” are inconsistent so not sure if I did it properly =D
> > > >>>
> > > >> Well, this is for now only in the ticket in the first version but no one raised any concern. We will definitely have to update our docs on this and whatever else we came to agreement on - both for users and contributors.
> > > >>
> > > >>> though I will agree that it can be hard for some tools (such as bash templating), but feel we can always find a common ground
> > > >> Valid point and I believe it is one of the reasons we delayed the ticket, in order to get feedback on that. I am really interested to hear what concerns people might have.
> > > >>
> > > >>
> > > >>> Opening up a 1500+ line .yaml file is very daunting, even if most of it is comments. Can't blame folks for being overwhelmed at the prospect of tuning Cassandra w/that as our operator config API. :)
> > > >> I am all in for simplification and to make our users’ lives easier.
> > > >> But at this point we shouldn’t be comparing the length of the files I think as the comments were stripped only for the POC. I guess many of them will get back in the actual doc version unfortunately.
> > > >>
> > > >> Thank you all,
> > > >> Ekaterina
> > > >>
> > > >> On Thu, 2 Sep 2021 at 20:07, Joshua McKenzie <jmcken...@apache.org> wrote:
> > > >>
> > > >>> Reading through the two, the grouping approach seems like it's a lot more friendly to newcomers as well as providing context specific cues for relationships between params you're editing. Showing and not telling, if you will.
> > > >>>
> > > >>> Opening up a 1500+ line .yaml file is very daunting, even if most of it is comments. Can't blame folks for being overwhelmed at the prospect of tuning Cassandra w/that as our operator config API. :)
> > > >>>
> > > >>> ~Josh
> > > >>>
> > > >>> On Thu, Sep 2, 2021 at 1:48 PM David Capwell <dcapw...@apple.com.invalid> wrote:
> > > >>>
> > > >>>> Thanks for bringing this back up; Caleb and I were talking about the lack of clarity with regard to CASSANDRA-16896, fleshing this out would make those configs nicer!
> > > >>>>
> > > >>>>> To standardize naming - that we did by agreeing to the form noun_verb
> > > >>>>
> > > >>>> If we can document this, it would be great as stuff like “enabled” are inconsistent so not sure if I did it properly =D
> > > >>>>
> > > >>>>>
> > > >>>>> Provision of values with units while maintaining backward compatibility.
> > > >>>>
> > > >>>> +1000000000000
> > > >>>>
> > > >>>> I really hate local_read_size_threshold_kb; I would love local_read_size_threshold: 10kb. Once we have the infrastructure in place (believe your patch before had these tools) I would love to switch!
> > > >>>>
> > > >>>>
> > > >>>>> Another proposal is done by Benedict; grouping the config parameters.
> > > >>>>
> > > >>>> Yep, this is what triggered Caleb and I to talk about this thread! To group or not to group; that is the question
> > > >>>>
> > > >>>> Personally I like grouping from an organization point of view so am in favor of that; though I will agree that it can be hard for some tools (such as bash templating), but feel we can always find a common ground
> > > >>>>
> > > >>>>
> > > >>>>> On Sep 2, 2021, at 8:44 AM, bened...@apache.org wrote:
> > > >>>>>
> > > >>>>> Thanks for bringing this to the list Ekaterina!
> > > >>>>>
> > > >>>>> It’s worth noting that the two don’t have to be in conflict: we could offer two template yaml with the parameters grouped differently, for users to decide for themselves.
> > > >>>>>
> > > >>>>> The proposals primarily define parameter names differently, with my proposal going by kind->place, and the other proposal maintaining (mostly) the existing name form (which is a bit more like place->kind). While the example yaml groups by kind, you can convert nested definitions into a ‘dot’ form (e.g. limits.concurrency.reads) for use in a different grouping.
> > > >>>>>
> > > >>>>> One advantage of grouping parameters together is that it aids maintaining coherency of naming between systems, and also potentially permits a more succinct config file and better discovery. But it’s far from a silver bullet, as value judgements have to be made about where the grouping lines are. I’m sure anything we settle on will be a huge improvement over the status quo, however.
> > > >>>>>
> > > >>>>>
> > > >>>>> From: Ekaterina Dimitrova <e.dimitr...@gmail.com>
> > > >>>>> Date: Thursday, 2 September 2021 at 16:32
> > > >>>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> > > >>>>> Subject: [DISCUSS] CASSANDRA-15234
> > > >>>>> Hi team,
> > > >>>>>
> > > >>>>> I would like to bring to the attention of the community CASSANDRA-15234, standardise config and JVM parameters.
> > > >>>>>
> > > >>>>> This is work we discussed back in Summer 2020 just before our first 4.0 Beta release. During the discussion we figured out that there is more than one option to do the job and not enough time to get user feedback and finish it, so this was delayed post-4.0. And here I am, bringing it back to the table.
> > > >>>>>
> > > >>>>> This work’s goal is:
> > > >>>>>
> > > >>>>> - To standardize naming - that we did by agreeing to the form noun_verb
> > > >>>>> - Provision of values with units while maintaining backward compatibility.
> > > >>>>>
> > > >>>>> Those two parts are more or less already done.
> > > >>>>>
> > > >>>>> More interesting is the third part - reorganizing the cassandra.yaml file.
> > > >>>>>
> > > >>>>> My personal approach was to split it into sections, done here <https://github.com/ekaterinadimitrova2/cassandra/blob/b4eebe080835da79d032f9314262c268b71172a8/conf/cassandra.yaml>.
> > > >>>>>
> > > >>>>> Another proposal is done by Benedict; grouping the config parameters.
> > > >>>>>
> > > >>>>> To make it clearer, he created a yaml <https://github.com/belliottsmith/cassandra/blob/5f80d1c0d38873b7a27dc137656d8b81f8e6bbd7/conf/cassandra_nocomment.yaml> with comments mostly stripped.
> > > >>>>>
> > > >>>>> In his version, there are basic settings for network, disk etc all grouped together, followed by operator tuneables mostly under limits within which we now have throughput, concurrency, capacity. This leads to settings for some features being kept separate (most notably for caching), but helps the operator understand what they have to play with for controlling resource consumption.
> > > >>>>>
> > > >>>>> I am interested to hear what people think about the two options or if anyone has another idea to share, open discussion.
> > > >>>>>
> > > >>>>> Thank you,
> > > >>>>>
> > > >>>>> Ekaterina
> > > >>>>
> > > >>>> ---------------------------------------------------------------------
> > > >>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
> > > >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org
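
For readers following the thread, the conversion Benedict mentions, from nested definitions to a 'dot' form such as limits.concurrency.reads, can be sketched in a few lines of Python. This is only an illustration and is not taken from the CASSANDRA-15234 patch: the flatten helper, the grouped dictionary, and all keys and values in it are hypothetical, chosen to mirror the limits/concurrency/throughput grouping described in the thread.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into 'dot' form keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into a group, carrying the path walked so far.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


# Hypothetical grouped config; names and values are illustrative only and
# are not real cassandra.yaml parameters.
grouped = {
    "limits": {
        "concurrency": {"reads": 32, "writes": 32},
        "throughput": {"compaction": 64},
    },
}

flat = flatten(grouped)
for name, value in sorted(flat.items()):
    print(f"{name}: {value}")
```

A flat mapping like this is one way the same parameters could be regrouped (for example by feature rather than by kind) or fed to tools that cope poorly with nesting, such as the bash templating mentioned above.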