> choose a consistent, representative subset of stable tests that we feel give
> us a reasonable level of confidence in return for a reasonable amount of
> runtime

> ...
> Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes, and j11
> w/wo vnodes. That is 6 times total. I wonder about that ROI.
> ...

> test with the default number of vnodes, test with the default compression
> settings, and test with the default heap/off-heap buffers.

If I take these at face value as true (I happen to agree with them, so I'm going to do this :)), what falls out for me:

1. Pre-commit should be an intentional smoke-testing suite, much smaller relative to post-commit than it is today
2. We should aggressively cull all low-signal pre-commit tests, suites, and configurations that aren't needed to keep post-commit stable

High signal in pre-commit (indicative; non-exhaustive):
1. Only the most commonly used JDK (JDK11 atm?)
2. Config defaults (vnodes, compression, heap/off-heap buffers, memtable format, sstable format)
3. The most popular / general / run-of-the-mill Linux distro (Debian?)

Low signal in pre-commit (indicative; non-exhaustive):
1. No vnodes
2. JDK8; JDK17
3. Non-default settings (compression off; fully mmap / no mmap; trie memtables or sstables; CDC enabled)
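To make the high/low-signal split above concrete, here is a minimal sketch of how the tiering could be expressed with JUnit 4 categories. The Smoke marker interface and its wiring are hypothetical; nothing like this exists in-tree today.

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Hypothetical marker interface for the pre-commit smoke tier.
    interface Smoke {}

    public class CompactionSmokeTest
    {
        @Category(Smoke.class)
        @Test
        public void testDefaultConfig()
        {
            // High signal, runs pre-commit: default vnodes, default
            // compression, default heap/off-heap buffers, the common JDK.
        }

        @Test
        public void testCompressionDisabled()
        {
            // Low signal pre-commit: non-default configuration, so this
            // runs only in the post-commit suite.
        }
    }

A pre-commit target could then select only Smoke-tagged tests (e.g. with JUnit's Categories runner and @IncludeCategory) while post-commit runs everything.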
So this shape of thinking - I'm curious what it triggers for you Brandon, Berenguer, Andres, Ekaterina, and Mick (when you're back from the mountains ;)). You all paid a lot of the debt in the run-up to 4.1, so you have the most recent expertise here and I trust your perspectives. If a failure makes it to post-commit, it's much more expensive to root-cause and fix, with much higher costs to the community's collective productivity. That said, I think we can make a lot of progress along this line of thinking.

On Wed, Jul 5, 2023, at 5:54 AM, Jacek Lewandowski wrote:
> Perhaps pre-commit checks should include mostly the typical configuration of
> Cassandra rather than some subset of possible combinations. As was said
> somewhere above - test with the default number of vnodes, test with the
> default compression settings, and test with the default heap/off-heap
> buffers.
>
> A longer-term goal could be to isolate what depends on particular
> configuration options. Instead of blindly running everything with, say,
> vnodes enabled and disabled, isolate those tests that need to be run with
> those two configurations and run the rest with the default one.
>
>> ... the rule of multiplexing new or changed tests might go a long way to
>> mitigating that ...
>
> I wonder if there is some commonality in the flaky tests reported so far,
> like the presence of certain statements? Also, there could be a tool that
> inspects coverage analysis reports and chooses the proper tests to
> run/multiplex, because in the end we want to verify the changed production
> code in addition to the modified test files.
>
> thanks,
> Jacek
>
> On Wed, Jul 5, 2023 at 06:28, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>> Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes, and
>> j11 w/wo vnodes. That is 6 times total. I wonder about that ROI.
>>
>> On dtest cluster reuse: yes, I stopped that, as at the time we had lots of
>> CI changes, an upcoming release, and other priorities. But when the CI
>> starts flexing its muscles, that'd be easy to pick up again, as the dtest
>> code shouldn't have changed much.
>>
>> On 4/7/23 17:11, Derek Chen-Becker wrote:
>>> Ultimately I think we have to invest in two directions: first, choose a
>>> consistent, representative subset of stable tests that we feel give us a
>>> reasonable level of confidence in return for a reasonable amount of
>>> runtime. Second, we need to invest in figuring out why certain tests
>>> fail. I strongly dislike the term "flaky" because it suggests that some
>>> inconsequential issue is causing problems. The truth is that a failing
>>> test means either a bug in the service code or a bug in the test. I've
>>> come to realize that the CI and build framework is way too complex for me
>>> to help with much, but I would love to start chipping away at
>>> failing-test bugs. I'm getting settled into my new job and should be able
>>> to commit some regular time each week to triage and fixes starting in
>>> August; if there are any other folks who are interested, let me know.
>>>
>>> Cheers,
>>>
>>> Derek
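Jacek's isolation idea above could look something like this in practice. A minimal sketch, assuming a hypothetical cassandra.test.vnodes system property that the CI harness would set only for the vnodes configuration run:

    import static org.junit.Assume.assumeTrue;

    import org.junit.Test;

    public class TokenAllocationTest
    {
        // Hypothetical flag; set by the harness only for the vnodes run.
        private static boolean vnodesEnabled()
        {
            return Boolean.getBoolean("cassandra.test.vnodes");
        }

        @Test
        public void testAllocationWithVnodes()
        {
            // Skipped (not failed) in the default run, so only tests that
            // are actually vnodes-sensitive pay for the second configuration.
            assumeTrue(vnodesEnabled());
            // ... vnodes-sensitive assertions ...
        }

        @Test
        public void testAllocationDefaults()
        {
            // Configuration-agnostic: runs once, under defaults only.
        }
    }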
>>> On Mon, Jul 3, 2023, 12:30 PM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>> Instead of running all the tests through available CI agents every
>>>>> time, we can have presets of tests:
>>>> Back when I joined the project in 2014, unit tests took ~5 minutes to
>>>> run on a local machine. We had pre-commit and post-commit tests as a
>>>> distinction as well, but also had flakes in both the pre and post
>>>> batches. I'd love to see us get back to a unit test regime like that.
>>>>
>>>> The challenge we've always had is flaky tests showing up in either the
>>>> pre-commit or post-commit groups, and the difficulty of attributing a
>>>> flaky failure to where it was introduced (not to lay blame, but to
>>>> educate, learn, and prevent recurrence). While historically a further
>>>> reduced smoke-testing suite would just mean more flakes showing up
>>>> downstream, the rule of multiplexing new or changed tests might go a
>>>> long way to mitigating that.
>>>>
>>>>> Should we mention in this concept how we will build the sub-projects
>>>>> (e.g. Accord) alongside Cassandra?
>>>> I think it's an interesting question, but I also think there's no real
>>>> dependency of process between primary mainline branches and feature
>>>> branches. My intuition is that having the same bar (green CI, multiplex,
>>>> don't introduce flakes, smart smoke-suite tiering) would be a good idea
>>>> on feature branches so there's not a death march right before merge,
>>>> squashing flakes when you have to multiplex hundreds of tests before
>>>> merging to mainline (since presumably a feature branch would impact a
>>>> lot of tests).
>>>>
>>>> Now that I write that all out, it does sound Painful. =/
>>>>
>>>> On Mon, Jul 3, 2023, at 10:38 AM, Maxim Muzafarov wrote:
>>>>> For me, the biggest benefit of keeping the build scripts and CI
>>>>> configurations in the same project is that these files are versioned in
>>>>> the same way as the main sources. This ensures that we can build past
>>>>> releases without annoying errors in the scripts, so I would say this is
>>>>> a pretty necessary change.
>>>>>
>>>>> I'd also like to mention an approach that could work for projects with
>>>>> a huge number of tests. Instead of running all the tests through the
>>>>> available CI agents every time, we can have presets of tests:
>>>>> - base tests (to make sure that your design basically works; the set
>>>>> will not run longer than 30 min);
>>>>> - pre-commit tests (enough tests to make sure that we can safely commit
>>>>> new changes, fitting the run into a 1-2 hour build timeframe);
>>>>> - nightly builds (a scheduled task to build everything we have once a
>>>>> day and notify the ML if that build fails).
>>>>>
>>>>> My question here is:
>>>>> Should we mention in this concept how we will build the sub-projects
>>>>> (e.g. Accord) alongside Cassandra?
>>>>>
>>>>> On Fri, 30 Jun 2023 at 23:19, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>> >
>>>>> > Not everyone will have access to such resources; if all you have is 1
>>>>> > such pod you'll be waiting a long time (in theory one month, and you
>>>>> > actually need a few bigger pods for some of the more extensive tests,
>>>>> > e.g. large upgrade tests)….
>>>>> >
>>>>> > One thing worth calling out: I believe we have a lot of low-hanging
>>>>> > fruit in the domain of "find long-running tests and speed them up".
>>>>> > In early 2022 I was poking around at our unit tests on
>>>>> > CASSANDRA-17371 and found that 2.62% of our tests made up 20.4% of
>>>>> > our runtime
>>>>> > (https://docs.google.com/spreadsheets/d/1-tkH-hWBlEVInzMjLmJz4wABV6_mGs-2-NNM2XoVTcA/edit#gid=1501761592).
>>>>> > This kind of finding is pretty consistent; I remember Carl Yeksigian
>>>>> > at NGCC back in 2015 axing an hour-plus of aggregate runtime just by
>>>>> > devoting an afternoon to a few badly behaving tests.
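The "find long-running tests and speed them up" point above is cheap to act on: per-test wall-clock timing is enough to build a spreadsheet like the one linked. A minimal sketch of a JUnit 4 run listener (an illustration, not what CASSANDRA-17371 actually used):

    import org.junit.runner.Description;
    import org.junit.runner.notification.RunListener;

    // Prints each test's wall-clock runtime; sorting the output surfaces
    // the handful of tests that dominate aggregate suite time.
    public class TimingListener extends RunListener
    {
        private long startedNanos;

        @Override
        public void testStarted(Description description)
        {
            startedNanos = System.nanoTime();
        }

        @Override
        public void testFinished(Description description)
        {
            long elapsedMs = (System.nanoTime() - startedNanos) / 1_000_000;
            System.out.printf("%8d ms  %s%n", elapsedMs, description.getDisplayName());
        }
    }

Such a listener can be registered with the runner via JUnitCore.addListener.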
>>>>> >
>>>>> > I'd like to see us move from "1 pod, 1 month" down to something a lot
>>>>> > more manageable. :)
>>>>> >
>>>>> > Shout-out to Berenguer's work on CASSANDRA-16951 for dtest cluster
>>>>> > reuse (not yet merged), and I have CASSANDRA-15196 to remove the CDC
>>>>> > vs. non-CDC segment allocator distinction and axe the test-cdc target
>>>>> > entirely.
>>>>> >
>>>>> > Ok. Enough of that. I don't want to derail us; I just wanted to call
>>>>> > out that the state of things today isn't the way it has to be.
>>>>> >
>>>>> > On Fri, Jun 30, 2023, at 4:41 PM, Mick Semb Wever wrote:
>>>>> >
>>>>> > - There are hw constraints; is there any approximation of how long it
>>>>> > will take to run all tests? Or is there a stated goal that we will
>>>>> > strive to reach as a project?
>>>>> >
>>>>> > Have to defer to Mick on this; I don't think the changes outlined
>>>>> > here will materially change the runtime on our currently donated
>>>>> > nodes in CI.
>>>>> >
>>>>> >
>>>>> > A recent comparison between CircleCI and the Jenkins code underneath
>>>>> > ci-cassandra.a.o was done (not yet shared) to see whether a
>>>>> > 'repeatable CI' can be both lower cost and have the same turnaround
>>>>> > time. The exercise uncovered that there's a lot of waste in our
>>>>> > Jenkins builds, and once the Jenkinsfile becomes standalone it can
>>>>> > stash and unstash the build results. From this, a conservative
>>>>> > estimate was that even if we only brought the build time down to
>>>>> > double that of CircleCI, it would still be significantly cheaper
>>>>> > while still using on-demand EC2 instances. (The goal is to use spot
>>>>> > instances.)
>>>>> >
>>>>> > The real problem here is that our CI pipeline uses ~1000 containers,
>>>>> > while ci-cassandra.a.o has only 100 executors (and at any time a few
>>>>> > of these are down for disk self-cleaning). The idea with 'repeatable
>>>>> > CI', and to a broader extent Josh's opening email, is that no one
>>>>> > will need to use ci-cassandra.a.o for pre-commit work anymore. For
>>>>> > post-commit we don't care if it takes 7 hours (we care about the
>>>>> > stability of results, which 'repeatable CI' also helps us with).
>>>>> >
>>>>> > While pre-commit testing will be more accessible to everyone, it
>>>>> > will still depend on the resources you have access to. For the
>>>>> > fastest turnaround times you will need a k8s cluster that can spawn
>>>>> > 1000 pods (4 CPU, 8GB RAM), each running for 1-30 minutes, or the
>>>>> > equivalent. Not everyone will have access to such resources; if all
>>>>> > you have is 1 such pod you'll be waiting a long time (in theory one
>>>>> > month: ~1000 containers at up to ~30 minutes each is roughly 500
>>>>> > serial hours, and you actually need a few bigger pods for some of
>>>>> > the more extensive tests, e.g. large upgrade tests)….
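The turnaround estimate above is essentially shard count divided by available workers. A toy sketch of the round-robin sharding arithmetic (all names invented for illustration):

    import java.util.ArrayList;
    import java.util.List;

    // Toy round-robin sharder: splits test classes across N workers.
    // With ~1000 shards and 1 worker everything runs serially (the
    // "one month" case above); with 1000 workers, turnaround is bounded
    // by the slowest single shard.
    public class TestSharder
    {
        public static List<List<String>> shard(List<String> tests, int workers)
        {
            List<List<String>> shards = new ArrayList<>();
            for (int i = 0; i < workers; i++)
                shards.add(new ArrayList<>());
            for (int i = 0; i < tests.size(); i++)
                shards.get(i % workers).add(tests.get(i));
            return shards;
        }
    }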