> choose a consistent, representative subset of stable tests that we feel give
> us a reasonable level of confidence in return for a reasonable amount of
> runtime

> ...
> Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes, and j11
> w/wo vnodes. That is 6 times total. I wonder about that ROI.
> ...

> test with the default number of vnodes, test with the default compression
> settings, and test with the default heap/off-heap buffers.

If I take these at face value as true (I happen to agree with them, so I'm going to do this :)), what falls out for me:

1. Pre-commit should be an intentional smoke-testing suite, much smaller relative to post-commit than it is today
2. We should aggressively cull all low-signal pre-commit tests, suites, and configurations that aren't needed to keep post-commit stable

High signal in pre-commit (indicative; non-exhaustive):
1. Only the most commonly used JDK (JDK11 atm?)
2. Config defaults (vnodes, compression, heap/off-heap buffers, memtable format, sstable format)
3. The most popular / general / run-of-the-mill Linux distro (Debian?)

Low signal in pre-commit (indicative; non-exhaustive):
1. No vnodes
2. JDK8; JDK17
3. Non-default settings (compression off; fully mmap / no mmap; trie memtables or sstables; CDC enabled)
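To make the high/low-signal split above concrete, here is a minimal sketch of how the tiering could be expressed with JUnit 4 categories. The Smoke marker interface and its wiring are hypothetical; nothing like this exists in-tree today.

    import org.junit.Test;
    import org.junit.experimental.categories.Category;

    // Hypothetical marker interface for the pre-commit smoke tier.
    interface Smoke {}

    public class CompactionSmokeTest
    {
        @Category(Smoke.class)
        @Test
        public void testDefaultConfig()
        {
            // High signal, runs pre-commit: default vnodes, default
            // compression, default heap/off-heap buffers, the common JDK.
        }

        @Test
        public void testCompressionDisabled()
        {
            // Low signal pre-commit: non-default configuration, so this
            // runs only in the post-commit suite.
        }
    }

A pre-commit target could then select only Smoke-tagged tests (e.g. with JUnit's Categories runner and @IncludeCategory) while post-commit runs everything.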
So this shape of thinking - I'm curious what it triggers for you Brandon, Berenguer, Andres, Ekaterina, and Mick (when you're back from the mountains ;)). You all paid a lot of the debt in the run-up to 4.1, so you have the most recent expertise here and I trust your perspectives. If a failure makes it to post-commit, it's much more expensive to root-cause and fix, with much higher costs to the community's collective productivity. That said, I think we can make a lot of progress along this line of thinking.

On Wed, Jul 5, 2023, at 5:54 AM, Jacek Lewandowski wrote:
> Perhaps pre-commit checks should include mostly the typical configuration of
> Cassandra rather than some subset of possible combinations. As was said
> somewhere above - test with the default number of vnodes, test with the
> default compression settings, and test with the default heap/off-heap
> buffers.
>
> A longer-term goal could be to isolate what depends on particular
> configuration options. Instead of blindly running everything with, say,
> vnodes enabled and disabled, isolate those tests that need to be run with
> those two configurations and run the rest with the default one.
>
>> ... the rule of multiplexing new or changed tests might go a long way to
>> mitigating that ...
>
> I wonder if there is some commonality in the flaky tests reported so far,
> like the presence of certain statements? Also, there could be a tool that
> inspects coverage analysis reports and chooses the proper tests to
> run/multiplex, because in the end we want to verify the changed production
> code in addition to the modified test files.
>
> thanks,
> Jacek
>
> On Wed, Jul 5, 2023 at 06:28, Berenguer Blasi <berenguerbl...@gmail.com> wrote:
>> Currently a dtest is being run in j8 w/wo vnodes, j8/j11 w/wo vnodes, and
>> j11 w/wo vnodes. That is 6 times total. I wonder about that ROI.
>>
>> On dtest cluster reuse: yes, I stopped that, as at the time we had lots of
>> CI changes, an upcoming release, and other priorities. But when the CI
>> starts flexing its muscles, that'd be easy to pick up again, as the dtest
>> code shouldn't have changed much.
>>
>> On 4/7/23 17:11, Derek Chen-Becker wrote:
>>> Ultimately I think we have to invest in two directions: first, choose a
>>> consistent, representative subset of stable tests that we feel give us a
>>> reasonable level of confidence in return for a reasonable amount of
>>> runtime. Second, we need to invest in figuring out why certain tests
>>> fail. I strongly dislike the term "flaky" because it suggests that some
>>> inconsequential issue is causing problems. The truth is that a failing
>>> test means either a bug in the service code or a bug in the test. I've
>>> come to realize that the CI and build framework is way too complex for me
>>> to help with much, but I would love to start chipping away at
>>> failing-test bugs. I'm getting settled into my new job and should be able
>>> to commit some regular time each week to triage and fixes starting in
>>> August; if there are any other folks who are interested, let me know.
>>>
>>> Cheers,
>>>
>>> Derek
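Jacek's isolation idea above could look something like this in practice. A minimal sketch, assuming a hypothetical cassandra.test.vnodes system property that the CI harness would set only for the vnodes configuration run:

    import static org.junit.Assume.assumeTrue;

    import org.junit.Test;

    public class TokenAllocationTest
    {
        // Hypothetical flag; set by the harness only for the vnodes run.
        private static boolean vnodesEnabled()
        {
            return Boolean.getBoolean("cassandra.test.vnodes");
        }

        @Test
        public void testAllocationWithVnodes()
        {
            // Skipped (not failed) in the default run, so only tests that
            // are actually vnodes-sensitive pay for the second configuration.
            assumeTrue(vnodesEnabled());
            // ... vnodes-sensitive assertions ...
        }

        @Test
        public void testAllocationDefaults()
        {
            // Configuration-agnostic: runs once, under defaults only.
        }
    }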
>>> On Mon, Jul 3, 2023, 12:30 PM Josh McKenzie <jmcken...@apache.org> wrote:
>>>>> Instead of running all the tests through available CI agents every
>>>>> time, we can have presets of tests:
>>>> Back when I joined the project in 2014, unit tests took ~5 minutes to
>>>> run on a local machine. We had pre-commit and post-commit tests as a
>>>> distinction as well, but also had flakes in both the pre and post
>>>> batches. I'd love to see us get back to a unit test regime like that.
>>>>
>>>> The challenge we've always had is flaky tests showing up in either the
>>>> pre-commit or post-commit groups, and the difficulty of attributing a
>>>> flaky failure to where it was introduced (not to lay blame, but to
>>>> educate, learn, and prevent recurrence). While historically a further
>>>> reduced smoke-testing suite would just mean more flakes showing up
>>>> downstream, the rule of multiplexing new or changed tests might go a
>>>> long way to mitigating that.
>>>>
>>>>> Should we mention in this concept how we will build the sub-projects
>>>>> (e.g. Accord) alongside Cassandra?
>>>> I think it's an interesting question, but I also think there's no real
>>>> dependency of process between primary mainline branches and feature
>>>> branches. My intuition is that having the same bar (green CI, multiplex,
>>>> don't introduce flakes, smart smoke-suite tiering) would be a good idea
>>>> on feature branches so there's not a death march right before merge,
>>>> squashing flakes when you have to multiplex hundreds of tests before
>>>> merging to mainline (since presumably a feature branch would impact a
>>>> lot of tests).
>>>>
>>>> Now that I write that all out, it does sound Painful. =/
>>>>
>>>> On Mon, Jul 3, 2023, at 10:38 AM, Maxim Muzafarov wrote:
>>>>> For me, the biggest benefit of keeping the build scripts and CI
>>>>> configurations in the same project is that these files are versioned in
>>>>> the same way as the main sources. This ensures that we can build past
>>>>> releases without annoying errors in the scripts, so I would say this is
>>>>> a pretty necessary change.
>>>>>
>>>>> I'd also like to mention an approach that could work for projects with
>>>>> a huge number of tests. Instead of running all the tests through the
>>>>> available CI agents every time, we can have presets of tests:
>>>>> - base tests (to make sure that your design basically works; the set
>>>>> will not run longer than 30 min);
>>>>> - pre-commit tests (enough tests to make sure that we can safely commit
>>>>> new changes, fitting the run into a 1-2 hour build timeframe);
>>>>> - nightly builds (a scheduled task to build everything we have once a
>>>>> day and notify the ML if that build fails).
>>>>>
>>>>> My question here is:
>>>>> Should we mention in this concept how we will build the sub-projects
>>>>> (e.g. Accord) alongside Cassandra?
>>>>>
>>>>> On Fri, 30 Jun 2023 at 23:19, Josh McKenzie <jmcken...@apache.org> wrote:
>>>>> >
>>>>> > Not everyone will have access to such resources; if all you have is 1
>>>>> > such pod you'll be waiting a long time (in theory one month, and you
>>>>> > actually need a few bigger pods for some of the more extensive tests,
>>>>> > e.g. large upgrade tests)….
>>>>> >
>>>>> > One thing worth calling out: I believe we have a lot of low-hanging
>>>>> > fruit in the domain of "find long-running tests and speed them up".
>>>>> > In early 2022 I was poking around at our unit tests on
>>>>> > CASSANDRA-17371 and found that 2.62% of our tests made up 20.4% of
>>>>> > our runtime
>>>>> > (https://docs.google.com/spreadsheets/d/1-tkH-hWBlEVInzMjLmJz4wABV6_mGs-2-NNM2XoVTcA/edit#gid=1501761592).
>>>>> > This kind of finding is pretty consistent; I remember Carl Yeksigian
>>>>> > at NGCC back in 2015 axing an hour-plus of aggregate runtime just by
>>>>> > devoting an afternoon to a few badly behaving tests.
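The "find long-running tests and speed them up" point above is cheap to act on: per-test wall-clock timing is enough to build a spreadsheet like the one linked. A minimal sketch of a JUnit 4 run listener (an illustration, not what CASSANDRA-17371 actually used):

    import org.junit.runner.Description;
    import org.junit.runner.notification.RunListener;

    // Prints each test's wall-clock runtime; sorting the output surfaces
    // the handful of tests that dominate aggregate suite time.
    public class TimingListener extends RunListener
    {
        private long startedNanos;

        @Override
        public void testStarted(Description description)
        {
            startedNanos = System.nanoTime();
        }

        @Override
        public void testFinished(Description description)
        {
            long elapsedMs = (System.nanoTime() - startedNanos) / 1_000_000;
            System.out.printf("%8d ms  %s%n", elapsedMs, description.getDisplayName());
        }
    }

Such a listener can be registered with the runner via JUnitCore.addListener.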
>>>>> >
>>>>> > I'd like to see us move from "1 pod, 1 month" down to something a lot
>>>>> > more manageable. :)
>>>>> >
>>>>> > Shout-out to Berenguer's work on CASSANDRA-16951 for dtest cluster
>>>>> > reuse (not yet merged), and I have CASSANDRA-15196 to remove the CDC
>>>>> > vs. non-CDC segment allocator distinction and axe the test-cdc target
>>>>> > entirely.
>>>>> >
>>>>> > Ok. Enough of that. I don't want to derail us; I just wanted to call
>>>>> > out that the state of things today isn't the way it has to be.
>>>>> >
>>>>> > On Fri, Jun 30, 2023, at 4:41 PM, Mick Semb Wever wrote:
>>>>> >
>>>>> > - There are hw constraints; is there any approximation of how long it
>>>>> > will take to run all tests? Or is there a stated goal that we will
>>>>> > strive to reach as a project?
>>>>> >
>>>>> > Have to defer to Mick on this; I don't think the changes outlined
>>>>> > here will materially change the runtime on our currently donated
>>>>> > nodes in CI.
>>>>> >
>>>>> >
>>>>> > A recent comparison between CircleCI and the Jenkins code underneath
>>>>> > ci-cassandra.a.o was done (not yet shared) to see whether a
>>>>> > 'repeatable CI' can be both lower cost and have the same turnaround
>>>>> > time. The exercise uncovered that there's a lot of waste in our
>>>>> > Jenkins builds, and once the Jenkinsfile becomes standalone it can
>>>>> > stash and unstash the build results. From this, a conservative
>>>>> > estimate was that even if we only brought the build time down to
>>>>> > double that of CircleCI, it would still be significantly cheaper
>>>>> > while still using on-demand EC2 instances. (The goal is to use spot
>>>>> > instances.)
>>>>> >
>>>>> > The real problem here is that our CI pipeline uses ~1000 containers,
>>>>> > while ci-cassandra.a.o has only 100 executors (and at any time a few
>>>>> > of these are down for disk self-cleaning). The idea with 'repeatable
>>>>> > CI', and to a broader extent Josh's opening email, is that no one
>>>>> > will need to use ci-cassandra.a.o for pre-commit work anymore. For
>>>>> > post-commit we don't care if it takes 7 hours (we care about the
>>>>> > stability of results, which 'repeatable CI' also helps us with).
>>>>> >
>>>>> > While pre-commit testing will be more accessible to everyone, it
>>>>> > will still depend on the resources you have access to. For the
>>>>> > fastest turnaround times you will need a k8s cluster that can spawn
>>>>> > 1000 pods (4 CPU, 8GB RAM), each running for 1-30 minutes, or the
>>>>> > equivalent. Not everyone will have access to such resources; if all
>>>>> > you have is 1 such pod you'll be waiting a long time (in theory one
>>>>> > month: ~1000 containers at up to ~30 minutes each is roughly 500
>>>>> > serial hours, and you actually need a few bigger pods for some of
>>>>> > the more extensive tests, e.g. large upgrade tests)….
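The turnaround estimate above is essentially shard count divided by available workers. A toy sketch of the round-robin sharding arithmetic (all names invented for illustration):

    import java.util.ArrayList;
    import java.util.List;

    // Toy round-robin sharder: splits test classes across N workers.
    // With ~1000 shards and 1 worker everything runs serially (the
    // "one month" case above); with 1000 workers, turnaround is bounded
    // by the slowest single shard.
    public class TestSharder
    {
        public static List<List<String>> shard(List<String> tests, int workers)
        {
            List<List<String>> shards = new ArrayList<>();
            for (int i = 0; i < workers; i++)
                shards.add(new ArrayList<>());
            for (int i = 0; i < tests.size(); i++)
                shards.get(i % workers).add(tests.get(i));
            return shards;
        }
    }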