> I don't believe it warrants a CEP, speak up if you disagree.

I agree with this but I'm also biased having been working w/you on this for a bit.
My instinct is that most folks on the project want CI that works consistently, quickly, and is minimally complex to modify. So the less disruptive, better documented, and more streamlined we can make interacting with this process, the better.

On Mon, Jan 9, 2023, at 2:06 PM, Mick Semb Wever wrote:
> Happy 2023 everyone!
>
> With only four months in front of us before the first 5.0 release I'm
> hoping we can re-energize our focus on CI and Stable Trunk.
>
> This post covers the following
> * Recap of CI improvements
> * State of Affair
> * The Butler (Build Lead)
> * Proposal for a Repeatable Containerised CI
>
> and it calls for the following actions
> ** we need you to sign up for a week's rotation as Build Lead !
> ** please reply in-thread any CI issues I've forgotten,
> ** does CASSANDRA-18137 warrant a CEP?
>
>
> *** Recap of CI improvements
>
> It's been over two years since my last CI Status post, with Adam and
> Josh covering much of it in their general Status emails (which are
> deeply appreciated). I'm hoping we can continue with both, given
> their importance to a successful 5.0 release and the debt cost we face
> otherwise going from the initial alpha release to the eventual GA.
>
>
> We have made good efforts on moving towards a Stable Trunk.
> Special mentions to
> - improving parity between CircleCI and ci-cassandra.a.o (CASSANDRA-17930)
> - introducing Butler and the Build Lead role
> - pre-commit workflow, and automated multiplexing, in CircleCI
>   (CASSANDRA-16625)
> - single digit flaky failures per build on 4.0, 4.1 and trunk
>   ci-cassandra.a.o !!
> - CircleCI is as stable on Large as XLarge containers (CASSANDRA-18127)
>
>
> *** State of Affair
>
> None of our CI systems are consistently green yet. Flakies occur in
> both CircleCI and ci-cassandra.a.o. We had to lower the 4.1 release
> CI criteria to accept three consecutive green runs on CircleCI, as
> it would have been unlikely to achieve the same on ci-cassandra.a.o.
> While the flaky rate is lower than 4.0, the higher number of tests we
> run is making it harder to get those green runs.
>
> Despite the overhead we continue to face with flakies and getting
> major releases out, 4.1 saw fewer releases to GA than 4.0, and I think
> all will agree things are improving. But the challenge in front of us
> up to the 5.0 release is huge with nine CEPs slated to land. Pre-commit
> and post-commit CI need investing in if we want our stable trunk
> efforts to continue to improve.
>
>
> *** The Butler (Build Lead)
>
> The introduction of Butler and the Build Lead was a wonderful
> improvement to our CI efforts. It has brought a lot of hygiene in
> listing out flakies as they happened. Noted that this has in turn
> increased the burden in getting our major releases out, but that's to
> be seen as a one-off cost. This initiative lost traction and
> volunteers mid last year.
>
> We really need you to take part in the Build Lead weekly rotation.
>
> I've signed myself up for this week; please jump in and sign yourself
> up for the weeks ahead. If you are a coach/manager for a team, please
> permit and encourage your engineers to be involved in this activity;
> it shouldn't be more than an hour over the week. Further instructions
> found at https://cwiki.apache.org/confluence/display/CASSANDRA/Build+Lead
>
> If it's your first time being a Build Lead the community is here to
> help you, just reach out. It's also a great way into our community
> for newcomers!
>
> When it comes to Butler its UX of history is a bit clumsy. TIL that
> you can indeed list the full history of failures per test, see 'Full
> History' under a test page*. Please use this information to help
> create jira tickets on flakies, specifically the versions it applies
> to and the rough rate of failure so far observed.
>
> *) e.g.
> https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-trunk/failure/snapshot_test/TestArchiveCommitlog/test_archive_commitlog_point_in_time_ln
>
>
> *** Proposal for a Repeatable Containerised CI
>
> Building on what Josh writes in his "Cassandra project status, Year in
> Review Holiday Edition" post, and many discussions offline with many
> folk, I've written up the ticket epic for creating a reproducible
> containerised ci-cassandra.a.o
>
> Please read https://issues.apache.org/jira/browse/CASSANDRA-18137
>
> The tl;dr of it is to create a script that, using the jenkins k8s
> operator, can set up a ci-cassandra.a.o clone in your k8s context.
>
> The ticket is lengthy, despite being in bullet form. I don't believe
> it warrants a CEP, speak up if you disagree. The idea is to provide
> us a turnkey solution: the jenkins k8s operator based script (create
> ci-cassandra.a.o clone, run pipeline, save results, tear down clone);
> to bring our existing build and test scripts (including their docker
> images) from cassandra-builds to be in-tree to give us a declarative
> jenkins pipeline that (in a simple intuitive manner) maps stages to
> CI-agnostic build and test scripts (that can be run locally without a
> CI system if you so desire), where all branch specific testing context
> (jdks, pythons, dists) is defined outside of the CI code. Its success
> depends upon providing a CI system that is stable and fast for
> pre-commit testing.
>