As I work through the scripting on this, I don't know if we've documented
or clarified the following (I don't see it here:
https://cassandra.apache.org/_/development/testing.html):

Pre-commit test suites:
* Which JDKs?
* When to include all python tests vs. running only the JVM tests (if ever)?
* When to run upgrade tests?
* What to do if a test is also failing on the reference root (e.g. trunk,
cassandra-4.0, etc.)?
* What to do if a test fails intermittently?

I'll also update the above linked documentation once we hammer this out, and
try to bake it into the scripting flow as much as possible. The goal is to
make it easy to do the right thing and hard to do the wrong thing, and to
have these things written down rather than living as tribal knowledge that
varies a lot across the project.

~Josh

On Sat, Dec 4, 2021 at 9:04 AM Joshua McKenzie <jmcken...@apache.org> wrote:

> After some offline collab, here's where this thread has landed on a
> proposal to incrementally improve our processes and hopefully stabilize the
> state of CI longer term:
>
> Link:
> https://docs.google.com/document/d/1tJ-0K7d6PIStSbNFOfynXsD9RRDaMgqCu96U4O-RT84/edit#bookmark=id.16oxqq30bby4
>
> Hopefully the mail server doesn't butcher formatting; if it does, hit up
> the gdoc and leave comments there, as it should be open to all.
>
> Phase 1:
> Document merge criteria; update circle jobs to have a simple pre-merge job
> (one for each JDK profile)
>      * Donate, document, and formalize usage of circleci-enable.py in ASF
> repo (need new commit scripts / dev tooling section?)
>         * rewrites circle config jobs to simple clear flow
>         * ability to toggle between "run on push" or "click to run"
>         * Variety of other functionality; see below
> Document (site, help, README.md) and automate via scripting the
> relationship / dev / release process around:
>     * In-jvm dtest
>     * dtest
>     * ccm
> Integrate and document usage of the script to build CI repeat test runs
>     * circleci-enable.py --repeat-unit org.apache.cassandra.SomeTest
>     * Document “Do this if you add or change tests”
> Introduce “Build Lead” role
>     * Weekly rotation; volunteer
>     * 1: Make sure JIRAs exist for test failures
>     * 2: Attempt to triage new test failures to root cause and assign out
>     * 3: Coordinate and drive to green board on trunk
> Change and automate process for *trunk only* patches:
>     * Block on green CI (from merge criteria in CI above; potentially
> stricter definition of "clean" for trunk CI)
>     * Consider using GitHub PRs to merge (TODO: determine how to handle
> circle + CHANGES; see below)
> Automate process for *multi-branch* merges
>     * Harden / contribute / document dcapwell's script (he has one that does
> the following; a rough sketch of the flow is included after this list):
>         * rebases your branch to the latest (if on 3.0 then rebase against
> cassandra-3.0)
>         * checks the code compiles
>         * removes all changes to .circleci (can opt out for circleci patches)
>         * removes all changes to CHANGES.txt and leverages JIRA for the
> content
>         * checks the code still compiles
>         * changes circle config to run CI
>         * push to a temp branch in git and run CI (circle + Jenkins)
>             * when all branches are clean (waiting step is manual)
>             * TODO: Define “clean”
>                 * No new test failures compared to reference?
>                 * Or no test failures at all?
>             * merge changes into the actual branches
>             * merge up changes; rewriting diff
>             * push --atomic
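>
> For illustration only, here's a rough Python sketch of what that automation
> could look like (referenced above). The branch list and merge order, the
> temp branch naming, the build command, and the manual "is CI clean?" prompt
> are all assumptions for discussion, not dcapwell's actual script:
>
>     #!/usr/bin/env python3
>     # Sketch of the multi-branch merge flow described above (assumptions marked).
>     import subprocess, sys
>
>     BRANCHES = ["cassandra-3.0", "cassandra-3.11", "cassandra-4.0", "trunk"]  # assumed order
>
>     def run(*cmd):
>         print("+", " ".join(cmd))
>         subprocess.run(cmd, check=True)
>
>     def prepare(feature_branch, base):
>         run("git", "checkout", feature_branch)
>         run("git", "rebase", "origin/" + base)  # rebase onto the latest base branch
>         # Drop any changes the branch made to circle config and CHANGES.txt
>         run("git", "checkout", "origin/" + base, "--", ".circleci", "CHANGES.txt")
>         run("git", "commit", "--allow-empty", "-am", "Strip .circleci/CHANGES.txt changes")
>         run("ant", "jar")  # check the code still compiles
>         temp = "ci/" + feature_branch  # temp branch name is an assumption
>         run("git", "push", "-f", "origin", feature_branch + ":" + temp)
>         return temp
>
>     def ci_is_clean(temp_branch):
>         # The waiting step is manual for now: a human confirms circle + Jenkins
>         return input("Is CI clean for %s? [y/N] " % temp_branch).lower() == "y"
>
>     def main(feature_branch, base):
>         temp = prepare(feature_branch, base)
>         if not ci_is_clean(temp):
>             sys.exit("CI not clean; fix and re-run")
>         run("git", "checkout", base)
>         run("git", "pull", "--ff-only", "origin", base)
>         run("git", "merge", "--no-ff", feature_branch)
>         newer = BRANCHES[BRANCHES.index(base) + 1:]
>         prev = base
>         for branch in newer:
>             run("git", "checkout", branch)
>             run("git", "pull", "--ff-only", "origin", branch)
>             # Merge up; in practice the diff may need rewriting per branch
>             run("git", "merge", prev)
>             prev = branch
>         run("git", "push", "--atomic", "origin", base, *newer)
>
>     if __name__ == "__main__":
>         main(sys.argv[1], sys.argv[2])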
>
> Transition to phase 2 when:
>     * All items from phase 1 are complete
>     * Test boards for supported branches are green
>
> Phase 2:
> * Add Harry to recurring run against trunk
> * Add Harry to release pipeline
> * Suite of perf tests against trunk recurring
>
>
>
> On Wed, Nov 17, 2021 at 1:42 PM Joshua McKenzie <jmcken...@apache.org>
> wrote:
>
>> Sorry for not catching that, Benedict; you're absolutely right. So long as
>> we're using merge commits between branches I don't think auto-merging via
>> train or blocking on green CI are options via the tooling, and multi-branch
>> reverts will be something we should document very clearly should we even
>> choose to go that route (a lot of room to make mistakes there).
>>
>> It may not be a huge issue as we can expect the more disruptive changes
>> (i.e. potentially destabilizing) to be happening on trunk only, so perhaps
>> we can get away with slightly different workflows or policies based on
>> whether you're doing a multi-branch bugfix or a feature on trunk. Bears
>> thinking more deeply about.
>>
>> I'd also be game for revisiting our merge strategy. I don't see much
>> difference in labor between merging between branches vs. preparing separate
>> patches for an individual developer; however, I'm sure there are maintenance
>> and integration implications there that I'm not thinking of right now.
>>
>> On Wed, Nov 17, 2021 at 12:03 PM bened...@apache.org <bened...@apache.org>
>> wrote:
>>
>>> I raised this before, but to highlight it again: how do these approaches
>>> interface with our merge strategy?
>>>
>>> We might have to rebase several dependent merge commits and want to
>>> merge them atomically. So far as I know these tools don’t work
>>> fantastically in this scenario, but if I’m wrong that’s fantastic. If not,
>>> given how important these things are, should we consider revisiting our
>>> merge strategy?
>>>
>>> From: Joshua McKenzie <jmcken...@apache.org>
>>> Date: Wednesday, 17 November 2021 at 16:39
>>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
>>> Subject: Re: [DISCUSS] Releasable trunk and quality
>>> Thanks for the feedback and insight, Henrik; it's valuable to hear how
>>> other large, complex infra projects have tackled this problem set.
>>>
>>> To attempt to summarize, what I got from your email:
>>> [Phase one]
>>> 1) Build Barons: rotation where there's always someone active tying
>>> failures to changes and adding those failures to our ticketing system
>>> 2) Best effort process of "test breakers" being assigned tickets to fix
>>> the things their work broke
>>> 3) Moving to a culture where we regularly revert commits that break tests
>>> 4) Running tests before we merge changes
>>>
>>> [Phase two]
>>> 1) Suite of performance tests on a regular cadence against trunk
>>> (w/hunter or otherwise)
>>> 2) Integration w/ GitHub merge-train pipelines
>>>
>>> That cover the highlights? I agree with these points as useful places
>>> for us to invest in as a project and I'll work on getting this into a
>>> gdoc for us to align on and discuss further this week.
>>>
>>> ~Josh
>>>
>>>
>>> On Wed, Nov 17, 2021 at 10:23 AM Henrik Ingo <henrik.i...@datastax.com>
>>> wrote:
>>>
>>> > There's an old joke: How many people read Slashdot? The answer is 5.
>>> > The rest of us just write comments without reading... In that spirit, I
>>> > wanted to share some thoughts in response to your question, even if I
>>> > know some of it will have been said in this thread already :-)
>>> >
>>> > Basically, I just want to share what has worked well in my past
>>> > projects...
>>> >
>>> > Visualization: Now that we have Butler running, we can already see a
>>> > decline in failing tests for 4.0 and trunk! This shows that contributors
>>> > want to do the right thing; we just need the right tools and processes
>>> > to achieve success.
>>> >
>>> > Process: I'm confident we will soon be back to seeing 0 failures for
>>> > 4.0 and trunk. However, keeping that state requires constant vigilance!
>>> > At MongoDB we had a role called Build Baron (aka Build Cop, etc...).
>>> > This is a weekly rotating role where the person who is the Build Baron
>>> > will at least once per day go through all of the Butler dashboards to
>>> > catch new regressions early. We have also used the same process at
>>> > DataStax to guard our downstream fork of Cassandra 4.0. It's the
>>> > responsibility of the Build Baron to
>>> >  - file a jira ticket for new failures
>>> >  - determine which commit is responsible for introducing the
>>> > regression. Sometimes this is obvious; sometimes this requires
>>> > "bisecting" by running more builds, e.g. between two nightly builds.
>>> >  - assign the jira ticket to the author of the commit that introduced
>>> > the regression
>>> >
>>> > Given that Cassandra is a community that includes part-time and
>>> > volunteer developers, we may want to try some variation of this, such
>>> > as pairing 2 build barons each week?
>>> >
>>> > Reverting: A policy that the commit causing the regression is
>>> > automatically reverted can be scary. It takes courage to be the junior
>>> > test engineer who reverts yesterday's commit from the founder and CTO,
>>> > just to give an example... Yet this is the most efficient way to keep
>>> > the build green. And it turns out it's not that much additional work
>>> > for the original author to fix the issue and then re-merge the patch.
>>> >
>>> > Merge-train: For any project with more than 1 commit per day, it will
>>> > inevitably happen that you need to rebase a PR before merging, and even
>>> > if it passed all tests before, after rebase it won't. In the downstream
>>> > Cassandra fork previously mentioned, we have tried to enable a GitHub
>>> > rule which requires a) that all tests passed before merging, b) that
>>> > the PR is against the head of the branch merged into, and c) that the
>>> > tests were run after such a rebase. Unfortunately this leads to
>>> > infinite loops where a large PR may never be able to commit because it
>>> > has to be rebased again and again while smaller PRs merge faster. The
>>> > solution to this problem is to have an automated process for the
>>> > rebase-test-merge cycle. GitLab supports such a feature and calls it
>>> > merge trains:
>>> > https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
>>> >
>>> > The merge-train can be considered an advanced feature and we can
>>> > return to it later. The other points should be sufficient to keep a
>>> > reasonably green trunk.
>>> >
>>> > I guess the major area where we can improve daily test coverage would
>>> > be performance tests. To that end we recently open-sourced a nice tool
>>> > that can algorithmically detect performance regressions in a timeseries
>>> > history of benchmark results: https://github.com/datastax-labs/hunter
>>> > Just like with correctness testing, it's my experience that catching
>>> > regressions the day they happen is much better than trying to do it at
>>> > beta or rc time.
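>>> >
>>> > (Not Hunter's actual API, just a minimal sketch of the general idea of
>>> > flagging a regression in a timeseries of benchmark results; the window
>>> > size and tolerance below are arbitrary assumptions:)
>>> >
>>> >     # Naive illustration: flag a regression when the mean of the most
>>> >     # recent results drops well below the preceding baseline.
>>> >     from statistics import mean
>>> >
>>> >     def regressed(throughputs, window=5, tolerance=0.10):
>>> >         """throughputs: oldest-to-newest results (higher is better)."""
>>> >         if len(throughputs) < 2 * window:
>>> >             return False  # not enough history to compare
>>> >         baseline = mean(throughputs[:-window])
>>> >         recent = mean(throughputs[-window:])
>>> >         return recent < baseline * (1 - tolerance)
>>> >
>>> >     # e.g. nightly results: a ~20% drop in the last few runs gets flagged
>>> >     history = [102, 99, 101, 100, 103, 101, 100, 82, 80, 81, 79, 83]
>>> >     print(regressed(history))  # True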
>>> >
>>> > Piotr also blogged about Hunter when it was released:
>>> >
>>> > https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4
>>> >
>>> > henrik
>>> >
>>> >
>>> >
>>> > On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org>
>>> > wrote:
>>> >
>>> > > We as a project have gone back and forth on the topic of quality and
>>> > > the notion of a releasable trunk for quite a few years. If people are
>>> > > interested, I'd like to rekindle this discussion a bit and see if
>>> > > we're happy with where we are as a project or if we think there are
>>> > > steps we should take to change the quality bar going forward. The
>>> > > following questions have been rattling around for me for a while:
>>> > >
>>> > > 1. How do we define what "releasable trunk" means? All reviewed by M
>>> > > committers? Passing N% of tests? Passing all tests plus some other
>>> > > metrics (manual testing, raising the number of reviewers, test
>>> > > coverage, usage in dev or QA environments, etc.)? Something else
>>> > > entirely?
>>> > >
>>> > > 2. With a definition settled upon in #1, what steps, if any, do we
>>> > > need to take to get from where we are to having *and keeping* that
>>> > > releasable trunk? Anything to codify there?
>>> > >
>>> > > 3. What are the benefits of having a releasable trunk as defined
>>> > > here? What are the costs? Is it worth pursuing? What are the
>>> > > alternatives (for instance: a freeze before a release + stabilization
>>> > > focus by the community, e.g. the 4.0 push or the tock in tick-tock)?
>>> > >
>>> > > Given the large volumes of work coming down the pike with CEPs, this
>>> > > seems like a good time to at least check in on this topic as a
>>> > > community.
>>> > >
>>> > > Full disclosure: running face-first into 60+ failing tests on trunk
>>> > > when going through the commit process for denylisting this week
>>> > > brought this topic back up for me (reminds me of when I went to merge
>>> > > CDC back in 3.6 and those test failures riled me up... I sense a
>>> > > pattern ;))
>>> > >
>>> > > Looking forward to hearing what people think.
>>> > >
>>> > > ~Josh
>>> > >
>>> >
>>> >
>>> > --
>>> >
>>> > Henrik Ingo
>>> >
>>>
>>
