There's an old joke: How many people read Slashdot? The answer is 5. The
rest of us just write comments without reading... In that spirit, I wanted
to share some thoughts in response to your question, even though I know some
of it will already have been said in this thread :-)

Basically, I just want to share what has worked well in my past projects...

Visualization: Now that we have Butler running, we can already see a
decline in failing tests for 4.0 and trunk! This shows that contributors
want to do the right thing; we just need the right tools and processes to
achieve success.

Process: I'm confident we will soon be back to seeing 0 failures for 4.0
and trunk. However, keeping that state requires constant vigilance! At
MongoDB we had a role called Build Baron (aka Build Cop, etc.). This is a
weekly rotating role: whoever is the Build Baron goes through all of the
Butler dashboards at least once per day to catch new regressions early. We
have used the same process at DataStax to guard our downstream fork of
Cassandra 4.0. It's the responsibility of the Build Baron to
 - file a Jira ticket for new failures
 - determine which commit introduced the regression. Sometimes this is
obvious; sometimes it requires "bisecting" by running more builds, e.g.
between two nightly builds (see the sketch after this list)
 - assign the Jira ticket to the author of the commit that introduced the
regression
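
For the bisecting step, here is a minimal sketch of how the manual hunt
could be automated with git bisect. The nightly tags and the ant test
command below are made-up placeholders, not our actual tooling; a real
Build Baron might trigger full CI builds instead.

    # Sketch: find the commit that broke a test between two nightly builds.
    # The tags and the test command are hypothetical placeholders.
    import subprocess

    GOOD = "nightly-2021-10-28"   # last build where the test was green
    BAD = "nightly-2021-10-29"    # first build where the test failed
    TEST_CMD = ["ant", "test", "-Dtest.name=SomeFailingTest"]

    subprocess.run(["git", "bisect", "start", BAD, GOOD], check=True)
    try:
        # git bisect run repeatedly checks out a midpoint commit and runs
        # the test command until it narrows down the first bad commit.
        subprocess.run(["git", "bisect", "run"] + TEST_CMD, check=True)
        sha = subprocess.run(["git", "rev-parse", "refs/bisect/bad"],
                             capture_output=True, text=True,
                             check=True).stdout.strip()
        print("Regression introduced by", sha)
    finally:
        subprocess.run(["git", "bisect", "reset"], check=True)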

Given that Cassandra is a community that includes part-time and volunteer
developers, we may want to try some variation of this, such as pairing two
Build Barons each week?

Reverting: A policy that the commit causing the regression is automatically
reverted can be scary. It takes courage to be the junior test engineer who
reverts yesterday's commit from the founder and CTO, just to give an
example... Yet this is the most efficient way to keep the build green. And
it turns out it's not that much additional work for the original author to
fix the issue and then re-merge the patch.

Merge train: For any project with more than one commit per day, it will
inevitably happen that a PR needs to be rebased before merging, and even if
it passed all tests before the rebase, it may not pass them afterwards. In
the downstream Cassandra fork mentioned above, we tried enabling a GitHub
rule which requires a) that all tests pass before merging, b) that the PR
is based on the head of the branch it is merged into, and c) that the tests
were run after such a rebase. Unfortunately this leads to starvation: a
large PR may never get to merge, because it has to be rebased again and
again while smaller PRs merge faster. The solution to this problem is an
automated process for the rebase-test-merge cycle. GitLab supports such a
feature and calls it a merge train:
https://docs.gitlab.com/ee/ci/pipelines/merge_trains.html
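
To make the rebase-test-merge loop concrete, here is a rough sketch of what
a merge train worker does. The rebase/run_ci/merge/notify_author helpers
are made-up stubs, not GitLab's or GitHub's actual API:

    # Sketch of a merge train: queued PRs are merged strictly one at a
    # time, each one rebased onto the current head and re-tested first.
    # rebase/run_ci/merge/notify_author are hypothetical stubs.
    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class PullRequest:
        number: int

    def rebase(pr, onto):  print(f"rebasing #{pr.number} onto {onto}")
    def run_ci(pr):        print(f"testing #{pr.number} after rebase"); return True
    def merge(pr):         print(f"merging #{pr.number}")
    def notify_author(pr): print(f"#{pr.number} failed, fix and requeue")

    def process_merge_train(queue):
        while queue:
            pr = queue.popleft()
            rebase(pr, onto="trunk")   # always test against the current head
            if run_ci(pr):             # tests ran after the rebase, not before
                merge(pr)              # trunk only ever receives green commits
            else:
                notify_author(pr)      # drop from the train; next PR proceeds

    process_merge_train(deque([PullRequest(1), PullRequest(2)]))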

The merge train can be considered an advanced feature, and we can return to
it later. The other points should be sufficient to keep a reasonably green
trunk.

I guess the major area where we could improve daily test coverage is
performance tests. To that end we recently open-sourced a nice tool that
algorithmically detects performance regressions in a time series of
benchmark results: https://github.com/datastax-labs/hunter. Just like with
correctness testing, it's my experience that catching regressions the day
they happen is much better than trying to do it at beta or RC time.
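
To give a flavor of what "algorithmically detects" means here: the core
idea is change point detection over a time series of benchmark results. The
toy sketch below is NOT Hunter's actual implementation (Hunter builds on
the more robust E-divisive means approach, if I remember correctly); it
just illustrates the concept of flagging the day a metric shifts.

    # Toy sketch of change point detection on a benchmark time series.
    # NOT Hunter's algorithm, just an illustration of the idea: flag the
    # index where the windowed mean shifts by more than a threshold.
    def find_change_points(samples, window=5, threshold=0.10):
        change_points = []
        for i in range(window, len(samples) - window):
            before = sum(samples[i - window:i]) / window
            after = sum(samples[i:i + window]) / window
            if abs(after - before) / before > threshold:
                change_points.append(i)
        return change_points

    # e.g. nightly read throughput (ops/s); the drop starting at index 6
    # is the kind of regression we want flagged the day it happens
    history = [102_000, 101_500, 103_000, 102_200, 101_800, 102_500,
               91_000, 90_500, 91_200, 90_800, 91_500, 90_900]
    print(find_change_points(history))   # -> [6]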

Piotr also blogged about Hunter when it was released:
https://medium.com/building-the-open-data-stack/detecting-performance-regressions-with-datastax-hunter-c22dc444aea4

henrik



On Sat, Oct 30, 2021 at 4:00 PM Joshua McKenzie <jmcken...@apache.org>
wrote:

> We as a project have gone back and forth on the topic of quality and the
> notion of a releasable trunk for quite a few years. If people are
> interested, I'd like to rekindle this discussion a bit and see if we're
> happy with where we are as a project or if we think there's steps we should
> take to change the quality bar going forward. The following questions have
> been rattling around for me for awhile:
>
> 1. How do we define what "releasable trunk" means? All reviewed by M
> committers? Passing N% of tests? Passing all tests plus some other metrics
> (manual testing, raising the number of reviewers, test coverage, usage in
> dev or QA environments, etc)? Something else entirely?
>
> 2. With a definition settled upon in #1, what steps, if any, do we need to
> take to get from where we are to having *and keeping* that releasable
> trunk? Anything to codify there?
>
> 3. What are the benefits of having a releasable trunk as defined here? What
> are the costs? Is it worth pursuing? What are the alternatives (for
> instance: a freeze before a release + stabilization focus by the community
> i.e. 4.0 push or the tock in tick-tock)?
>
> Given the large volumes of work coming down the pike with CEP's, this seems
> like a good time to at least check in on this topic as a community.
>
> Full disclosure: running face-first into 60+ failing tests on trunk when
> going through the commit process for denylisting this week brought this
> topic back up for me (reminds me of when I went to merge CDC back in 3.6
> and those test failures riled me up... I sense a pattern ;))
>
> Looking forward to hearing what people think.
>
> ~Josh
>


-- 

Henrik Ingo

+358 40 569 7354

