David,

Thanks for your thoughts!

> Indeed, they will be more likely to pass but the
> downside is that folks may start to only rely on that signal and commit
> without looking at the full test suite. This seems dangerous to me.

I completely agree with you that it is not desirable for committers to
become overly reliant on the smaller subset of tests. Rather than
replacing the existing full-suite spot-check, the partial builds can
be an additional merge requirement alongside it.
For PRs that fail the partial test run, committers won't need to
examine the full suite to know that the contributor needs to make
further changes, and can spend their attention elsewhere.
We can make this clear with a dev-list announcement to explain the
meaning and interpretation of the new build result. We can also call
for flakiness reduction contributions at the same time.

> I would rather focus on trying to address this first. If
> we can stabilize them, I wonder if we should also enforce a green build to
> merge.

The reason I'm interested in this change is that this project already
appears to follow this policy of bottom-up flakiness reduction, and I
don't think it has been effective enough to justify enforcing a green
build as a merge requirement.
I think that it is better to improve the incentives for flakiness
reduction, wait for flakiness to improve, and then later enforce a
green build to merge. In that context, the partial builds are a
temporary change to help us get to the desired end-goal.
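For concreteness, here is a rough sketch of the selection step from my
earlier message. It assumes each top-level directory is a Gradle
subproject and that `origin/trunk` is the merge base, both of which are
assumptions the real CI job would need to verify:

```shell
#!/bin/sh
# Hypothetical sketch: map changed files to Gradle modules and run only
# those modules' tests. Assumes top-level directories are subprojects.

# Collect the top-level directories touched by this branch.
modules=$(git diff --name-only origin/trunk...HEAD \
  | cut -d/ -f1 | sort -u)

# Run the test task for each affected module only.
for m in $modules; do
  ./gradlew ":${m}:test"
done
```

The actual file-to-module mapping may need to be smarter than a
top-level-directory split (e.g. for shared build files), but the overall
shape would be something like this.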

Thanks,
Greg

On Tue, Jun 6, 2023 at 2:51 AM David Jacot <dja...@confluent.io.invalid> wrote:
>
> Hey Greg,
>
> Thanks for bringing this up.
>
> I am not sure I understand the benefit of the parallel trigger of a
> subset of the tests. Indeed, they will be more likely to pass but the
> downside is that folks may start to only rely on that signal and commit
> without looking at the full test suite. This seems dangerous to me.
>
> However, I agree that we have an issue with our builds. We have way too
> many flaky tests. I would rather focus on trying to address this first. If
> we can stabilize them, I wonder if we should also enforce a green build to
> merge.
>
> Best,
> David
>
>
>
> On Mon, Jun 5, 2023 at 7:47 PM Greg Harris <greg.har...@aiven.io.invalid>
> wrote:
>
> > Hey all,
> >
> > I've been working on test flakiness recently, and I've been trying to
> > come up with ways to tackle the issue top-down as well as bottom-up,
> > and I'm interested to hear your thoughts on an idea.
> >
> > In addition to the current full-suite runs, can we in parallel trigger
> > a smaller test run which has only a relevant subset of tests? For
> > example, if someone is working on one sub-module, the CI would only
> > run tests in that module.
> >
> > I think this would be more likely to pass than the full suite, since
> > fewer tests means fewer chances for a flaky failure, and would improve the
> > signal-to-noise ratio of the summary pass/fail marker on GitHub. This
> > should also be shorter to execute than the full suite, allowing for
> > faster cycle-time than the current full suite encourages.
> >
> > This would also strengthen the incentive for contributors specializing
> > in a module to de-flake tests, as they are rewarded with a tangible
> > improvement within their area of the project. Currently, even the
> > modules with the most reliable tests receive consistent CI failures
> > from other less reliable modules.
> >
> > I believe this is possible, even if there isn't an off-the-shelf
> > solution for it. We can learn of the changed files via a git diff, map
> > that to modules containing those files, and then execute the tests
> > just for those modules with gradle. GitHub also permits showing
> > multiple "checks" so that we can emit both the full-suite and partial
> > test results.
> >
> > Thanks,
> > Greg
> >