Hi Greg,

I can see the point about enabling partial runs as a temporary measure to
fight flakiness, and it does carry some merit. In that case, though, we
should have an idea of what the desired end state is once we've stopped
relying on any temporary measures. Do you think we should aim to disable
merges without a full suite of passing CI runs (allowing for administrative
override in an emergency)? If so, what would the path be from our current
state to there? What can we do to ensure that we don't get stuck relying on
a once-temporary aid that becomes effectively permanent?

With partial builds, we also need to be careful to handle cross-module
dependencies correctly. A tweak to broker or client logic may
only affect files in one module and pass all tests for that module, but
have far-reaching consequences for Streams, Connect, and MM2. We probably
want to build awareness of this dependency graph into any partial CI logic
we add, but if we do opt for that, then this change would
disproportionately benefit downstream modules (Streams, Connect, MM2), and
have little to no benefit for upstream ones (clients and at least some core
modules).
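
To make that concrete, here's a very rough sketch of what
dependency-aware test selection could look like (the module names and
the downstream map below are placeholders, not a proposal for the real
graph, which we'd presumably want to derive from the Gradle build
itself rather than hardcode):

    # Sketch: pick Gradle test tasks from a git diff, pulling in
    # downstream dependents of any changed module.
    import subprocess

    # Hypothetical module -> dependents map; the real one should be
    # generated from the Gradle project graph.
    DOWNSTREAM = {
        "clients": {"core", "streams", "connect"},
        "core": {"streams", "connect"},
    }

    def changed_modules(base, head):
        diff = subprocess.run(
            ["git", "diff", "--name-only", f"{base}...{head}"],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        # Treat the first path component as the module name.
        return {path.split("/", 1)[0] for path in diff if "/" in path}

    def modules_to_test(base, head):
        selected = changed_modules(base, head)
        for module in list(selected):
            selected |= DOWNSTREAM.get(module, set())
        return selected

    if __name__ == "__main__":
        targets = modules_to_test("origin/trunk", "HEAD")
        print("./gradlew " + " ".join(f":{m}:test" for m in sorted(targets)))

Even a toy version like this makes the trade-off visible: a diff that
touches clients ends up running nearly everything, while a diff that
only touches Streams stays small.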

With regard to faster iteration times--I agree that it would be nice if
our CI builds didn't take 2-3 hours, but people should already be making
sure that tests pass locally before they push changes (or, if they
really want, they can run tests locally after pushing changes). And if
rapid iteration is necessary, it's always (or at least for the foreseeable
future) going to be faster to run whatever specific tests or build tasks
you need to run locally, instead of pushing to GitHub and waiting for
Jenkins to check for you.
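
As a concrete example (the module and test names are placeholders), a
targeted run like

    ./gradlew :streams:test --tests 'SomeIntegrationTest'

typically finishes in minutes, and --tests can be narrowed further to a
single test method, which no CI round-trip is going to beat.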

Finally, since there are a number of existing flaky tests on trunk, what
would the strategy be for handling those? Do we try to get to a green state
on a per-module basis (possibly with awareness of downstream modules) as
quickly as possible, and then selectively enable partial builds once we
feel confident that flakiness has been addressed?

Cheers,

Chris

On Wed, Jun 7, 2023 at 5:09 AM Gaurav Narula <ka...@gnarula.com> wrote:

> Hey Greg,
>
> Thanks for sharing this idea!
>
> The idea of building and testing a relevant subset of code certainly seems
> interesting.
>
> Perhaps this is a good fit for Bazel [1] where
> target-determinator [2] can be used to find a subset of targets that
> have changed between two commits.
>
> Even without [2], Bazel builds can benefit immensely from distributing
> builds
> to a set of remote nodes [3] with support for caching previously built
> targets [4].
>
> We've seen a few other ASF projects adopt Bazel as well:
>
> * https://github.com/apache/rocketmq
> * https://github.com/apache/brpc
> * https://github.com/apache/trafficserver
> * https://github.com/apache/ws-axiom
>
> I wonder how the Kafka community feels about experimenting with Bazel and
> exploring if it helps us offer faster build times without compromising on
> the
> correctness of the targets that need to be built and tested?
>
> Thanks,
> Gaurav
>
> [1]: https://bazel.build
> [2]: https://github.com/bazel-contrib/target-determinator
> [3]: https://bazel.build/remote/rbe
> [4]: https://bazel.build/remote/caching
>
> On 2023/06/05 17:47:07 Greg Harris wrote:
> > Hey all,
> >
> > I've been working on test flakiness recently, and I've been trying to
> > come up with ways to tackle the issue top-down as well as bottom-up,
> > and I'm interested to hear your thoughts on an idea.
> >
> > In addition to the current full-suite runs, can we in parallel trigger
> > a smaller test run which has only a relevant subset of tests? For
> > example, if someone is working on one sub-module, the CI would only
> > run tests in that module.
> >
> > I think this would be more likely to pass than the full suite due to
> > fewer tests failing probabilistically, and would improve the
> > signal-to-noise ratio of the summary pass/fail marker on GitHub. This
> > should also be shorter to execute than the full suite, allowing for
> > faster cycle-time than the current full suite encourages.
> >
> > This would also strengthen the incentive for contributors specializing
> > in a module to de-flake tests, as they are rewarded with a tangible
> > improvement within their area of the project. Currently, even the
> > modules with the most reliable tests receive consistent CI failures
> > from other less reliable modules.
> >
> > I believe this is possible, even if there isn't an off-the-shelf
> > solution for it. We can learn of the changed files via a git diff, map
> > that to modules containing those files, and then execute the tests
> > just for those modules with gradle. GitHub also permits showing
> > multiple "checks" so that we can emit both the full-suite and partial
> > test results.
> >
> > Thanks,
> > Greg
> >
