On Thu, Jul 9, 2020 at 1:44 PM Robert Bradshaw <rober...@google.com> wrote:

> I wonder how hard it would be to track greenness and flakiness at the
> level of gradle project (or even lower), viewed hierarchically.
>

Looks like this is part of the Gradle Enterprise Tests Dashboard offering:
https://gradle.com/blog/flaky-tests/
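
There is also the open source org.gradle.test-retry plugin, which gets at part
of the same signal by rerunning failing tests so that "passed on retry" shows
up distinctly from a hard failure. A minimal sketch of wiring it into a Java
test task (the version number is just illustrative):

    // build.gradle sketch; assumes the org.gradle.test-retry plugin is available.
    plugins {
        id 'java'
        id 'org.gradle.test-retry' version '1.5.8'   // version illustrative
    }

    test {
        retry {
            maxRetries = 2                 // rerun a failing test up to twice
            maxFailures = 10               // stop retrying when many tests fail (likely a real breakage)
            failOnPassedAfterRetry = true  // keep the task red so flakes stay visible in reports
        }
    }

This doesn't reduce how much we run, but it does separate the flakiness signal
from genuine failures in the report.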

Kenn

> Recall my (non-binding) starting point, guessing at what tests should or
> should not run in some scenarios (this tangent is just about the third
> one, where I explicitly said maybe we run all the same tests, in which
> case we want to focus on separating signals, as Luke pointed out):
> >
> > > - changing an IO or runner would not trigger the 20 minutes of core
> SDK tests
> > > - changing a runner would not trigger the long IO local integration
> tests
> > > - changing the core SDK could potentially not run as many tests in
> presubmit, but maybe it would and the results would be reported
> separately, with a clear flakiness signal
> >
> > And let's consider even more concrete examples:
> >
> >  - when changing a Fn API proto, how important is it to run
> RabbitMqIOTest?
> >  - when changing JdbcIO, how important is it to run the Java SDK
> needsRunnerTests? RabbitMqIOTest?
> >  - when changing the FlinkRunner, how important is it to make sure that
> Nexmark queries still match their models when run on direct runner?
> >
> > I chose these examples to all have zero value, of course. And I've
> deliberately included an example of a core change and a leaf test. Not all
> (core change, leaf test) pairs are equally important. The vast majority of
> the tests we run literally cannot be affected by the changes that trigger
> them. So enabling the Gradle cache, or using a plugin like the one Brian
> found, could help with part of the issue but not the whole issue, again as
> Luke reminded.
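
For reference on the cache piece: enabling it is roughly org.gradle.caching=true
in gradle.properties plus something like the following in settings.gradle. This
is a sketch, not Beam's actual configuration, and the remote URL is a
placeholder:

    // settings.gradle sketch: with caching on, test tasks whose inputs
    // (code and classpath) are unchanged can be pulled from the cache
    // instead of re-executed.
    buildCache {
        local {
            enabled = true
        }
        remote(HttpBuildCache) {
            url = 'https://example.com/build-cache/'   // placeholder
            push = false                               // typically only CI pushes
        }
    }

It only kicks in when a task's inputs really are unchanged, which is exactly
why it helps with part of the issue but not with the (core change, leaf test)
part.
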
>
> For (2) and (3), I would hope that the build dependency graph could
> exclude them. You're right about (1) (and I've hit that countless
> times), but I would rather err on the side of accidentally running too
> many tests than not enough. If we make manual edits on top of what can
> be inferred from the build graph, let's make it a blocklist rather than
> an allowlist, to avoid accidentally losing coverage.
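
As a sketch of that shape (graphDerivedTestTasks and the task paths here are
hypothetical, not something in the build today), a manual blocklist layered on
top of whatever the graph infers could look like:

    // Hypothetical sketch: run everything the dependency graph says is
    // downstream of the change, minus a small explicit blocklist.
    def blocklist = [
        ':sdks:java:io:rabbitmq:test',   // e.g. not meaningfully affected by a Fn API proto change
    ]
    tasks.register('affectedPreCommit') {
        dependsOn graphDerivedTestTasks.findAll { !(it.path in blocklist) }
    }

Anything not on the list still runs, so a forgotten entry costs machine time
rather than coverage.
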
>
> > We make these tradeoffs all the time, of course, by putting some tests
> in *IT and postCommit runs and some in *Test, which implicitly run in
> preCommit. But I am imagining a future where we can decouple the test
> suite definitions (very stable, not depending on the project context)
> from the decision of where and when to run them (less stable, changing
> as the project changes).
> >
> > My assumption is that the project will only grow and all these problems
> (flakiness, runtime, false coupling) will continue to get worse. I raised
> this now so we can consider a steady-state approach that could scale,
> before it becomes an emergency. I take it as a given that it is harder to
> change culture than to change infra/code, so I am not counting on more
> attention to flaky tests, more attention to testing the core properly,
> more attention to making tests snappy, or more careful use of *IT versus
> *Test (unless we build infra that forces more attention to these things).
> >
> > Incidentally, SQL is not actually fully factored out. If you edit SQL,
> it runs a limited subset defined by :sqlPreCommit. If you edit core,
> :javaPreCommit still includes the SQL tests.
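
The coupling is roughly of this shape (task wiring illustrative only, not
Beam's actual definitions):

    // Illustrative only: aggregate precommit tasks pulling in test tasks
    // from downstream projects.
    tasks.register('sqlPreCommit') {
        dependsOn ':sdks:java:extensions:sql:test'   // SQL-only edits run just this subset
    }
    tasks.register('javaPreCommit') {
        dependsOn ':sdks:java:core:test'
        dependsOn ':sdks:java:extensions:sql:test'   // so SQL still rides along on core edits
    }
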
>
> I think running SQL tests when you edit core is not actually that bad.
> It's possibly better than not running any of them. (Maybe, as cost
> becomes more of a concern, adding the notion of "smoke tests", a cheap
> subset run when upstream projects change, would be a good compromise.)
>
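
A cheap "smoke" subset could plausibly be a filtered Test task along these
lines (the JUnit category class below is made up for illustration; the real
wiring would live in the SQL project's build):

    // Hypothetical sketch: a small, fast subset selected by a JUnit
    // category, intended to run when an upstream project (e.g. core) changes.
    tasks.register('sqlSmokeTest', Test) {
        testClassesDirs = sourceSets.test.output.classesDirs
        classpath = sourceSets.test.runtimeClasspath
        useJUnit {
            includeCategories 'org.apache.beam.sdk.testing.SmokeTest'   // hypothetical marker category
        }
    }

An upstream precommit could then depend on the smoke task rather than the full
suite.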
