On Thu, Jul 9, 2020 at 1:44 PM Robert Bradshaw <rober...@google.com> wrote:
> I wonder how hard it would be to track greenness and flakiness at the
> level of gradle project (or even lower), viewed hierarchically.

Looks like this is part of the Gradle Enterprise Tests Dashboard offering:
https://gradle.com/blog/flaky-tests/

Kenn

> > Recall my (non-binding) starting point guessing at what tests should or should not run in some scenarios: (this tangent is just about the third one, where I explicitly said maybe we run all the same tests and then we want to focus on separating signals as Luke pointed out)
> >
> >    - changing an IO or runner would not trigger the 20 minutes of core SDK tests
> >    - changing a runner would not trigger the long IO local integration tests
> >    - changing the core SDK could potentially not run as many tests in presubmit, but maybe it would and they would be separately reported results with clear flakiness signal
> >
> > And let's consider even more concrete examples:
> >
> >    - when changing a Fn API proto, how important is it to run RabbitMqIOTest?
> >    - when changing JdbcIO, how important is it to run the Java SDK needsRunnerTests? RabbitMqIOTest?
> >    - when changing the FlinkRunner, how important is it to make sure that Nexmark queries still match their models when run on direct runner?
> >
> > I chose these examples to all have zero value, of course. And I've deliberately included an example of a core change and a leaf test. Not all (core change, leaf test) pairs are equally important. The vast majority of all tests we run are literally unable to be affected by the changes triggering the test. So that's why enabling Gradle cache or using a plugin like Brian found could help part of the issue, but not the whole issue, again as Luke reminded.
>
> For (2) and (3), I would hope that the build dependency graph could exclude them. You're right about (1) (and I've hit that countless times), but would rather err on the side of accidentally running too many tests than not enough. If we make manual edits to what can be inferred by the build graph, let's make it a blacklist rather than an allow list to avoid accidental lost coverage.
>
> > We make these tradeoffs all the time, of course, via putting some tests in *IT and postCommit runs and some in *Test, implicitly preCommit. But I am imagining a future where we can decouple the test suite definitions (very stable, not depending on the project context) from the decision of where and when to run them (less stable, changing as the project changes).
> >
> > My assumption is that the project will only grow and all these problems (flakiness, runtime, false coupling) will continue to get worse. I raised this now so we could consider what is a steady state approach that could scale, before it becomes an emergency. I take it as a given that it is harder to change culture than it is to change infra/code, so I am not considering any possibility of more attention to flaky tests or more attention to testing the core properly or more attention to making tests snappy or more careful consideration of *IT and *Test. (unless we build infra that forces more attention to these things)
> >
> > Incidentally, SQL is not actually fully factored out. If you edit SQL it runs a limited subset defined by :sqlPreCommit. If you edit core, then :javaPreCommit still includes SQL tests.
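(As an aside, a rough sketch of how that kind of aggregation could be wired in Gradle, purely for illustration; the task names match the ones above, but the module paths and wiring here are hypothetical rather than Beam's actual build files:

    // build.gradle sketch (Groovy DSL); paths are illustrative only
    tasks.register('sqlPreCommit') {
        // SQL edits: run only the SQL module's tests
        dependsOn ':sdks:java:extensions:sql:test'
    }
    tasks.register('javaPreCommit') {
        // core edits: run core tests, and the SQL tests still come along
        dependsOn ':sdks:java:core:test'
        dependsOn ':sdks:java:extensions:sql:test'
    }
)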
> I think running SQL tests when you edit core is not actually that bad.
> Possibly better than not running any of them. (Maybe, as cost becomes
> more of a concern, adding the notion of "smoke tests" that are a cheap
> subset run when upstream projects change would be a good compromise.)
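For concreteness, a rough sketch of what such a "smoke test" subset could look like as a Gradle task in a Java module, assuming a naming convention like *SmokeTest for the cheap tests (the task name and the convention are placeholders, not anything that exists today):

    // build.gradle sketch (Groovy DSL) for a module with the 'java' plugin applied
    tasks.register('smokeTest', Test) {
        group = 'verification'
        description = 'Cheap test subset run when upstream projects change'
        testClassesDirs = sourceSets.test.output.classesDirs
        classpath = sourceSets.test.runtimeClasspath
        // Hypothetical convention: only classes named *SmokeTest run here
        filter {
            includeTestsMatching '*SmokeTest'
        }
    }

Upstream projects could then depend on :module:smokeTest rather than :module:test in their preCommit aggregates.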