Some context links for the benefit of the thread & archive:

Beam issue mentioning a Jenkins plugin that caches on the Jenkins master:
https://issues.apache.org/jira/browse/BEAM-4400

Beam's request to infra: https://issues.apache.org/jira/browse/INFRA-16630

Denied and reasoning on prior request:
https://issues.apache.org/jira/browse/INFRA-16060
Because the Jenkins master / S3 are not good choices for where to cache. Hosting the actual Gradle build cache server, as in the thread linked above, would be different. Prototyping on the Beam ticket was not successful.

(Ongoing) thread on builds@ asking about an existing service:
https://lists.apache.org/thread.html/ae40734e34dcf1d3bd8c65dfea3094709d9d8eb97bfb9ab92149e97c%40%3Cbuilds.apache.org%3E

A few rough Gradle sketches of the pieces discussed below (remote build cache, task inputs/outputs, test retries, a smoke-test subset) are appended after the quoted thread.

Kenn

On Mon, Jul 13, 2020 at 10:26 AM Kenneth Knowles <k...@apache.org> wrote:

> Having thought this over a bit, I think there are a few goals and they are interfering with each other.
>
> 1. Clear signal for module / test suite health. This is a post-commit concern. Post-commit jobs already all run as cron jobs with no dependency-driven stuff.
> 2. Making the precommit test signal stay non-flaky as modules, tests, and flakiness increase.
> 3. Making precommit stay fast as modules, tests, and flakiness increase.
>
> Noting the interdependence of pre-commit and post-commit:
>
> - you can phrase-trigger post-commit jobs
> - pre-commit jobs are run as post-commits also
>
> Summarizing a bit:
>
> 1. Clear per-module/suite greenness and flakiness signal
> - it would be nice if we could do this at the Gradle job level, but right now it is at the Jenkins job level
> - on the other hand, most Gradle jobs do not represent a module, so that could be too fine-grained and Jenkins jobs are better
> - if we have a ton of Jenkins jobs, we need some new automation or amortized management
> - we don't want to overwhelm the Jenkins executors, especially not by causing precommit queueing
>
> 2. Making precommit stay non-flaky, robustly
> - we can fix flakes, but we can't count on that long term; we could build something that forces us to treat flakes as P0
> - we can add a retry budget to tests where deflaking cannot be prioritized
> - there is a lot of anxiety that testing less in pre-commit will cause painful post-commit debugging
> - there is a lot of overlap with making it faster, since the flakes are often caused by irrelevant tests
>
> 3. Making precommit stay fast, robustly
> - we could improve per-worker incremental builds
> - we could use a distributed build cache
> - we have tasks that don't declare their inputs/outputs correctly, and those will have problems
>
> I care most about #1 and then also #2. The only reason I care about #3 is because of #2: once a pre-commit takes more than a couple of minutes, I always go and do something else and come back in an hour or two. So if it flakes just a few times, it costs a day. Fix #2 and I don't think #3 is urgent yet.
>
> A distributed build cache seems to be fairly low effort to set up, makes #2 and #3 better, and may unlock approaches to #1, if we can fix our Gradle configs. We can ask ASF infra if they have something already or can set it up.
>
> That will still leave open how to get a better and more visible greenness and flakiness signal at a more meaningful granularity.
>
> Kenn
>
> On Fri, Jul 10, 2020 at 6:38 AM Kenneth Knowles <k...@apache.org> wrote:
>
>> On Thu, Jul 9, 2020 at 1:44 PM Robert Bradshaw <rober...@google.com> wrote:
>>
>>> I wonder how hard it would be to track greenness and flakiness at the level of gradle project (or even lower), viewed hierarchically.
>>>
>>
>> Looks like this is part of the Gradle Enterprise Tests Dashboard offering: https://gradle.com/blog/flaky-tests/
>>
>> Kenn
>>
>>> > Recall my (non-binding) starting point guessing at what tests should or should not run in some scenarios (this tangent is just about the third one, where I explicitly said maybe we run all the same tests and then we want to focus on separating signals, as Luke pointed out):
>>> >
>>> > > - changing an IO or runner would not trigger the 20 minutes of core SDK tests
>>> > > - changing a runner would not trigger the long IO local integration tests
>>> > > - changing the core SDK could potentially not run as many tests in presubmit, but maybe it would and they would be separately reported results with a clear flakiness signal
>>> >
>>> > And let's consider even more concrete examples:
>>> >
>>> > - when changing a Fn API proto, how important is it to run RabbitMqIOTest?
>>> > - when changing JdbcIO, how important is it to run the Java SDK needsRunnerTests? RabbitMqIOTest?
>>> > - when changing the FlinkRunner, how important is it to make sure that Nexmark queries still match their models when run on the direct runner?
>>> >
>>> > I chose these examples to all have zero value, of course. And I've deliberately included an example of a core change and a leaf test. Not all (core change, leaf test) pairs are equally important. The vast majority of the tests we run are literally unable to be affected by the changes triggering them. So that's why enabling the Gradle cache or using a plugin like the one Brian found could help with part of the issue, but not the whole issue, again as Luke reminded us.
>>>
>>> For (2) and (3), I would hope that the build dependency graph could exclude them. You're right about (1) (and I've hit that countless times), but I would rather err on the side of accidentally running too many tests than not enough. If we make manual edits to what can be inferred from the build graph, let's make it a blacklist rather than an allow list, to avoid accidentally losing coverage.
>>>
>>> > We make these tradeoffs all the time, of course, by putting some tests in *IT and postCommit runs and some in *Test, implicitly preCommit. But I am imagining a future where we can decouple the test suite definitions (very stable, not depending on the project context) from the decision of where and when to run them (less stable, changing as the project changes).
>>> >
>>> > My assumption is that the project will only grow and all these problems (flakiness, runtime, false coupling) will continue to get worse. I raised this now so we can consider a steady-state approach that could scale, before it becomes an emergency. I take it as a given that it is harder to change culture than it is to change infra/code, so I am not considering any possibility of more attention to flaky tests, more attention to testing the core properly, more attention to making tests snappy, or more careful consideration of *IT and *Test (unless we build infra that forces more attention to these things).
>>> >
>>> > Incidentally, SQL is not actually fully factored out. If you edit SQL, it runs a limited subset defined by :sqlPreCommit. If you edit core, then :javaPreCommit still includes the SQL tests.
>>>
>>> I think running SQL tests when you edit core is not actually that bad. Possibly better than not running any of them. (Maybe, as cost becomes more of a concern, adding the notion of "smoke tests" that are a cheap subset run when upstream projects change would be a good compromise.)
>>>
>>
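For concreteness, here is roughly what the client side of the distributed build cache would look like, assuming someone (ASF infra or a donated service) hosts a Gradle remote cache reachable over HTTP. The URL and environment variable names below are placeholders, not an existing service:

    // settings.gradle (sketch) -- also needs org.gradle.caching=true in
    // gradle.properties, or running builds with --build-cache.
    buildCache {
        local {
            enabled = true
        }
        remote(HttpBuildCache) {
            // Placeholder endpoint; no such service exists today.
            url = 'https://gradle-cache.example.apache.org/cache/'
            // Only trusted (post-merge) CI builds should push; PR builds and
            // developers only pull, so an untrusted change cannot poison the cache.
            push = System.getenv('GRADLE_CACHE_PUSH') == 'true'
            credentials {
                username = System.getenv('GRADLE_CACHE_USER')
                password = System.getenv('GRADLE_CACHE_PASSWORD')
            }
        }
    }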
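The cache (local or remote) only helps for tasks that declare their inputs and outputs completely; that is the "tasks that don't declare their inputs/outputs correctly" problem above, and fixing those declarations is the real work. A sketch of what a well-behaved, cacheable task looks like; GenerateDocs is invented for illustration and is not a task in the Beam build:

    // build.gradle (sketch)
    @CacheableTask
    abstract class GenerateDocs extends DefaultTask {
        // Every file the task reads must be declared, or cache hits can be wrong.
        @InputFiles
        @PathSensitive(PathSensitivity.RELATIVE) // keep cache keys machine-independent
        abstract ConfigurableFileCollection getSources()

        // Everything the task writes must be declared so it can be restored from the cache.
        @OutputDirectory
        abstract DirectoryProperty getOutputDir()

        @TaskAction
        void run() {
            def outDir = outputDir.get().asFile
            outDir.mkdirs()
            new File(outDir, 'index.txt').text =
                sources.files.collect { it.name }.sort().join('\n')
        }
    }

    tasks.register('generateDocs', GenerateDocs) {
        sources.from(fileTree('src/main/java'))
        outputDir = layout.buildDirectory.dir('docs')
    }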
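The "retry budget" for tests whose deflaking cannot be prioritized could be prototyped with Gradle's test-retry plugin rather than anything custom. The plugin version and the limits below are illustrative, not a recommendation:

    // build.gradle (sketch)
    plugins {
        // Check the Gradle plugin portal for the current version.
        id 'org.gradle.test-retry' version '1.5.0'
    }

    test {
        retry {
            maxRetries = 2                  // per-test retry budget for known flakes
            maxFailures = 10                // bail out early if a change is genuinely broken
            failOnPassedAfterRetry = false  // flip to true to keep flakes visible as failures
        }
    }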
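And the "smoke tests" compromise at the end of the thread could be as simple as a category-driven subset that jobs triggered by upstream-only changes run instead of the full suite. The SmokeTest category below is a hypothetical marker interface, not something that exists in Beam today:

    // build.gradle (sketch) -- assumes the 'java' plugin's test source set.
    tasks.register('smokeTest', Test) {
        testClassesDirs = sourceSets.test.output.classesDirs
        classpath = sourceSets.test.runtimeClasspath
        useJUnit {
            // Hypothetical JUnit 4 category, e.g. org.apache.beam.sdk.testing.SmokeTest.
            includeCategories 'org.apache.beam.sdk.testing.SmokeTest'
        }
    }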