I like your use of "ancestor" and "descendant". I will adopt it.
On Wed, Jul 8, 2020 at 4:53 PM Robert Bradshaw <rober...@google.com> wrote:

> On Wed, Jul 8, 2020 at 4:44 PM Luke Cwik <lc...@google.com> wrote:
> >
> > I'm not sure that breaking it up will be significantly faster, since each
> > module needs to build its ancestors and run tests of itself and all of
> > its descendants, which isn't a trivial amount of work. We have only so
> > many executors, and with the increased number of jobs, won't we just be
> > waiting for queued jobs to start?
>
> I think that depends on how many fewer tests we could run (or rerun)
> for the average PR. (It would also be nice if we could share build
> artifacts across executors (is there something like ccache for
> javac?), but maybe that's too far-fetched?)

Robert: The Gradle cache should remain valid across runs, I think. My
latest understanding was that it is a robust up-to-date check (aka not
`make`). We may have messed this up, as I am not seeing as much caching as
I would expect, nor as much as I see locally. In the Maven days we had to
do some tweaking to put the .m2 directory outside of the realm wiped for
each new build. Maybe we are clobbering the Gradle cache too. Fixing that
might actually make most builds so fast that we would not care about my
proposal.

Luke: I am not sure if you are replying to my email or to Brian's.

If Brian's: it does not result in redundant builds (if the plugin works),
since it would be one Gradle build process. But it does do a full build if
you touch something at the root of the ancestry tree, like the core SDK or
the model. I would like to avoid automatically testing descendants if we
can, since things like Nexmark and most IOs are not sensitive to the vast
majority of model or core SDK changes. Runners are borderline.

If mine: you could assume my proposal is like Brian's but with fully
isolated Jenkins builds. This would be strictly worse, since it would add
redundant builds of ancestors. I am assuming that you always run a
separate Jenkins job for every descendant.
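(For reference, Gradle's local build cache location is configurable in
settings.gradle, so it can be pointed outside the directory Jenkins wipes,
analogous to the old .m2 relocation. A minimal sketch; the directory path
is a hypothetical example, and `org.gradle.caching=true` must also be set
in gradle.properties:)

```groovy
// settings.gradle -- a minimal sketch, not Beam's actual configuration.
// If Jenkins wipes the workspace between builds, keep the local build
// cache somewhere that survives. The path below is a hypothetical example.
buildCache {
    local {
        directory = new File(System.getProperty('user.home'), '.beam-gradle-cache')
        // Evict stale entries so the persistent cache does not grow unbounded.
        removeUnusedEntriesAfterDays = 7
    }
}
```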
Still, many modules have fewer descendants. And they do not trigger all
the way up to the root and back down to all descendants of the root.

From a community perspective, extensions and IOs are the most likely entry
point for newcomers. For the person who comes to add or improve FooIO, it
is not a good experience to hit a flake in RabbitMqIO or JdbcIO or
DataflowRunner or FlinkRunner. I think the plugin Brian mentioned is only
a start. It would be even better for each module to have an opt-in list of
descendants to test on precommit. This works well with a rollback-first
strategy on post-commit: we can then replay the PR while triggering the
postcommits that failed.

> > I agree that we would have better visibility though in github and also
> > in Jenkins.
>
> I do have to say having to scroll through a huge number of github
> checks is not always an improvement.

+1, but OTOH the Gradle scan is sometimes too fine grained or associates
logs oddly (I skip the Jenkins status page almost always).

> > Fixing flaky tests would help improve our test signal as well. Not many
> > willing people here though, but could be less work than building and
> > maintaining so many different jobs.
>
> +1

I agree with fixing flakes, but I want to treat the occurrence and
resolution of flakiness as standard operations. Just as bug counts
increase continuously as a project grows, so will overall flakiness.
Separating flakiness signals will help to prioritize which flakes to
address.

Kenn

> > On Wed, Jul 8, 2020 at 4:13 PM Kenneth Knowles <k...@apache.org> wrote:
> >>
> >> That's a good start. It is new enough and with few enough commits that
> >> I'd want to do some thorough experimentation. Our build is complex
> >> enough, with a lot of ad hoc coding, that we might end up maintaining
> >> whatever we choose...
> >>
> >> In my ideal scenario the list of "what else to test" would be manually
> >> editable, or even strictly opt-in.
> >> Automatically testing everything that might be affected quickly runs
> >> into scaling problems too. It could make sense in post-commit but less
> >> so in pre-commit.
> >>
> >> Kenn
> >>
> >> On Wed, Jul 8, 2020 at 3:50 PM Brian Hulette <bhule...@google.com>
> >> wrote:
> >>>
> >>> > We could have one "test the things" Jenkins job if the underlying
> >>> > tool (Gradle) could resolve what needs to be run.
> >>>
> >>> I think this would be much better. Otherwise it seems our Jenkins
> >>> definitions are just duplicating information that's already stored in
> >>> the build.gradle files, which seems error-prone, especially for tests
> >>> validating combinations of artifacts. I did some quick searching and
> >>> came across [1]. It doesn't look like the project has had a lot of
> >>> recent activity, but it claims to do what we need:
> >>>
> >>> > The plugin will generate new tasks on the root project for each task
> >>> > provided on the configuration with the following pattern
> >>> > ${taskName}ChangedModules.
> >>> > These generated tasks will run the changedModules task to get the
> >>> > list of changed modules and for each one will call the given task.
> >>>
> >>> Of course this would only really help us with Java tests, as Gradle
> >>> doesn't know much about the structure of dependencies within the
> >>> Python (and Go?) SDK.
> >>>
> >>> Brian
> >>>
> >>> [1] https://github.com/ismaeldivita/change-tracker-plugin
> >>>
> >>> On Wed, Jul 8, 2020 at 3:29 PM Kenneth Knowles <k...@apache.org>
> >>> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> I wanted to start a discussion about getting finer grained test
> >>>> execution more focused on particular artifacts/modules. In
> >>>> particular, I want to gather the downsides and impossibilities. So I
> >>>> will make a proposal that people can disagree with easily.
> >>>>
> >>>> Context: job_PreCommit_Java is a monolithic job that...
> >>>>
> >>>> - takes 40-50 minutes
> >>>> - runs tests of maybe a bit under 100 modules
> >>>> - executes over 10k tests
> >>>> - runs on any change to model/, sdks/java/, runners/,
> >>>>   examples/java/, examples/kotlin/, release/ (only exception is SQL)
> >>>> - is pretty flaky (because it conflates so many independent test
> >>>>   flakes, mostly runners and IOs)
> >>>>
> >>>> See a scan at
> >>>> https://scans.gradle.com/s/dnuo4o245d2fw/timeline?sort=longest
> >>>>
> >>>> Proposal: Eliminate the monolithic job and break it into
> >>>> finer-grained jobs that operate on two principles:
> >>>>
> >>>> 1. A test run should be focused on validating one artifact or a
> >>>>    specific integration of other artifacts.
> >>>> 2. A test run should trigger only on things that could affect the
> >>>>    validity of that artifact.
> >>>>
> >>>> For example, a starting point is to separate:
> >>>>
> >>>> - core SDK
> >>>> - runner helper libs
> >>>> - each runner
> >>>> - each extension
> >>>> - each IO
> >>>>
> >>>> Benefits:
> >>>>
> >>>> - changing an IO or runner would not trigger the 20 minutes of core
> >>>>   SDK tests
> >>>> - changing a runner would not trigger the long IO local integration
> >>>>   tests
> >>>> - changing the core SDK could potentially not run as many tests in
> >>>>   presubmit; but maybe it would, and they would be separately
> >>>>   reported results with a clear flakiness signal
> >>>>
> >>>> There are 72 build.gradle files under sdks/java/ and 30 under
> >>>> runners/. They don't all require a separate job. But still there are
> >>>> enough that it is worth automation. Does anyone know what options we
> >>>> might have? It does not even have to be in Jenkins. We could have
> >>>> one "test the things" Jenkins job if the underlying tool (Gradle)
> >>>> could resolve what needs to be run. Caching is not sufficient in my
> >>>> experience.
> >>>>
> >>>> (There are other quick-fix alternatives to shrinking this time, but
> >>>> I want to focus on the bigger picture.)
> >>>>
> >>>> Kenn
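(To make the plugin option from [1] above concrete, it could be wired up
roughly along these lines. This is only a sketch: the plugin id, version,
and configuration property names are assumptions to verify against the
project's README before relying on them.)

```groovy
// Root build.gradle -- sketch only; plugin id, version, and property
// names are assumptions, check them against
// https://github.com/ismaeldivita/change-tracker-plugin before use.
plugins {
    id 'com.ismaeldivita.changetracker' version '0.7.4'
}

changeTracker {
    // For each task listed here, the plugin generates a
    // ${taskName}ChangedModules task on the root project.
    tasks = ['test']
    // Branch to diff against when computing the set of changed modules.
    branch = 'origin/master'
}
```

A precommit job could then invoke something like `./gradlew
testChangedModules`, which (per the project's description quoted above)
resolves the list of changed modules and runs `test` for each one, instead
of Jenkins duplicating the dependency information in build.gradle files.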