Thanks Scott, this is awesome!
However, we should be careful when choosing what should be ValidatesRunner
and what should be NeedsRunner.
Could you briefly describe how you made the call and roughly what are the
statistics before/after your PR (number of tests in both categories)?

On Thu, May 3, 2018 at 9:18 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Thanks for the update Scott. That's really a great job.
>
> I will ping you on slack about some points as I'm preparing the build for
> the release (and I have some issues 😁).
>
> Thanks again
> Regards
> JB
> Le 3 mai 2018, à 17:54, Scott Wegner <sweg...@google.com> a écrit:
>>
>> Note: if you don't care about Java runner tests, you can stop reading now.
>>
>> tl;dr: I've made a pass over all @ValidatesRunner tests in pr/5218 [1]
>> and converted many to @NeedsRunner in order to reduce post-commit runtime.
>>
>> This is work that was long overdue and finally got my attention due to
>> the Gradle migration. As context, @ValidatesRunner [2] tests construct a
>> TestPipeline and exercise runner behavior through SDK constructs. The tests
>> are written runner-agnostic so that they can be run on and validate all
>> supported runners.
>>
>> The framework for these tests is great and writing them is super-easy.
>> But as a result, we have way too many of them-- over 250. These tests run
>> against all runners, and even when parallelized we see Dataflow post-commit
>> times exceeding 3-5 hours [3].
>>
>> When reading through these tests, we found many of them don't actually
>> exercise runner-specific behavior, and were simply using the TestPipeline
>> framework to validate SDK components. This is a valid pattern, but tests
>> should be annotated with @NeedsRunner instead. With this annotation, the
>> tests will run on only a single runner, currently DirectRunner.
>>
>> So, PR/5218 looks at all existing @ValidatesRunner tests and
>> conservatively converts tests which don't need to validate all runners into
>> @NeedsRunner. I've also sharded out some very large test classes into
>> scenario-based sub-classes. This is because Gradle parallelizes tests at
>> the class-level, and we found a couple very large test classes (ParDoTest)
>> became stragglers for the entire execution. Hopefully Gradle will soon
>> implement dynamic splitting :)
>>
>> So, the action I'd like to request from others:
>> 1) If you are an author of @ValidatesRunner tests, feel free to look over
>> the PR and let me know if I missed anything. Kenn Knowles is also helping
>> out here.
>> 2) If you find yourself writing new @ValidatesRunner tests, please
>> consider whether your test is validating runner-provided behavior. If not,
>> use @NeedsRunner instead.
>>
>>
>> [1] https://github.com/apache/beam/pull/5218
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/ValidatesRunner.java
>>
>> [3]
>> https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Gradle/buildTimeTrend
>>
>>
>

Reply via email to