I recently came across TestPipelineOptions, and now I'm wondering if maybe
it should be deprecated. It only seems to actually be supported for Spark
and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it
may make more sense to move the functionality it provides into the tests
that need it.

TestPipelineOptions
<https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java>
currently has four attributes:

# TempRoot
It's purpose isn't documented, but many tests read TempRoot and use it to
set a TempLocation (example
<https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java#L124>).
I think this attribute makes sense (e.g. we can set TempRoot once and each
test has its own subdirectory), but I'm not sure. Can anyone confirm the
motivation for it? I'd like to at least add a docstring for it.

# OnCreateMatcher
A way to register a matcher that will be checked right after a pipeline has
started. It's never set except for in TestDataflowRunnerTest, so I think
this is absolutely safe to remove.

# OnSuccessMatcher
A way to register a matcher that will be checked right after a pipeline has
successfully completed. This is used in several tests (RequiresStableInputIT
<https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/sdks/java/core/src/test/java/org/apache/beam/sdk/RequiresStableInputIT.java#L140-L145>,
WordCountIT
<https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java#L66-L67>,
... 8 total occurrences), but I don't see why they couldn't all be replaced
with a `p.run().waitUntilFinish()`, followed by an assert.

I think the current approach is actually dangerous, because running these
tests with runners other than TestDataflowRunner or TestSparkRunner means
the matchers are never actually checked. This is actually how I came across
TestPipelineOptions - I tried running a test with the DirectRunner and
couldn't make it fail.

# TestTimeoutSeconds
Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used
in one place
<https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java#L116>.
I think it would be cleaner for the test to be responsible for calling
waitUntilFinish (which we do elsewhere), the only drawback is it requires a
small refactor so the test has access to the PipelineResult object.


So I have a couple of questions for the community
1) Are there thoughts on TempRoot? Can we get rid of it?
2) Are there any objections to removing the other three attributes? Am I
missing something? Unless there are any objections I think I'll write a
patch to remove them.

Thanks,
Brian

Reply via email to