Good find. I think TestPipelineOptions is from very early days. It makes sense to me that these are all obsolete. Some guesses, though I haven't dug through commit history to confirm:
- TempRoot: a while ago TempLocation was optional, so I think this would provide a default for things like gcpTempLocation and stagingLocation - OnSuccessMatcher: for runners where pipeline used to not terminate in streaming mode. Now I think every runner can successfully waitUntilFinish. Also the current API for waitUntilFinish went through some evolutions around asynchrony so it wasn't always a good choice. - OnCreateMatcher: just for symmetry? I don't know - TestTimeoutSeconds: probably also for the asychrony/waitUntilfinish issue Kenn On Wed, Nov 6, 2019 at 12:19 PM Brian Hulette <[email protected]> wrote: > I recently came across TestPipelineOptions, and now I'm wondering if maybe > it should be deprecated. It only seems to actually be supported for Spark > and Dataflow (via TestSparkRunner and TestDataflowRunner), and I think it > may make more sense to move the functionality it provides into the tests > that need it. > > TestPipelineOptions > <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/testing/TestPipelineOptions.java> > currently has four attributes: > > # TempRoot > It's purpose isn't documented, but many tests read TempRoot and use it to > set a TempLocation (example > <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryToTableIT.java#L124>). > I think this attribute makes sense (e.g. we can set TempRoot once and each > test has its own subdirectory), but I'm not sure. Can anyone confirm the > motivation for it? I'd like to at least add a docstring for it. > > # OnCreateMatcher > A way to register a matcher that will be checked right after a pipeline > has started. It's never set except for in TestDataflowRunnerTest, so I > think this is absolutely safe to remove. > > # OnSuccessMatcher > A way to register a matcher that will be checked right after a pipeline > has successfully completed. This is used in several tests ( > RequiresStableInputIT > <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/sdks/java/core/src/test/java/org/apache/beam/sdk/RequiresStableInputIT.java#L140-L145>, > WordCountIT > <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/examples/java/src/test/java/org/apache/beam/examples/WordCountIT.java#L66-L67>, > ... 8 total occurrences), but I don't see why they couldn't all be replaced > with a `p.run().waitUntilFinish()`, followed by an assert. > > I think the current approach is actually dangerous, because running these > tests with runners other than TestDataflowRunner or TestSparkRunner means > the matchers are never actually checked. This is actually how I came across > TestPipelineOptions - I tried running a test with the DirectRunner and > couldn't make it fail. > > # TestTimeoutSeconds > Seems to just be a wrapper for `waitUntilFinish(duration)`, and only used > in one place > <https://github.com/apache/beam/blob/49ef06e7ad5e601e8a33bd319543c857bcd2744f/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java#L116>. > I think it would be cleaner for the test to be responsible for calling > waitUntilFinish (which we do elsewhere), the only drawback is it requires a > small refactor so the test has access to the PipelineResult object. > > > So I have a couple of questions for the community > 1) Are there thoughts on TempRoot? Can we get rid of it? > 2) Are there any objections to removing the other three attributes? Am I > missing something? Unless there are any objections I think I'll write a > patch to remove them. > > Thanks, > Brian >
