[ https://issues.apache.org/jira/browse/BEAM-849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890855#comment-15890855 ]
Eugene Kirpichov commented on BEAM-849: --------------------------------------- I disagree that "unbounded pipelines" can't finish successfully. - Dataflow runner supports draining of pipelines, which leads to successful termination. - It is possible to run a pipeline like Create.of(1, 2, 3) + ParDo(do nothing) using a streaming runner, and it should terminate rather than hang. - One might ask "why run such a pipeline with a streaming runner", but it makes a lot more sense if the ParDo is splittable. E.g. Create.of(filename) + ParDo(tail file) + ParDo(process records) could use the low-latency capabilities of a streaming runner, but successfully terminate when the file is somehow "finalized". As a more mundane example - tests in https://github.com/apache/beam/blob/master/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java should pass in streaming runners as well as batch runners. - "Unbounded pipeline" is in general not a Beam concept - we should have a batch/streaming-agnostic meaning of "finished" in "waitUntilFinished". I propose the one that Dataflow runner uses for deciding when drain is completed: "all watermarks have progressed to infinity". > Redesign PipelineResult API > --------------------------- > > Key: BEAM-849 > URL: https://issues.apache.org/jira/browse/BEAM-849 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core > Reporter: Pei He > > Current state: > Jira https://issues.apache.org/jira/browse/BEAM-443 addresses > waitUntilFinish() and cancel(). > However, there are additional work around PipelineResult: > need clearly defined contract and verification across all runners > need to revisit how to handle metrics/aggregators > need to be able to get logs -- This message was sent by Atlassian JIRA (v6.3.15#6346)