[ 
https://issues.apache.org/jira/browse/BEAM-2140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065478#comment-16065478
 ] 

Kenneth Knowles commented on BEAM-2140:
---------------------------------------

Ah, it is true that the input element is still pending. That is a very helpful 
perspective. So view it as a sort of NACK of the element as a whole, while 
saving the restriction to state. However, we can't advance the watermark as 
though it was non-splittable in the unbounded case; that would freeze the input 
watermark forever. (In the bounded case, I presume it is still the -inf to +inf 
story, for all the same reasons as bounded sources.)

Instead, perhaps the "amount consumed" of this element is measured by the 
watermark reporting done during processing of the element by the SDF. So the 
input watermark can move forward to that point. This would subsume output 
watermark holds, which seems nice.

I don't think this kind of hold makes sense outside of SDF. Maybe, with revised 
semantics as per my final section above, but I don't think we should take on 
the additional challenge of designing this in a safe user-facing way in order 
to push SDF forwards.

On the internal side you do need a way to manage the watermarks in the 
underlying engine. We should note that {{ProcessFn}} already is treated 
specially via a {{ProcessFnRunner}} within a Flink {{SplittableDoFnOperator}}. 
So we are assuming explicit runner support and we are talking about how the 
{{SplittableDoFnOperator}} communicates with Flink. I would actually suggest a 
"partial NACK with new watermark" style of API (just like {{ProcessContinuation}}) 
so that it is tightly coupled with the fact that the element should be 
re-delivered. I would focus a lot on making stuck pipelines impossible, since 
they are hard to debug.
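
As a rough illustration of the "partial NACK with new watermark" idea, here is 
a minimal Java sketch. Every name in it (ProcessResult, partialNack, 
ElementHold) is hypothetical and is not an existing Beam or Flink API. It only 
shows the coupling: a request for re-delivery cannot be built without a 
watermark, and the per-element hold only ever moves forward.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.Optional;

// Hypothetical result of one SDF processing call. Not a Beam or Flink API.
final class ProcessResult {
    private final Instant watermark; // null means the element is fully done

    private ProcessResult(Instant watermark) {
        this.watermark = watermark;
    }

    // The element was fully processed: nothing to re-deliver, no hold needed.
    static ProcessResult done() {
        return new ProcessResult(null);
    }

    // Re-deliver the element later. The watermark is mandatory, so a caller
    // cannot ask for re-delivery without also reporting progress.
    static ProcessResult partialNack(Instant newWatermark) {
        return new ProcessResult(Objects.requireNonNull(newWatermark, "newWatermark"));
    }

    boolean shouldRedeliver() {
        return watermark != null;
    }

    Optional<Instant> watermark() {
        return Optional.ofNullable(watermark);
    }
}

// Hypothetical per-element bookkeeping that an operator might keep.
final class ElementHold {
    private Instant hold = Instant.MIN;

    // Applies a result and returns true while the element is still pending.
    boolean apply(ProcessResult result) {
        if (!result.shouldRedeliver()) {
            hold = Instant.MAX; // element finished: release the hold
            return false;
        }
        Instant reported = result.watermark().get();
        if (reported.isAfter(hold)) {
            hold = reported; // the hold never moves backwards
        }
        return true;
    }

    Instant hold() {
        return hold;
    }
}
```

In this sketch the operator would call apply() after each processing call and 
let the input watermark advance up to hold() while the element stays pending. 
Whether a real runner should clamp a regressing watermark or reject it is left 
open here.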

> Fix SplittableDoFn ValidatesRunner tests in FlinkRunner
> -------------------------------------------------------
>
>                 Key: BEAM-2140
>                 URL: https://issues.apache.org/jira/browse/BEAM-2140
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-flink
>            Reporter: Aljoscha Krettek
>            Assignee: Aljoscha Krettek
>
> As discovered as part of BEAM-1763, there is a failing SDF test. We disabled 
> the tests to unblock the open PR for BEAM-1763.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
