Thanks for the thoughtful comments! I'll try to address them inline below. I'm hoping to start a VOTE thread soon if there are no further comments by the end of today.

On 10.09.20 15:40, David Anderson wrote:
Having just re-read FLIP-134, I think it mostly makes sense, though I'm not
exactly looking forward to figuring out how to explain it without making it
seem overly complicated.

Which are the points where you think the explanation could become too complex? For me, the only difference in behaviour is processing-time timers, which will fail hard in BATCH execution mode. Things like shuffle mode and schedule mode should be transparent, and I would not mention them in the documentation except in an advanced section.
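To make the intended semantics concrete, here is a self-contained sketch (this is *not* the actual Flink API, just an illustration of the proposed behaviour): under BATCH execution, registering a processing-time timer fails hard, while event-time timers keep working and fire off the synthetic end-of-input watermark.

```java
// Illustrative sketch only -- models the proposed FLIP-134 semantics,
// not real Flink classes.
public class TimerSemanticsSketch {

    enum ExecutionMode { STREAMING, BATCH }

    static class TimerService {
        private final ExecutionMode mode;

        TimerService(ExecutionMode mode) {
            this.mode = mode;
        }

        void registerProcessingTimeTimer(long timestamp) {
            if (mode == ExecutionMode.BATCH) {
                // Fail hard: processing time has no meaning in BATCH execution.
                throw new UnsupportedOperationException(
                        "Processing-time timers are not supported in BATCH execution mode");
            }
            // STREAMING: the timer would be registered against the wall clock here.
        }

        void registerEventTimeTimer(long timestamp) {
            // Event-time timers work in both modes; in BATCH they fire when the
            // synthetic watermark passes the timestamp at the end of input.
        }
    }

    public static void main(String[] args) {
        TimerService batch = new TimerService(ExecutionMode.BATCH);
        batch.registerEventTimeTimer(1000L); // fine in both modes
        try {
            batch.registerProcessingTimeTimer(1000L);
        } catch (UnsupportedOperationException e) {
            System.out.println("BATCH rejected processing-time timer: " + e.getMessage());
        }
    }
}
```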

I'm a bit confused by the discussion around custom window Triggers. Yes, I
agree that complex, mixed Triggers are sometimes useful. And I buy into the
argument that we want to FAIL hard for processing-time on BATCH. But why
not go ahead and FAIL Triggers that can't work, rather than ignoring all
custom Triggers?

The motivation is to allow the same program to work in both BATCH and STREAMING mode, and in practice DataStream programs often have Triggers that you wouldn't need for BATCH execution.

I do think this topic is too important to be a sub-section of this FLIP. I will remove it and write a separate FLIP just about this topic. In the meantime, DataStream programs with Triggers that use processing time will simply fail hard, which is acceptable for an initial version, I think.

I do think it's critical that bounded streaming has the same configuration
as unbounded streaming. Users expect/need things like processing time
timers in bounded streaming during development. If I've understood the
proposal correctly, this will be the case.

If you're referring to the case where you have STREAMING execution mode but your sources are bounded (for development), then yes, I think we're on the same page.

I would prefer WARN over IGNORE as the default for cases where users have
explicitly specified something that isn’t going to happen. (I would also
like to see a warning given for any job that uses event time timers without
having a watermark strategy, though that's unrelated to the topic at hand.)

Agreed, that's why I'm proposing pipeline.processing-time.allow: FAIL as the default setting for BATCH execution mode. Is there another setting where we currently propose IGNORE but you think it should be FAIL? There is pipeline.processing-time.end-of-input: IGNORE, which matches the current behaviour; and since setting timers already fails in BATCH execution mode, there won't be any timers left to fire at end of input anyway.
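For reference, the proposed defaults would look roughly like this in flink-conf.yaml (key names are as discussed in the FLIP and could still change before it is finalized):

```yaml
# Proposed defaults for BATCH execution mode (per the FLIP-134 discussion,
# subject to change):

# Fail hard if user code registers processing-time timers in BATCH mode.
pipeline.processing-time.allow: FAIL

# Silently drop timers that would fire after the end of input
# (matches today's behaviour).
pipeline.processing-time.end-of-input: IGNORE
```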

Aljoscha
