Thanks for the thoughtful comments! I'll try to address them inline
below. I'm hoping to start a VOTE thread soon if there are no other
comments by the end of today.
On 10.09.20 15:40, David Anderson wrote:
Having just re-read FLIP-134, I think it mostly makes sense, though I'm not
exactly looking forward to figuring out how to explain it without making it
seem overly complicated.
Which are the points where you see the explanation could become too
complex? For me, the only difference in behaviour is processing-time
timers, which will fail hard in BATCH execution mode. Things like
shuffle-mode and schedule-mode should be transparent and I would not
mention them in the documentation except in an advanced section.
I'm a bit confused by the discussion around custom window Triggers. Yes, I
agree that complex, mixed Triggers are sometimes useful. And I buy into the
argument that we want to FAIL hard for processing-time on BATCH. But why
not go ahead and FAIL Triggers that can't work, rather than ignoring all
custom Triggers?
The motivation is to allow the same program to work on BATCH and on
STREAMING, and in reality DataStream programs often have Triggers that
you wouldn't need for BATCH execution.
I do think that this topic is too important to have it as a sub-section
in this FLIP. I will remove it and write another FLIP just about this
topic. This will mean that DataStream programs that have Triggers that
use processing-time will simply fail hard. Which is acceptable for an
initial version, I think.
I do think it's critical that bounded streaming has the same configuration
as unbounded streaming. Users expect/need things like processing time
timers in bounded streaming during development. If I've understood the
proposal correctly, this will be the case.
If you're referring to the case where you have STREAMING execution mode
but your sources are bounded (for development), then yes, I think we're
on the same page.
I would prefer WARN over IGNORE as the default for cases where users have
explicitly specified something that isn’t going to happen. (I would also
like to see a warning given for any job that uses event time timers without
having a watermark strategy, though that's unrelated to the topic at hand.)
Agreed, that's why I'm proposing pipeline.processing-time.allow: FAIL as
the default setting for BATCH execution mode. Is there another setting
where we currently propose IGNORE but you think it should be FAIL? There
is pipeline.processing-time.end-of-input: IGNORE, which is in line with
the current behaviour, and failing when timers are set means there won't
be any to fire in BATCH execution mode.
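For reference, the two BATCH-mode defaults discussed above would look
roughly like this as configuration. This is only a sketch: the option
names and values are the ones written in this thread, and they may still
change before the FLIP is finalized.

```
# Proposed defaults for BATCH execution mode (per this thread; not final)

# Setting a processing-time timer fails the job hard.
pipeline.processing-time.allow: FAIL

# Pending processing-time timers are not fired at end of input.
# With allow: FAIL there should be no such timers in the first place.
pipeline.processing-time.end-of-input: IGNORE
```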
Aljoscha