This sounds like a bug, as described.

Here's the logic, shared by all runners:
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/ReduceFnRunner.java#L958

Regarding "race condition equivalent" I mean that when you have an early
trigger set up there is a benign race condition that determines the
presence or absence of an on time pane. So you should never count on it
unless you insist on it.

Regarding Robert's point I agree that if we try to interpret triggers as an
expression of human intent then it seems that AfterWatermark variants
should all output an ON_TIME pane. It is really the whole windowing
strategy by which the user specifies their intent. IMO there are a couple
reasons. Foremost, it becomes hard/impossible to reason about systems where
there are layers built upon the design philosophy of guessing intent. So it
is better to focus on clear boundaries; in this case triggers govern
whether the runner is permitted to produce non-lifecycle outputs, while
separate flags control the lifecycle outputs.

You can imagine a small DSL where the combination of trigger, on time
behavior, and closing behavior are expressed together. That is more or less
what WindowingStrategy is trying to get at. I also think the space of
triggers is sufficiently arbitrary and unexplored that I don't want to
couple them to things that are more obviously fundamental, like on time
behavior, lateness, and closing behavior. Previously, triggers were a
user-defined state machine, which I think is overly general and too
inefficient in a portable setting. But our particular set of triggers has
cruft you don't need and also sometimes lacks things you do need. If/when
we invest in and move toward sink triggers (aka specify business logic and
let the runner figure out the rest) seems good to re-open the design space.

Kenn

On Mon, Jan 13, 2020 at 7:10 PM Aaron Dixon <[email protected]> wrote:

> Using ClosingBehavior I was able to see all windows get final panes fired
> (w/ isLast=true).
>
> Still unsure why OnTimeBehavior=ALWAYS wasn't enough to ensure an on-time
> pane for all windows. Though not clear on the meaning of the 'race
> condition equivalency' you mentioned Kenneth. Am very interested in
> understanding how this is supposed to work in general - but am very happy
> that I was able to get final panes by setting
> ClosingBehavior=ALWAYS....thank you for that pro-tip!!
>
> On Mon, Jan 13, 2020 at 7:57 PM Robert Bradshaw <[email protected]>
> wrote:
>
>> I think AfterWatermark in particular should *alway* produce an ON_TIME
>> pane, regardless of whether there were early panes. (It's less clear
>> with non-watermark triggers like after count or processing time.) This
>> makes it feel like the on time behavior is a property of the trigger,
>> not the windowing strategy.
>>
>> On Mon, Jan 13, 2020 at 5:06 PM Aaron Dixon <[email protected]> wrote:
>> >
>> > Kenn, thank you! There is OnTimeBehavior (default FIRE_ALWAYS) and
>> ClosingBehavior (default FIRE_IF_NON_EMPTY). Given that OnTimeBehavior is
>> always-fire, shouldn't I see empty ON_TIME panes?
>> >
>> > Since my lateness config is 0, I'm going to try ClosingBehavior =
>> FIRE_ALWAYS and see if I can rely on .isLast() to pick out the last pane
>> downstream. But curious if given that the OnTimeBehavior default is ALWAYS,
>> shouldn't I be seeing on-time panes in my current config?
>> >
>> >
>> >
>> > On Mon, Jan 13, 2020 at 6:45 PM Kenneth Knowles <[email protected]>
>> wrote:
>> >>
>> >> On my phone, so I can't grab the jira so easily, but quickly: EARLY
>> panes are "race condition equivalent" to ON_TIME panes. The early panes
>> consume all the pending elements then the on time pane is "empty". This is
>> WAI if it is what is causing it. You need to explicitly set
>> Window.configure().fireAlways()*. I know this is counterintuitive in
>> accumulating mode, where the empty pane is not the identity element.
>> >>
>> >> Kenn
>> >>
>> >> *I don't recall if this is the default or not, and also because on
>> phone it is slow to look up. From your experience I think not default.
>> >>
>> >> On Mon, Jan 13, 2020, 15:03 Aaron Dixon <[email protected]> wrote:
>> >>>
>> >>> Any confirmation on this from anyone? Whether per Beam spec, runners
>> are obligated to send ON_TIME panes for AfterWatermark triggers? I'm stuck
>> because this seems fundamental, so it's hard to imagine this is a Dataflow
>> bug, but OTOH it's also hard to imagine that trigger specs like
>> AfterWatermark are "optional"... ?
>> >>>
>> >>> On Mon, Jan 13, 2020 at 4:18 PM Aaron Dixon <[email protected]>
>> wrote:
>> >>>>
>> >>>> Yes. Using calendar day-based windows and watermark is completely
>> caught up to today ... calendar window ends several days ago. I got EARLY
>> panes for each element but never ON_TIME pane.
>> >>>>
>> >>>> On Mon, Jan 13, 2020 at 4:16 PM Luke Cwik <[email protected]> wrote:
>> >>>>>
>> >>>>> Is the watermark advancing past the end of the window?
>> >>>>>
>> >>>>> On Mon, Jan 13, 2020 at 2:02 PM Aaron Dixon <[email protected]>
>> wrote:
>> >>>>>>
>> >>>>>> The window is not empty fwiw; it has elements; I get an early
>> firing pane for the window but well after the watermark passes there is no
>> ON_TIME pane. Would this be a bug in Dataflow? Seems fundamental, so I'm
>> concerned perhaps the Beam spec doesn't obligate ON_TIME firings?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> On Mon, Jan 13, 2020 at 3:58 PM Luke Cwik <[email protected]>
>> wrote:
>> >>>>>>>
>> >>>>>>> I would have expected an empty on time pane since the default on
>> time behavior is FIRE_ALWAYS.
>> >>>>>>>
>> >>>>>>> On Mon, Jan 13, 2020 at 1:54 PM Aaron Dixon <[email protected]>
>> wrote:
>> >>>>>>>>
>> >>>>>>>> Can anyone confirm?
>> >>>>>>>>
>> >>>>>>>> This is intermittent. Some (it seems, sparse) windows don't get
>> an ON_TIME firing after watermark. Is this a bug or is there a reason to
>> not expect ON_TIME firings for every window?
>> >>>>>>>>
>> >>>>>>>> On Mon, Jan 13, 2020 at 3:47 PM Rui Wang <[email protected]>
>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> If it indeed happened as you have described, I will be very
>> interested in the expected behaviour.
>> >>>>>>>>>
>> >>>>>>>>> Something I remembered before: the trigger condition meets just
>> gives the runner/engine "permission" to fire, but runner/engine may not
>> fire immediately. But I don't know if the engine/runner will guarantee to
>> fire.
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> -Rui
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Jan 13, 2020 at 1:43 PM Aaron Dixon <[email protected]>
>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> I have the following trigger:
>> >>>>>>>>>>
>> >>>>>>>>>> .apply(Window
>> >>>>>>>>>>       .configure()
>> >>>>>>>>>>       .triggering(AfterWatermark
>> >>>>>>>>>>            .pastEndOfWindow()
>> >>>>>>>>>>            .withEarlyFirings(AfterPane
>> >>>>>>>>>>                 .elementCountAtLeast(1)))
>> >>>>>>>>>>       .accumulatingFiredPanes()
>> >>>>>>>>>>       .withAllowedLateness(Duration.ZERO)
>> >>>>>>>>>>
>> >>>>>>>>>> But in Dataflow I notice that I never get an ON_TIME firing
>> for my window -- I only see early firing for elements, and then nothing.
>> >>>>>>>>>>
>> >>>>>>>>>> My assumption is that AfterWatermark should give me a last,
>> on-time pane under this configuration when the watermark surpasses the
>> window's end.
>> >>>>>>>>>>
>> >>>>>>>>>> Is my expectation correct?
>>
>

Reply via email to