Re: Should WindowFn have a minimal Duration?

Reuven Lax Mon, 26 Apr 2021 11:28:31 -0700

I've often seen custom WindowFns with no static minimum duration. For
example, a common customization of sessions is to identify a specific
"logout" event that ends the session.
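To make this concrete, here is a toy sketch in plain Java of sessions that close at a "logout" event. The classes are hypothetical illustrations, not Beam's WindowFn API. Assumptions: the events of one user arrive sorted by timestamp, and a session is modeled as a half-open interval that ends 1 ms after the logout. Because the data decides where a session ends, two events a few milliseconds apart can form a complete session, so there is no static lower bound to declare.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Beam's WindowFn API) of sessions that end at a
// "logout" event. Assumes the events of one user arrive sorted by timestamp.
public class LogoutSessions {

    /** A timestamped event; kind is e.g. "login", "click" or "logout". */
    record Event(long timestampMillis, String kind) {}

    /** A closed session as a half-open interval [startMillis, endMillis). */
    record Session(long startMillis, long endMillis) {
        long durationMillis() {
            return endMillis - startMillis;
        }
    }

    /** Groups sorted events into sessions; a session closes at each "logout". */
    static List<Session> sessions(List<Event> events) {
        List<Session> result = new ArrayList<>();
        Long openedAt = null; // start of the currently open session, if any
        for (Event e : events) {
            if (openedAt == null) {
                openedAt = e.timestampMillis(); // any event opens a new session
            }
            if (e.kind().equals("logout")) {
                // Assumption: the session ends 1 ms after the logout event.
                result.add(new Session(openedAt, e.timestampMillis() + 1));
                openedAt = null;
            }
        }
        return result; // a session still open at the end is not emitted
    }

    public static void main(String[] args) {
        List<Session> s = sessions(List.of(
                new Event(0, "login"), new Event(5_000, "click"), new Event(9_000, "logout"),
                new Event(20_000, "login"), new Event(20_002, "logout")));
        // The second session lasts only 3 ms: the data, not a static bound,
        // decides how short a session can be.
        s.forEach(x -> System.out.println(x + " -> " + x.durationMillis() + " ms"));
    }
}
```

A real merging WindowFn would also have to handle out-of-order data and merging; the sketch only shows why no fixed minimum can be stated up front.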


On Mon, Apr 26, 2021 at 11:08 AM Robert Bradshaw <rober...@google.com>
wrote:

> I do think minimal window duration is a meaningful concept for
> WindowFns, but from a pragmatic perspective I would ask: is it useful
> enough to require all implementers of WindowFn to specify it (given
> that a default value of 0 would not be very useful)?
>
> On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský <je...@seznam.cz> wrote:
> >
> > Hi Kenn,
> >
> > On 4/26/21 5:59 PM, Kenneth Knowles wrote:
> >
> > In Reza Rokni's example of looping timers, it is necessary to "seed"
> each key, for exactly the reason you describe. The looping timer itself for
> a key should be in the global window. The outputs of the looping timer are
> windowed.
> >
> > Yes, exactly.
> >
> >
> > All that said, your example seems possibly easier if you are OK with no
> output for windows with no data.
> >
> > The problem is actually not with windows with no data, but with windows
> containing only droppable data. This "toy example" is, interestingly, much
> more complex than I expected, mostly because there is no access to the
> watermark while processing elements. But yes, there are probably more
> efficient ways to solve that; the best option would be to have access to
> the input watermark (e.g. at the start of the bundle, which seems to be
> well defined, though I understand there is some negative experience with
> that approach). But I don't actually want to discuss the solutions.
> >
> > My "motivating example" was merely what prompted me to ask this
> question (and possibly one more about side inputs, to follow :)). But,
> setting all examples and possible solutions aside, the question is: is a
> minimal duration an intrinsic property of a WindowFn, or not? If yes, I
> think there are reasons to include this property in the model. If not,
> then we can discuss why that is the case. The only problem I see is with
> data-driven windows; all other windows are time-based and, as such,
> probably carry this property. Data-driven WindowFns could have this
> property defined as zero. This is not a super critical request, more of a
> philosophical discussion.
> >
> >  Jan
> >
> > It sounds like you don't actually want to drop the data, yes? You want
> to partition elements at some time X that is in the middle of some
> event-time interval. If I understand your chosen approach, you could
> buffer the element with metadata and set the timer in @ProcessElement. It
> is no problem if the timestamp of the timer has already passed; it will
> then fire immediately. In @OnTimer you output from the buffer. I think
> there may be more efficient ways to achieve this output.
> >
> > Kenn
> >
> > On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský <je...@seznam.cz> wrote:
> >>
> >> Hi,
> >>
> >> I have come across a "problem" while implementing a toy Pipeline. I
> >> would like to split an input PCollection into two parts: droppable data
> >> (delayed by more than the allowed lateness past the end of the window)
> >> and the rest. I will not go into details, as they are not relevant; the
> >> problem is that I need to set up something like a "looping timer" to be
> >> able to create state for a window even when there is no data yet (so
> >> that I can set a timer for the end of the window and recognize droppable
> >> data). I would like the solution to be generic, so I would like to
> >> "infer" the interval of the looping timer from the input PCollection.
> >> What I would need is to know a _minimal guaranteed duration of a window
> >> that a WindowFn can generate_. I would then set up the looping timer to
> >> tick with an interval of this minimal duration, which would guarantee
> >> that the timer hits all the windows.
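The guarantee relied on here can be checked with a small plain-Java model (not Beam code). Assumptions: windows are half-open intervals [start, end), as with Beam's IntervalWindow, and the looping timer fires at origin + k * interval for every integer k. Any window at least `interval` long must contain a tick, because [start, start + interval) contains exactly one value congruent to origin modulo interval; a window shorter than the interval can fall between two ticks.

```java
import java.util.List;

// Plain-Java model (not Beam code) of a looping timer that ticks at
// origin + k * interval, checked against half-open windows [start, end).
public class LoopingTimerCoverage {

    /** A half-open window [start, end), like Beam's IntervalWindow. */
    record Window(long start, long end) {}

    /** Smallest tick origin + k * interval that is >= from (interval > 0). */
    static long firstTickAtOrAfter(long from, long origin, long interval) {
        // floorDiv(x + interval - 1, interval) is ceil(x / interval) for interval > 0.
        long k = Math.floorDiv(from - origin + interval - 1, interval);
        return origin + k * interval;
    }

    /** True if at least one tick falls inside the window. */
    static boolean hit(Window w, long origin, long interval) {
        return firstTickAtOrAfter(w.start(), origin, interval) < w.end();
    }

    public static void main(String[] args) {
        long minDuration = 60_000; // assumed minimal window duration: 1 minute
        List<Window> windows = List.of(
                new Window(0, 60_000),         // exactly the minimum, aligned
                new Window(59_999, 119_999),   // exactly the minimum, unaligned
                new Window(123_456, 500_000)); // longer than the minimum
        for (Window w : windows) {
            System.out.println(w + " hit: " + hit(w, 0, minDuration)); // all true
        }
        // A window shorter than the tick interval can fall between two ticks.
        System.out.println(hit(new Window(1, 60_000), 0, minDuration)); // false
    }
}
```

Any tick interval at or below the minimal duration works; a larger one can skip a window, which is why the minimum (rather than, say, a typical window size) is the quantity needed.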
> >>
> >> I could try to infer this duration from the input windowing in some
> >> hackish ways, e.g. using an "instanceof" approach, or by using the
> >> WindowFn to generate a set of windows for some fixed timestamp (without
> >> a data element) and then inferring the duration from the maxTimestamp of
> >> the returned windows. That would probably break for sliding windows,
> >> because the result would be the duration of the slide, not the duration
> >> of the window (at least when doing a naive computation).
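The sliding-window pitfall can be reproduced with a simplified plain-Java model of sliding-window assignment (an assumption modeled on Beam's SlidingWindows: windows of length `size` starting every `period`, aligned to `offset`). Which "naive computation" is meant is not spelled out above; one plausible candidate, used here purely as a hypothetical illustration, is the spacing between the maxTimestamps of consecutive returned windows. That yields the period, while each window actually spans the full size.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified plain-Java model of sliding-window assignment (an assumption
// modeled on Beam's SlidingWindows), used to show why a naive inference of
// the window duration can return the period instead of the size.
public class SlidingWindowInference {

    /** A half-open window [start, end); maxTimestamp is the last included millisecond. */
    record Window(long start, long end) {
        long maxTimestamp() {
            return end - 1;
        }
    }

    /** All windows of length size, starting every period (aligned to offset), containing timestamp. */
    static List<Window> assign(long timestamp, long size, long period, long offset) {
        long lastStart = timestamp - Math.floorMod(timestamp - offset, period);
        List<Window> windows = new ArrayList<>();
        for (long start = lastStart; start > timestamp - size; start -= period) {
            windows.add(new Window(start, start + size)); // latest start first
        }
        return windows;
    }

    /** Hypothetical naive inference: spacing of consecutive maxTimestamps (needs >= 2 windows). */
    static long naiveGuess(List<Window> windows) {
        return windows.get(0).maxTimestamp() - windows.get(1).maxTimestamp();
    }

    public static void main(String[] args) {
        // 10-minute windows sliding every minute, probed at a fixed timestamp.
        List<Window> ws = assign(1_000_000, 600_000, 60_000, 0);
        Window w = ws.get(0);
        System.out.println("windows: " + ws.size());                   // windows: 10
        System.out.println("naive guess: " + naiveGuess(ws));          // naive guess: 60000
        System.out.println("window length: " + (w.end() - w.start())); // window length: 600000
    }
}
```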
> >>
> >> It seems to me that all WindowFns have such a minimal Duration. It is
> >> obvious for FixedWindows, but every other window type seems to have
> >> such a property as well (including Sessions, where it is the gap
> >> duration). The only problem would be with data-driven windows, but we
> >> currently don't have strong support for these.
> >>
> >> The question is then: would it make sense to introduce
> >> WindowFn.getMinimalWindowDuration() to the model? The default value
> >> could be zero, which would mean that such a WindowFn would be
> >> unsupported in my motivating example.
> >>
> >>   Jan
> >>
>
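A minimal sketch of what the proposed method could look like, assuming Java and using hypothetical stand-in classes: WindowFnSketch and its subclasses are illustrations, not Beam's WindowFn, FixedWindows, SlidingWindows or Sessions, and getMinimalWindowDuration() is only the name proposed in this thread. The values follow the reasoning above: the window size for fixed and sliding windows, the gap duration for sessions, and the zero default for data-driven windows such as logout-terminated sessions. The zero default also shows the usefulness concern in practice: a generic transform (here the hypothetical helper loopingTimerInterval) can only reject such WindowFns.

```java
import java.time.Duration;

// Hypothetical sketch of the proposed WindowFn.getMinimalWindowDuration().
// These classes are stand-ins, NOT Beam's WindowFn or its implementations.
public class MinimalDurationSketch {

    /** Stand-in for WindowFn carrying only the proposed method. */
    abstract static class WindowFnSketch {
        /** Proposed default: zero means "no guaranteed minimum". */
        Duration getMinimalWindowDuration() {
            return Duration.ZERO;
        }
    }

    static final class FixedSketch extends WindowFnSketch {
        private final Duration size;
        FixedSketch(Duration size) { this.size = size; }
        @Override Duration getMinimalWindowDuration() { return size; }
    }

    static final class SlidingSketch extends WindowFnSketch {
        private final Duration size;
        private final Duration period; // controls overlap only, not window length
        SlidingSketch(Duration size, Duration period) { this.size = size; this.period = period; }
        @Override Duration getMinimalWindowDuration() { return size; }
    }

    static final class SessionsSketch extends WindowFnSketch {
        private final Duration gap;
        SessionsSketch(Duration gap) { this.gap = gap; }
        // A session holding a single element already spans one gap duration.
        @Override Duration getMinimalWindowDuration() { return gap; }
    }

    /** Data-driven sessions (e.g. ending on "logout") keep the zero default. */
    static final class LogoutSessionsSketch extends WindowFnSketch {}

    /** Hypothetical helper: picks the looping-timer interval or rejects the WindowFn. */
    static Duration loopingTimerInterval(WindowFnSketch fn) {
        Duration min = fn.getMinimalWindowDuration();
        if (min.isZero() || min.isNegative()) {
            throw new IllegalArgumentException("WindowFn declares no minimal window duration");
        }
        return min;
    }

    public static void main(String[] args) {
        System.out.println(loopingTimerInterval(new FixedSketch(Duration.ofMinutes(5)))); // PT5M
        System.out.println(loopingTimerInterval(
                new SlidingSketch(Duration.ofMinutes(10), Duration.ofMinutes(1))));      // PT10M
        System.out.println(loopingTimerInterval(new SessionsSketch(Duration.ofMinutes(30)))); // PT30M
        try {
            loopingTimerInterval(new LogoutSessionsSketch());
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```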
