I do think minimal window duration is a meaningful concept for
WindowFns, but from the pragmatic perspective I would ask is it useful
enough to require all implementers of WindowFn to specify it (given
that a default value of 0 would not be very useful).

On Mon, Apr 26, 2021 at 10:05 AM Jan Lukavský <[email protected]> wrote:
>
> Hi Kenn,
>
> On 4/26/21 5:59 PM, Kenneth Knowles wrote:
>
> In +Reza Rokni's example of looping timers, it is necessary to "seed" each 
> key, for just the reason you say. The looping timer itself for a key should 
> be in the global window. The outputs of the looping timer are windowed.
>
> Yes, exactly.
>
>
> All that said, your example seems possibly easier if you are OK with no 
> output for windows with no data.
>
> The problem is actually not with windows with no data. But with windows 
> containing only droppable data. This "toy example" is interestingly much more 
> complex than I expected. Pretty much due to the reason, that there is no 
> access to watermark while processing elements. But yes, there are probably 
> more efficient ways to solve that, the best option would be to have access to 
> the input watermark (e.g. at the start of the bundle, that seems to be well 
> defined, though I understand there is some negative experience with that 
> approach). But I don't want to discuss the solutions, actually.
>
> My "motivating example" was merely a motivation for me to ask this question 
> (and possible one more about side inputs is to follow :)), but - giving all 
> examples and possible solutions aside, the question is - is a minimal 
> duration an intrinsic property of a WindowFn, or not? If yes, I think there 
> are reasons to include this property into the model. If no, then we can 
> discuss the reason why is it the case. I see the only problem with 
> data-driven windows, all other windows are time-based and as such, probably 
> carry this property. The data-driven WindowFns could have this property 
> defined as zero. This is not a super critical request, more of a 
> philosophical discussion.
>
>  Jan
>
> It sounds like you don't actually want to drop the data, yes? You want to 
> partition elements at some time X that is in the middle of some event time 
> interval. If I understand your chosen approach, you could buffer the element 
> w/ metadata and set the timer in @ProcessElement. It is no problem if the 
> timestamp of the timer has already passed. It will fire immediately then. In 
> the @OnTimer you output from the buffer. I think there may be more efficient 
> ways to achieve this output.
>
> Kenn
>
> On Thu, Apr 22, 2021 at 2:48 AM Jan Lukavský <[email protected]> wrote:
>>
>> Hi,
>>
>> I have come across a "problem" while implementing some toy Pipeline. I
>> would like to split input PCollection into two parts - droppable data
>> (delayed for more than allowed lateness from the end of the window) from
>> the rest. I will not go into details, as that is not relevant, the
>> problem is that I need to setup something like "looping timer" to be
>> able to create state for a window, even when there is no data, yet (to
>> be able to setup timer for the end of a window, to be able to recognize
>> droppable data). I would like the solution to be generic, so I would
>> like to "infer" the duration of the looping timer from the input
>> PCollection. What I would need is to know a _minimal guaranteed duration
>> of a window that a WindowFn can generate_. I would then setup the
>> looping timer to tick with interval of this minimal duration and that
>> would guarantee the timer will hit all the windows.
>>
>> I could try to infer this duration from the input windowing with some
>> hackish ways - e.g. using some "instanceof" approach, or by using the
>> WindowFn to generate set of windows for some fixed timestamp (without
>> data element) and then infer the time from maxTimestamp of the returned
>> windows. That would probably break for sliding windows, because the
>> result would be the duration of the slide, not the duration of the
>> window (at least when doing naive computation).
>>
>> It seems to me, that all WindowFns have such a minimal Duration -
>> obvious for Fixed Windows, but every other window type seems to have
>> such property (including Sessions - that is the gap duration). The only
>> problem would be with data-driven windows, but we don't have currently
>> strong support for these.
>>
>> The question is then - would it make sense to introduce
>> WindowFn.getMinimalWindowDuration() to the model? Default value could be
>> zero, which would mean such WindowFn would be unsupported in my
>> motivating example.
>>
>>   Jan
>>

Reply via email to