IIRC in Java it is forbidden to output an element with a timestamp outside
its current window. An exception is outputs from @FinishBundle, where the
output timestamp is required and the window is applied. TBH it seems more
of an artifact of a mismatch between the pre-windowing and post-windowing
worlds. Most of the time, mixing processing across windows is simply wrong.
But there are fears that calling @FinishBundle once per window would be a
performance problem. On the other hand, don't most correct implementations
have to separate processing for each window anyhow?

Anyhow I think the Java behavior is better, so window assignment happens
exactly and only at window transforms.

Kenn

On Wed, Jan 15, 2020 at 4:59 PM Ankur Goenka <[email protected]> wrote:

> The case where a plan vanilla value or a windowed value is emitted seems
> as expected as the user intent is honored without any surprises.
>
> If I understand correctly in the case when timestamp is changed then
> applying window function again can have unintended behavior in following
> cases
> * Custom windows: User code can be executed in unintended order.
> * User emit a windowed value in a previous transform: Timestamping the
> value in this case would overwrite the user assigned window in earlier step
> even when the actual timestamp is the same. Semantically, emitting an
> element or a timestamped value with the same timestamp should have the same
> behaviour.
>
> What do you think?
>
>
> On Wed, Jan 15, 2020 at 4:04 PM Robert Bradshaw <[email protected]>
> wrote:
>
>> If an element is emitted with a timestamp, the window assignment is
>> re-applied at that time. At least that's how it is in Python. You can
>> emit the full windowed value (accepted without checking...), a
>> timestamped value (in which case the window will be computed), or a
>> plain old element (in which case the window and timestamp will be
>> computed (really, propagated)).
>>
>> On Wed, Jan 15, 2020 at 3:51 PM Ankur Goenka <[email protected]> wrote:
>> >
>> > Yup, This might result in unintended behavior as timestamp is changed
>> after the window assignment as elements in windows do not have timestamp in
>> the window time range.
>> >
>> > Shall we start validating atleast one window assignment between
>> timestamp assignment and GBK/triggers to avoid unintended behaviors
>> mentioned above?
>> >
>> > On Wed, Jan 15, 2020 at 1:24 PM Luke Cwik <[email protected]> wrote:
>> >>
>> >> Window assignment happens at the point in the pipeline the WindowInto
>> transform was applied. So in this case the window would have been assigned
>> using the original timestamp.
>> >>
>> >> Grouping is by key and window.
>> >>
>> >> On Tue, Jan 14, 2020 at 7:30 PM Ankur Goenka <[email protected]>
>> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I am not sure about the effect of the order of element timestamp
>> change and window association has on a group by key.
>> >>> More specifically, what would be the behavior if we apply window ->
>> change element timestamp -> Group By key.
>> >>> I think we should always apply window function after changing the
>> timestamp of elements. Though this is neither checked nor a recommended
>> practice in Beam.
>> >>>
>> >>> Example pipeline would look like this:
>> >>>
>> >>>       def applyTimestamp(value):
>> >>>             return window.TimestampedValue((key, value),
>> int(time.time())
>> >>>
>> >>>         p \
>> >>>             | 'Create' >> beam.Create(range(0, 10)) \
>> >>>             | 'Fixed Window' >>
>> beam.WindowInto(window.FixedWindows(5)) \
>> >>>             | 'Apply Timestamp' >> beam.Map(applyTimestamp) \ #
>> Timestamp is changed after windowing and before GBK
>> >>>             | 'Group By Key' >> beam.GroupByKey() \
>> >>>             | 'Print' >> beam.Map(print)
>> >>>
>> >>> Thanks,
>> >>> Ankur
>>
>

Reply via email to