Hi,
I am not sure how the order of changing an element's timestamp and
assigning it to a window affects a GroupByKey.
More specifically, what is the behavior if we apply window -> change
element timestamp -> GroupByKey?
I think we should always apply the window function after changing the
timestamps of elements, though Beam neither checks this nor documents it
as a recommended practice.
An example pipeline would look like this:

    import time
    import apache_beam as beam
    from apache_beam.transforms import window

    def applyTimestamp(value):
        # 'key' is a placeholder string; my original snippet left key undefined.
        return window.TimestampedValue(('key', value), int(time.time()))

    with beam.Pipeline() as p:
        (p
         | 'Create' >> beam.Create(range(0, 10))
         | 'Fixed Window' >> beam.WindowInto(window.FixedWindows(5))
         # Timestamp is changed after windowing and before GBK
         | 'Apply Timestamp' >> beam.Map(applyTimestamp)
         | 'Group By Key' >> beam.GroupByKey()
         | 'Print' >> beam.Map(print))

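
To make the concern concrete, here is a small plain-Python sketch (not Beam
code) of the fixed-window arithmetic behind the question. The helper
fixed_window and the sample timestamps are made up for illustration. It
assumes only that a 5-second fixed window with zero offset maps a timestamp
t to [t - t % 5, t - t % 5 + 5). It says nothing about what Beam actually
does when the timestamp changes after windowing, which is the open question.

```python
def fixed_window(timestamp, size=5):
    """Return (start, end) of the fixed window containing `timestamp`.

    Hypothetical helper modelling fixed windows of `size` seconds with
    zero offset; it is not part of the Beam API.
    """
    start = timestamp - timestamp % size
    return (start, start + size)


# Illustrative values only: an element is windowed at its original
# timestamp and is later given a much larger, wall-clock-style timestamp.
original_ts = 2
new_ts = 1_700_000_003

window_before_change = fixed_window(original_ts)  # assigned before re-timestamping
window_after_change = fixed_window(new_ts)        # what windowing afterwards would give

# Check whether the window assigned first still contains the new timestamp.
start, end = window_before_change
new_ts_in_old_window = start <= new_ts < end

print(window_before_change)
print(window_after_change)
print(new_ts_in_old_window)
```

If the window assigned first is kept, the element ends up in a window that
does not contain its new timestamp, which is why I would expect windowing
to come after the timestamp change.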
Thanks,
Ankur