[ https://issues.apache.org/jira/browse/BEAM-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kenneth Knowles reassigned BEAM-3568:
-------------------------------------

    Assignee: Batkhuyag Batsaikhan  (was: Kenneth Knowles)

> Overlapping sessions with zero allowed lateness due to window expiry rules
> --------------------------------------------------------------------------
>
>                 Key: BEAM-3568
>                 URL: https://issues.apache.org/jira/browse/BEAM-3568
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model, runner-core
>            Reporter: Kenneth Knowles
>            Assignee: Batkhuyag Batsaikhan
>            Priority: Major
>
> Consider this sequence, with a session gap duration of 5:
>  - element arrives with timestamp 0, assigned to proto-window [0, 5)
>  - watermark advances to 6, emitting the session and discarding it
>  - element arrives with timestamp 3, assigned to proto-window [3, 8), so it
>    is not dropped, as the window is not expired
>  - watermark advances to 8+, emitting that session
>
> While "technically correct" according to spec, this seems undesirable. It was
> introduced when late data dropping was tied to window expiry. I think either
> dropping the second element or including it and emitting a merged window
> would be OK.
>
> In the case of sessions, we could just retain the window until it cannot
> possibly merge with other non-expired data. Even with zero allowed lateness,
> this is double the gap duration. The window would be in an interesting state
> where it would be expired and ineligible for further output, but could still
> merge, and the greater window could be output.
>
> The challenge is that sessions are just one kind of merging window; the
> merging logic has to be assumed opaque, so we cannot simply reason about how
> sessions work. The other, more drastic, option is to rethink how late data
> dropping is defined for merging windows, particularly in the "proto-window"
> phase.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
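To make the sequence in this issue concrete, here is a minimal Python sketch of a toy, single-key, event-time session simulator. It is not Beam's runner-core code: `Window`, `run`, `retain_for_merge`, and the `("element", ts)` / `("watermark", wm)` event encoding are hypothetical names chosen for illustration. It assumes integer timestamps, that the end-of-window trigger fires once the watermark passes a window's max timestamp (end - 1), and that an element is dropped only when its own proto-window is already expired (max timestamp + allowed lateness behind the watermark). With `retain_for_merge=True` it models the retention idea from the issue, conservatively keeping an emitted session around while the watermark is below end + gap + allowed lateness, so later data can still merge into it.

```python
from dataclasses import dataclass

GAP = 5  # session gap duration used in the issue

# The event sequence from the issue: ("element", timestamp) or ("watermark", time).
EVENTS = [("element", 0), ("watermark", 6), ("element", 3), ("watermark", 8)]


@dataclass(frozen=True)
class Window:
    start: int
    end: int  # exclusive

    def max_timestamp(self):
        return self.end - 1

    def intersects(self, other):
        return self.start < other.end and other.start < self.end

    def span(self, other):
        return Window(min(self.start, other.start), max(self.end, other.end))


def run(events, gap=GAP, allowed_lateness=0, retain_for_merge=False):
    """Replay events for one key and return the session windows emitted, in order."""
    active = []    # windows still buffering data, not yet emitted
    retained = []  # hypothetical: emitted windows kept only so later data can merge
    emitted = []
    watermark = float("-inf")
    for kind, value in events:
        if kind == "element":
            proto = Window(value, value + gap)
            # Drop only if the proto-window itself is expired (the rule in question).
            if proto.max_timestamp() + allowed_lateness < watermark:
                continue
            # Sessions merge transitively with any overlapping active/retained window.
            merged = proto
            remaining = [(w, False) for w in active] + [(w, True) for w in retained]
            changed = True
            while changed:
                changed = False
                rest = []
                for w, was_emitted in remaining:
                    if w.intersects(merged):
                        merged = merged.span(w)
                        changed = True
                    else:
                        rest.append((w, was_emitted))
                remaining = rest
            active = [w for w, e in remaining if not e] + [merged]
            retained = [w for w, e in remaining if e]
        else:
            watermark = value
            still_active = []
            for w in active:
                if w.max_timestamp() < watermark:  # end-of-window trigger fires
                    emitted.append(w)
                    if retain_for_merge:
                        retained.append(w)
                else:
                    still_active.append(w)
            active = still_active
            # Conservatively forget retained windows once no non-expired
            # proto-window could still overlap them.
            retained = [w for w in retained
                        if watermark < w.end + gap + allowed_lateness]
    return emitted


if __name__ == "__main__":
    print(run(EVENTS))
    # [Window(start=0, end=5), Window(start=3, end=8)]  -> overlapping sessions
    print(run(EVENTS, retain_for_merge=True))
    # [Window(start=0, end=5), Window(start=0, end=8)]  -> merged "greater window"
```

With the current rule, the element at timestamp 3 survives because its proto-window [3, 8) is not expired at watermark 6, yet the session it would have merged with is already gone, so two overlapping sessions are emitted. The retained variant still emits [0, 5) first and later emits the merged [0, 8). Note that its retention bound depends on knowing the session gap, which is exactly what a runner cannot assume for an opaque merging `WindowFn`.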