Amit Sela created BEAM-1396:
-------------------------------

Summary: GABWVOBDoFn expects grouped values to be ordered by their timestamp but there is no such guarantee
Key: BEAM-1396
URL: https://issues.apache.org/jira/browse/BEAM-1396
Project: Beam
Issue Type: Bug
Components: sdk-java-core
Reporter: Amit Sela
Assignee: Kenneth Knowles

GroupAlsoByWindowsViaOutputBufferDoFn (GABWVOBDoFn) relies on the grouped values being ordered by timestamp, but nothing in the SDK guarantees this:
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/GroupAlsoByWindowsViaOutputBufferDoFn.java#L86

If such a chunk of timestamped values is processed out of order, I assume we'd end up with an {{IllegalStateException}} thrown here:
https://github.com/apache/beam/blob/master/runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryTimerInternals.java#L191

I suggest we add sorting before processing the bundle in chunks. This might prove expensive in extreme cases where a very large bundle with very few keys is processed, but timestamp order seems to be necessary.

As for runners that already guarantee ordering: since GABW is optional I don't see an issue here, though [~dhalp...@google.com] suggested we add a "shouldSort" flag.

It is also probably worth creating a test for this, though that would prove difficult since we would have to preset the order, which is the problem to begin with :-)
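
To make the proposal concrete, here is a minimal sketch of sorting one key's chunk by timestamp before it is processed, with the sort skipped when the runner already guarantees order. It does not use Beam classes: {{TimestampedElement}}, {{orderForProcessing}} and {{isInTimestampOrder}} are hypothetical names made up for illustration, and the {{shouldSort}} parameter only mirrors the flag suggested above. The {{main}} method feeds a deliberately out-of-order chunk, which is one possible way a test could preset the order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortBeforeGroupingSketch {

  /** Hypothetical stand-in for a timestamped element; this is not a Beam class. */
  static final class TimestampedElement<T> {
    final T value;
    final long timestampMillis;

    TimestampedElement(T value, long timestampMillis) {
      this.value = value;
      this.timestampMillis = timestampMillis;
    }
  }

  /**
   * Copies one key's chunk and, if shouldSort is true, orders it by timestamp.
   * When shouldSort is false (assumption: the runner already guarantees order),
   * the elements are returned in arrival order.
   */
  static <T> List<TimestampedElement<T>> orderForProcessing(
      Iterable<TimestampedElement<T>> chunk, boolean shouldSort) {
    List<TimestampedElement<T>> copy = new ArrayList<>();
    for (TimestampedElement<T> element : chunk) {
      copy.add(element);
    }
    if (shouldSort) {
      // List.sort is stable, so elements with equal timestamps keep their arrival order.
      copy.sort((a, b) -> Long.compare(a.timestampMillis, b.timestampMillis));
    }
    return copy;
  }

  /** Returns true if timestamps never decrease from one element to the next. */
  static <T> boolean isInTimestampOrder(List<TimestampedElement<T>> elements) {
    for (int i = 1; i < elements.size(); i++) {
      if (elements.get(i).timestampMillis < elements.get(i - 1).timestampMillis) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Deliberately out-of-order input, as a test for this issue would have to provide.
    List<TimestampedElement<String>> outOfOrder = Arrays.asList(
        new TimestampedElement<String>("c", 30L),
        new TimestampedElement<String>("a", 10L),
        new TimestampedElement<String>("b", 20L));

    List<TimestampedElement<String>> sorted = orderForProcessing(outOfOrder, true);
    for (TimestampedElement<String> element : sorted) {
      System.out.println(element.timestampMillis + " " + element.value);
    }
  }
}
```

The sort is a single O(n log n) pass over a copy of the chunk, so the cost concern for very large bundles with very few keys still applies. Passing {{shouldSort = false}} avoids the sort, though not the copy.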