The JIRA issue https://issues.apache.org/jira/browse/BEAM-1283
suggests requiring an explicit Window when emitting from finishBundle.
I'm starting a thread because JIRA/GitHub probably isn't the best (or
most efficient) place to have this discussion.

The original Spec requires the ambient WindowFn to be a
non-element-inspecting, non-timestamp-inspecting Fn (currently, only
the GlobalWindowsFn). At the time, this was chosen (after much
deliberation) over requiring a WindowedValue or other options because
batching in batch mode was a very commonly used pattern. It should
be noted that using state and/or a window expiry callback does not
address this case well: there is no key with which to store state,
and the window expires only in the distant future (if ever).
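
As a rough illustration of the batching pattern described above, here is
a toy model in Python. It is not the Beam API: BatchingFn, GLOBAL_WINDOW,
and the emit callback are hypothetical names, and the sketch assumes all
input arrives in the global window.

```python
from typing import Callable, List, Tuple

GLOBAL_WINDOW = "GlobalWindow"  # hypothetical stand-in for the single global window


class BatchingFn:
    """Toy model of a DoFn that buffers elements and emits them in batches."""

    def __init__(self, batch_size: int,
                 emit: Callable[[List[str], str], None]) -> None:
        self.batch_size = batch_size
        self.emit = emit
        self.buffer: List[str] = []

    def process_element(self, element: str, window: str) -> None:
        # While processing an element, output can inherit that element's window.
        self.buffer.append(element)
        if len(self.buffer) >= self.batch_size:
            self._flush(window)

    def finish_bundle(self) -> None:
        # There is no current element here, so there is no window to inherit.
        # This sketch falls back to the global window, which is only valid
        # when the ambient windowing really is global.
        if self.buffer:
            self._flush(GLOBAL_WINDOW)

    def _flush(self, window: str) -> None:
        self.emit(list(self.buffer), window)
        self.buffer.clear()


outputs: List[Tuple[List[str], str]] = []
fn = BatchingFn(2, lambda batch, window: outputs.append((batch, window)))
for element in ["a", "b", "c"]:
    fn.process_element(element, GLOBAL_WINDOW)
fn.finish_bundle()  # flushes the leftover partial batch ["c"]
```

The leftover partial batch is only emitted from finish_bundle, which is
where the question of which window to use comes up.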

The downside, of course, is that trying to use such a DoFn in a
windowed context will not be caught until runtime. The proposal to
emit WindowedValues has exactly the same downside, but the runtime
error obtained when explicitly emitting a GlobalWindow when the
ambient WindowFn is different will likely be much less clear (an
encoding exception in the best case, silent data corruption in the
worst). The proposal also requires more boilerplate on the part of the
author, and doesn't solve the enumerated issues of limiting which
WindowFns can be used, choosing a timestamp, or bogus proto windows.

Ideally we could catch such an error at pipeline construction time (or
earlier) but there have been no proposals along this line yet.
However, this is a stable-release-blocker, so we should come up with
an (at least temporary) course of action soon. At the very least, it
seems we should also accept emitting a Timestamped value, to which
most WindowFns can be applied.
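
To sketch why a timestamp alone is enough for many WindowFns, here is a
toy Python model (again not the Beam API; fixed_window and global_window
are hypothetical helpers). It assumes integer timestamps and windows
represented as half-open (start, end) intervals.

```python
from typing import Tuple

Window = Tuple[float, float]  # half-open interval [start, end)


def fixed_window(timestamp: int, size: int) -> Window:
    """Assign a timestamp to the fixed-size window that contains it."""
    # Python's % is non-negative for a positive size, so this also
    # works for negative timestamps.
    start = timestamp - timestamp % size
    return (start, start + size)


def global_window(timestamp: int) -> Window:
    """Assign every timestamp to one window covering all of time."""
    return (float("-inf"), float("inf"))


# Neither function looks at the element itself, only at its timestamp.
w = fixed_window(125, 60)  # (120, 180)
```

A WindowFn that inspects the element itself could not be applied this
way, which is why this would cover most but not all WindowFns.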

- Robert
