On Fri, May 5, 2017 at 12:43 PM, Robert Bradshaw < rober...@google.com.invalid> wrote:
> On Fri, May 5, 2017 at 12:14 PM, Kenneth Knowles <k...@google.com.invalid> > wrote: > > In Java, we _can_ now enforce this statically, thanks to Thomas G's > > improvements. We could either force it to be globally windowed or - since > > it can't inspect the element or timestamp - just run a couple test calls > > through "assignWindows", optionally confirming equality of the results. > > Could you clarify more? I'd like to enforce this statically, but I > don't see how to detect a particular DoFn emits a globally windowed > value. But that make things much better. > Currently, FinishBundleContext has just the outputWindowedValue method. To maintain that, safely, we would add GloballyWindowedOutput or some such with the output() and outputTimestamp() method. This is routine work quite analogous to what we already have. We would validate that the output windowing strategy is global windows if the parameter is present. > We > > can also enforce that the correct subclass of window is output, though > this > > is not implemented as such right now. > > Are you thinking a type check based on the reflectively obtained > parameter of the WindowFn? > Yes. Though we don't do it today, it would be OutputWithWindow<W> or some such (which is actually the same as FinishBundleContext<W>) where W was some window class that would be checked against the window coder. We cannot use the same inner class trick to avoid referencing the type, but that shouldn't be burdensome in this case. Kenn > But this behavior is all to support people who want to write transforms > > that work with essentially only one windowing strategy, basically > > pre-windowing stuff. Are there really any meaningful alternatives to the > > global window? If so, I don't think that is realistically what this is > > about. Such transforms should be permitted, but not encouraged or > supported > > automatically, as they are antithetical to the unified model. So I would > > also suggest that they should validate that their input is globally > > windowed and explicitly output to the global window. Trivial and they > > already should be doing that. They also have the capability of storing > the > > input's WindowFn and producing behavior identical to what you describe. > > What it boils down to is that this is a common case that only works > easily in the Global window. Another example that I can think of like > this in the SDK, namely global combine. If you're in the global > window, you get a singleton PCollection, and we don't require users to > do extra for this behavior. If someone tries to use this transform in > a non-globally-windowed setting, only then do we give an error (this > time at pipeline construction) and force the user to do extra work. > It's a balance between having something that works correctly in the > unified model but requires extra (potentially tricky and error-prone) > effort only iff one needs it. > > > For a transform to output in FinishBundle and be window-agnostic, the > local > > state of the DoFn needs to partition the windows manually, consider what > > timestamp it wants to use, and output the correct thing to the correct > > timestamp and window. I believe that having only the ability to > > outputWindowed(value, timestamp, window) makes it quite obvious that this > > is necessary. It is not boilerplate to do so, but core functionality. > > Yes, and we should be providing a BundlingDoFn to do this for users. > > > On Fri, May 5, 2017 at 11:41 AM, Robert Bradshaw < > > rober...@google.com.invalid> wrote: > > > >> The JIRA issue https://issues.apache.org/jira/browse/BEAM-1283 > >> suggests requiring an explicit Window when emitting from finshBundle. > >> I'm starting a thread because JIRA/GitHub probably isn't the best (or > >> most efficient) place to have this discussion. > >> > >> The original Spec requires the ambient WindowFn to be a > >> non-element-inspecting, non-timestamp-inspecting Fn (currently, only > >> the GlobalWindowsFn) and at the time this was chosen (after much > >> deliberation) over requiring a WindowedValue or other options because > >> batching in batch mode was a very commonly used pattern (and it should > >> be noted that patterns of using state and/or a window expiry callback > >> doesn't address this case well due to the lack of key with which to > >> store state and distant (if ever) expiration of the window). > >> > >> The downside, of course, is that trying to use such a DoFn in a > >> windowed context will not be caught until runtime. The proposal to > >> emit WindowedValues has exactly the same downside, but the runtime > >> error obtained when explicitly emitting a GlobalWindow when the > >> ambient WindowFn is different will likely be much less clear (an > >> encoding exception in the best case, silent data corruption in the > >> worse). This also requires more boilerplate on the part of the author, > >> and doesn't solve the enumerated issues of limiting which WindowFns > >> can be used, choosing a timestamp, or bogus proto windows. > >> > >> Ideally we could catch such an error at pipeline construction time (or > >> earlier) but there have been no proposals along this line yet. > >> However, this is a stable-release-blocker, so we should come up with a > >> (temporary at least) course of action soon. At the very least it seems > >> we should accept emitting a Timestamped value as well, to which most > >> WindowFns can be applied. > >> > >> - Robert > >> >