Hi Josh, I think you probably mean something like buffering elements in a field on the DoFn, emitting batches as appropriate, and emitting the remainder in finishBundle.
Unfortunately there are two issues: - in the presence of windowing the DoFn might be invoked in different windows, so you'll garble the contents between windows - when data is streamed in small bundles, way smaller than batch size, the results might be unintuitive The solution to both is the State API which I am hard at work on. Then you buffer in state, which is per-window and cross-bundle, and output as appropriate, emitting the remainder from a callback invoked once the window has expired (exceeded the allowed lateness). Kenn On Mon, Nov 14, 2016 at 1:57 PM, Josh Cogan <jos...@google.com.invalid> wrote: > Hi Dev, > > After offline discussions with Gus, I'd like propose we include a Batcher > function into contrib/. This would be a DoFn that behaves like this: > > [1,2,3,4,5] -> Batcher(max_size=2) -> [[1,2],[3,4],[5]] > > Its simple code, but it also shows off that values can still be yielded > from finish_bundle(), and lots of people found it useful for the internal > Google version too. > > LMK what you think. Thanks! > > Josh > > -- > joshgc > :wq >