This ticket is still relevant. Response inline.

On Tue, Jan 17, 2017 at 8:52 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> OK, but I'm afraid we would be too specific. If the batching is just a
> List<T> or Set<T> that we populate in @ProcessElement and flush in
> @FinishBundle (or when we reach a limit), I don't know if it brings a lot
> of value.

This approach was common before windowing and streaming, but it has some
downsides in the full unified model:

 - Requires a buffer per window, so `Map<WindowT, Iterable<T>>`
 - That means you also need some eviction plan
 - Bundles may be very small, so you actually want to buffer across them,
   which requires the state API
 - Placing output in a different window from input requires shenanigans I
   won't go into

So I think the ticket is still relevant, to reproduce the pattern JB is
describing in the full model. The details that Ben brings up still need to
be considered for the full transform. Perhaps we can continue the design and
implementation discussion on the ticket, so it is all easy to look up later.

Kenn

> People might want more fine-grained logic (based on the type in the
> list, for instance).
>
> I'm not against it, just trying to think about a generic way to do that.
>
> Regards
> JB
>
>
> On 01/17/2017 08:48 AM, Etienne Chauchot wrote:
>
>> Hi JB,
>>
>> I meant a jira vote, but discussion on the ML works also :)
>>
>> As I understand the need (see the stackoverflow links in the jira
>> ticket), the aim is to avoid the user having to code the batching logic
>> in their own DoFn.processElement() and DoFn.finishBundle(), regardless
>> of the bundles. For example, a possible use case is to batch calls to an
>> external service (for performance).
>>
>> I was thinking about providing a PTransform that implements the batching
>> in its own DoFn and that takes user-defined functions for customization.
>>
>> Etienne
>>
>> On 17/01/2017 at 17:30, Jean-Baptiste Onofré wrote:
>>
>>> Hi
>>>
>>> I guess you mean discussion on the mailing list about that, right?
>>>
>>> AFAIR the idea is to provide a utility class to deal with
>>> pooling/batching. However, I'm not sure it's required, since we have
>>> @StartBundle etc. in DoFn, and batching depends on the end user
>>> "logic".
>>>
>>> Regards
>>> JB
>>>
>>> On Jan 17, 2017, at 08:26, Etienne Chauchot
>>> <echauc...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have started to work on this ticket
>>>> https://issues.apache.org/jira/browse/BEAM-135
>>>>
>>>> As there were no votes since March 18th, is the issue still
>>>> relevant/needed?
>>>>
>>>> Regards,
>>>>
>>>> Etienne
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
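
For anyone reading this thread later, the pattern discussed here (buffer per
window, populate on each element, flush at a size limit or at the end of the
bundle) could be sketched roughly as below. This is plain Java with no Beam
dependency: the class `WindowedBatcher`, its `add` and `finishBundle`
methods, and the `sink` callback are hypothetical names chosen for
illustration, not SDK API. It also deliberately leaves out the harder parts
raised in the reply (eviction, buffering across bundles via the state API,
output windowing).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical helper (not part of the Beam SDK): keeps one buffer per
 * window and hands complete batches to a user-supplied sink.
 */
final class WindowedBatcher<W, T> {
    private final int maxBatchSize;
    private final BiConsumer<W, List<T>> sink;
    // One buffer per window, as in the Map<WindowT, Iterable<T>> remark.
    private final Map<W, List<T>> buffers = new HashMap<>();

    WindowedBatcher(int maxBatchSize, BiConsumer<W, List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Roughly what a @ProcessElement body would do for each element. */
    void add(W window, T element) {
        List<T> buffer = buffers.computeIfAbsent(window, w -> new ArrayList<>());
        buffer.add(element);
        if (buffer.size() >= maxBatchSize) {
            flush(window); // size limit reached for this window
        }
    }

    /** Roughly what a @FinishBundle body would do: flush every window. */
    void finishBundle() {
        // Copy the keys first, because flush() removes entries from the map.
        for (W window : new ArrayList<>(buffers.keySet())) {
            flush(window);
        }
    }

    private void flush(W window) {
        List<T> batch = buffers.remove(window);
        if (batch != null && !batch.isEmpty()) {
            sink.accept(window, batch);
        }
    }
}
```

Since every buffer is dropped in `finishBundle()`, very small bundles produce
very small batches, which is the limitation that motivates buffering across
bundles with the state API.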