2018-03-10 17:30 GMT+01:00 Reuven Lax <re...@google.com>:

> Have you considered drafting in detail what you think this API might look
> like?

Yes, but it comes after the "enhancements" (for my use cases) and "bugs"
lists, so I haven't started to work on it much.

> If it's a radically different API, it might be more appropriate as an
> alternative parallel Beam API rather than a replacement for the current API
> (there is also one such fluent API in the works).

What I plan is to draft it on top of Beam (so the "useless" case I spoke
about before) and then propose to implement it ~natively and make it the
main API for another major version.

> On Sat, Mar 10, 2018 at 7:23 AM Romain Manni-Bucau <rmannibu...@gmail.com>
> wrote:
>
>> 2018-03-10 16:19 GMT+01:00 Reuven Lax <re...@google.com>:
>>
>>> This is another version (maybe a better, Java 8 idiomatic one?) of what
>>> Kenn suggested.
>>>
>>> Note that with NewDoFn this need not be incompatible (so might not
>>> require waiting till Beam 3.0). We can recognize new parameters to
>>> processElement and populate as needed.
>>
>> This is right; however, in my head it was a one-way move, to
>> enforce a reactive design rather than fake a reactive API on top of a
>> synchronous, non-reactive implementation, which I fear is what would
>> happen today if both were supported.
>>
>>> On Sat, Mar 10, 2018, 12:13 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>>> wrote:
>>>
>>>> Yes, for the DoFn for instance, instead of having
>>>> processcontext.element()=<T> you get a CompletionStage<T> and output gets
>>>> it as well.
>>>>
>>>> This way you register an execution chain. Mixed with streams you get a
>>>> big data Java 8/9/10 API which enables any connectivity in a
>>>> well-performing manner ;).
>>>>
>>>> On 10 March 2018 13:56, "Reuven Lax" <re...@google.com> wrote:
>>>>
>>>>> So you mean the user should have a way of registering asynchronous
>>>>> activity with a callback (the callback must be registered with Beam,
>>>>> because Beam needs to know not to mark the element as done until all
>>>>> associated callbacks have completed).
>>>>> I think that's basically what Kenn
>>>>> was suggesting, unless I'm missing something.
>>>>>
>>>>> On Fri, Mar 9, 2018 at 11:07 PM Romain Manni-Bucau <
>>>>> rmannibu...@gmail.com> wrote:
>>>>>
>>>>>> Yes, callback based. Beam today is synchronous, and until
>>>>>> bundles+combines are reactive-friendly, Beam will be synchronous whatever
>>>>>> other parts do. Becoming reactive will make it possible to manage the
>>>>>> threading issues properly and to get better scalability of the overall
>>>>>> execution when remote IO is involved.
>>>>>>
>>>>>> However, it requires breaking the source and SDF design to use
>>>>>> CompletionStage - or equivalent - to chain the processing properly and in
>>>>>> a unified fashion.
>>>>>>
>>>>>> On 9 March 2018 23:48, "Reuven Lax" <re...@google.com> wrote:
>>>>>>
>>>>>> If you're talking about reactive programming, at a certain level Beam
>>>>>> is already reactive. Are you referring to a specific way of writing the
>>>>>> code?
>>>>>>
>>>>>> On Fri, Mar 9, 2018 at 1:59 PM Reuven Lax <re...@google.com> wrote:
>>>>>>
>>>>>>> What do you mean by reactive?
>>>>>>>
>>>>>>> On Fri, Mar 9, 2018, 6:58 PM Romain Manni-Bucau <
>>>>>>> rmannibu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> @Kenn: why not prefer to make Beam reactive? It would allow
>>>>>>>> scaling much further without having to synchronize multithreading
>>>>>>>> so heavily. Elegant and efficient :). Beam 3?
>>>>>>>>
>>>>>>>> On 9 March 2018 22:49, "Kenneth Knowles" <k...@google.com> wrote:
>>>>>>>>
>>>>>>>>> I will start with the "exciting futuristic" answer, which is that
>>>>>>>>> we envision the new DoFn to be able to provide an automatic
>>>>>>>>> ExecutorService parameter that you can use as you wish.
>>>>>>>>>
>>>>>>>>> new DoFn<>() {
>>>>>>>>>   @ProcessElement
>>>>>>>>>   public void process(ProcessContext ctx, ExecutorService executorService) {
>>>>>>>>>     ... launch some futures, put them in instance vars ...
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>>   @FinishBundle
>>>>>>>>>   public void finish(...) {
>>>>>>>>>     ... block on futures, output results if appropriate ...
>>>>>>>>>   }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> This way, the Java SDK harness can use its overarching knowledge
>>>>>>>>> of what is going on in a computation to, for example, share a thread
>>>>>>>>> pool between different bits. This was one reason to delete
>>>>>>>>> IntraBundleParallelization - it didn't allow the runner and user code
>>>>>>>>> to properly manage how many things were going on concurrently. And
>>>>>>>>> mostly the runner should own parallelizing to max out cores, and what
>>>>>>>>> user code needs is asynchrony hooks that can interact with that.
>>>>>>>>> However, this feature is not thoroughly considered. TBD how much the
>>>>>>>>> harness itself manages blocking on outstanding requests versus it
>>>>>>>>> being your responsibility in FinishBundle, etc.
>>>>>>>>>
>>>>>>>>> I haven't explored rolling your own here, if you are willing to do
>>>>>>>>> the knob tuning to get the threading acceptable for your particular
>>>>>>>>> use case. Perhaps someone else can weigh in.
>>>>>>>>>
>>>>>>>>> Kenn
>>>>>>>>>
>>>>>>>>> On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge <
>>>>>>>>> josh.fe...@bounceexchange.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello all:
>>>>>>>>>>
>>>>>>>>>> Our team has pipelines that make external network calls. These
>>>>>>>>>> pipelines are currently super slow, and the hypothesis is that they
>>>>>>>>>> are slow because we are not threading for our network calls. The
>>>>>>>>>> GitHub issue below provides some discussion around this:
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/beam/pull/957
>>>>>>>>>>
>>>>>>>>>> In Beam 1.0, there was IntraBundleParallelization, which helped
>>>>>>>>>> with this. However, this was removed because it didn't comply with a
>>>>>>>>>> few Beam paradigms.
>>>>>>>>>>
>>>>>>>>>> Questions going forward:
>>>>>>>>>>
>>>>>>>>>> What is advised for jobs that make blocking network calls? It
>>>>>>>>>> seems bundling the elements into groups of size X prior to passing
>>>>>>>>>> to the DoFn, and managing the threading within the function, might
>>>>>>>>>> work. Thoughts?
>>>>>>>>>> Are these types of jobs even suitable for Beam?
>>>>>>>>>> Are there any plans to develop features that help with this?
>>>>>>>>>>
>>>>>>>>>> Thanks
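
For readers following the thread, the pattern Kenn outlines (launch futures per element, then block on them at the end of the bundle) can be sketched in plain Java without any Beam dependency. This is only an illustration under stated assumptions: AsyncBundle, runBundle and fakeRemoteCall are hypothetical names invented here, not Beam APIs, and the executor is created by hand rather than supplied by the SDK harness as the thread proposes. Only java.util.concurrent is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class AsyncBundleSketch {

    /**
     * Hypothetical stand-in for a DoFn instance (not a Beam class).
     * processElement plays the role of @ProcessElement and
     * finishBundle the role of @FinishBundle.
     */
    static final class AsyncBundle<I, O> {
        private final ExecutorService executor;
        private final Function<I, O> blockingCall;
        // Futures kept in an instance variable, as in the sketch in the thread.
        private final List<CompletableFuture<O>> pending = new ArrayList<>();

        AsyncBundle(ExecutorService executor, Function<I, O> blockingCall) {
            this.executor = executor;
            this.blockingCall = blockingCall;
        }

        /** Launches the blocking call on the pool and returns immediately. */
        void processElement(I element) {
            pending.add(CompletableFuture.supplyAsync(
                    () -> blockingCall.apply(element), executor));
        }

        /** Blocks until every outstanding call is done; results keep input order. */
        List<O> finishBundle() {
            List<O> results = new ArrayList<>();
            for (CompletableFuture<O> future : pending) {
                results.add(future.join());
            }
            pending.clear();
            return results;
        }
    }

    /** Assumed placeholder for an external network call. */
    static String fakeRemoteCall(String key) {
        try {
            Thread.sleep(50); // simulate network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return key.toUpperCase(Locale.ROOT);
    }

    /** Runs one "bundle" of elements through the sketch with a fixed-size pool. */
    static List<String> runBundle(List<String> elements, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            AsyncBundle<String, String> bundle =
                    new AsyncBundle<>(executor, AsyncBundleSketch::fakeRemoteCall);
            for (String element : elements) {
                bundle.processElement(element);
            }
            return bundle.finishBundle();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runBundle(Arrays.asList("a", "b", "c"), 3));
    }
}
```

With three threads the three simulated calls overlap instead of running back to back, which is the effect Josh is after. The open question from the thread remains open in this sketch too: here the user code owns both the pool size and the blocking in finishBundle, whereas the proposal is for the harness to own at least the former.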