So you mean the user should have a way of registering asynchronous activity with a callback (the callback must be registered with Beam, because Beam needs to know not to mark the element as done until all associated callbacks have completed). I think that's basically what Kenn was suggesting, unless I'm missing something.
On Fri, Mar 9, 2018 at 11:07 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Yes, callback based. Beam today is synchronous, and until bundles and
> combines are reactive-friendly, Beam will stay synchronous whatever the
> other parts do. Becoming reactive would make it possible to manage the
> threading issues properly and to get better scalability on the overall
> execution when remote IO is involved.
>
> However, it requires breaking the source and SDF designs to use
> CompletionStage - or an equivalent - to chain the processing properly
> and in a unified fashion.
>
> On Mar 9, 2018 at 23:48, "Reuven Lax" <re...@google.com> wrote:
>
> If you're talking about reactive programming, at a certain level Beam is
> already reactive. Are you referring to a specific way of writing the code?
>
> On Fri, Mar 9, 2018 at 1:59 PM Reuven Lax <re...@google.com> wrote:
>
>> What do you mean by reactive?
>>
>> On Fri, Mar 9, 2018, 6:58 PM Romain Manni-Bucau <rmannibu...@gmail.com>
>> wrote:
>>
>>> @Kenn: why not make Beam reactive instead? It would allow scaling much
>>> further without heavy multithreading synchronization. Elegant and
>>> efficient :). Beam 3?
>>>
>>> On Mar 9, 2018 at 22:49, "Kenneth Knowles" <k...@google.com> wrote:
>>>
>>>> I will start with the "exciting futuristic" answer, which is that we
>>>> envision the new DoFn to be able to provide an automatic
>>>> ExecutorService parameter that you can use as you wish:
>>>>
>>>> new DoFn<>() {
>>>>   @ProcessElement
>>>>   public void process(ProcessContext ctx, ExecutorService executorService) {
>>>>     ... launch some futures, put them in instance vars ...
>>>>   }
>>>>
>>>>   @FinishBundle
>>>>   public void finish(...) {
>>>>     ... block on futures, output results if appropriate ...
>>>>   }
>>>> }
>>>>
>>>> This way, the Java SDK harness can use its overarching knowledge of
>>>> what is going on in a computation to, for example, share a thread pool
>>>> between different bits.
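The "launch futures in process, block in finish" lifecycle above can be mimicked outside Beam with plain java.util.concurrent. The sketch below is illustrative only: the class and method names are invented stand-ins for DoFn lifecycle methods, not Beam APIs, and toUpperCase stands in for a remote call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the pattern: submit async work per element,
// then block on all outstanding futures at the bundle boundary.
public class AsyncBundleSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final List<Future<String>> pending = new ArrayList<>();

    // Analogous to @ProcessElement: kick off async work, do not block here.
    public void processElement(String element) {
        // toUpperCase stands in for a slow network call.
        pending.add(pool.submit(() -> element.toUpperCase()));
    }

    // Analogous to @FinishBundle: block until every outstanding future
    // completes, so no element is "done" before its async work is.
    public List<String> finishBundle() throws Exception {
        List<String> out = new ArrayList<>();
        for (Future<String> f : pending) {
            out.add(f.get()); // preserves submission order
        }
        pending.clear();
        return out;
    }

    // Analogous to @Teardown: release the pool.
    public void teardown() {
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        AsyncBundleSketch fn = new AsyncBundleSketch();
        fn.processElement("a");
        fn.processElement("b");
        System.out.println(fn.finishBundle()); // [A, B]
        fn.teardown();
    }
}
```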
>>>> This was one reason to delete IntraBundleParallelization - it didn't
>>>> allow the runner and user code to properly manage how many things were
>>>> going on concurrently. Mostly, the runner should own parallelizing to
>>>> max out the cores; what user code needs is asynchrony hooks that can
>>>> interact with that. However, this feature is not thoroughly considered
>>>> yet. TBD how much the harness itself manages blocking on outstanding
>>>> requests versus that being your responsibility in FinishBundle, etc.
>>>>
>>>> I haven't explored rolling your own here, if you are willing to do the
>>>> knob tuning to get the threading acceptable for your particular use
>>>> case. Perhaps someone else can weigh in.
>>>>
>>>> Kenn
>>>>
>>>> On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge <josh.fe...@bounceexchange.com>
>>>> wrote:
>>>>
>>>>> Hello all:
>>>>>
>>>>> Our team has a pipeline that makes external network calls. These
>>>>> pipelines are currently very slow, and the hypothesis is that they are
>>>>> slow because we are not threading our network calls. The GitHub issue
>>>>> below provides some discussion around this:
>>>>>
>>>>> https://github.com/apache/beam/pull/957
>>>>>
>>>>> In Beam 1.0, there was IntraBundleParallelization, which helped with
>>>>> this. However, it was removed because it didn't comply with a few Beam
>>>>> paradigms.
>>>>>
>>>>> Questions going forward:
>>>>>
>>>>> What is advised for jobs that make blocking network calls? It seems
>>>>> that bundling the elements into groups of size X before passing them
>>>>> to the DoFn, and managing the threading within the function, might
>>>>> work. Thoughts?
>>>>> Are these types of jobs even suitable for Beam?
>>>>> Are there any plans to develop features that help with this?
>>>>>
>>>>> Thanks
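The "roll your own" approach Josh describes - group elements into batches of size X, then fan the blocking calls out across a pool inside the DoFn body - can be sketched with ExecutorService.invokeAll. This is a hedged sketch using only java.util.concurrent; processBatch and fakeNetworkCall are invented names, and a real pipeline would need to size the pool and batches for its own workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: issue all blocking calls for one batch in parallel,
// as one might do inside a DoFn that receives pre-batched elements.
public class BatchedCallsSketch {

    // Stand-in for a blocking network call.
    static String fakeNetworkCall(String element) {
        return "resp:" + element;
    }

    // Process one batch of elements with up to `threads` concurrent calls.
    static List<String> processBatch(List<String> batch, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> calls = new ArrayList<>();
            for (String element : batch) {
                calls.add(() -> fakeNetworkCall(element));
            }
            // invokeAll blocks until every task is done and returns the
            // futures in the same order the tasks were submitted.
            List<String> out = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(calls)) {
                out.add(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processBatch(List.of("a", "b", "c"), 2)); // [resp:a, resp:b, resp:c]
    }
}
```

Note that creating a fresh pool per batch is only for self-containment here; a long-lived DoFn would typically reuse a pool created in setup and shut it down in teardown.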