@Kenn: why not preferring to make beam reactive? Would alow to scale way more without having to hardly synchronize multithreading. Elegant and efficient :). Beam 3?
Le 9 mars 2018 22:49, "Kenneth Knowles" <k...@google.com> a écrit : > I will start with the "exciting futuristic" answer, which is that we > envision the new DoFn to be able to provide an automatic ExecutorService > parameters that you can use as you wish. > > new DoFn<>() { > @ProcessElement > public void process(ProcessContext ctx, ExecutorService > executorService) { > ... launch some futures, put them in instance vars ... > } > > @FinishBundle > public void finish(...) { > ... block on futures, output results if appropriate ... > } > } > > This way, the Java SDK harness can use its overarching knowledge of what > is going on in a computation to, for example, share a thread pool between > different bits. This was one reason to delete IntraBundleParallelization - > it didn't allow the runner and user code to properly manage how many things > were going on concurrently. And mostly the runner should own parallelizing > to max out cores and what user code needs is asynchrony hooks that can > interact with that. However, this feature is not thoroughly considered. TBD > how much the harness itself manages blocking on outstanding requests versus > it being your responsibility in FinishBundle, etc. > > I haven't explored rolling your own here, if you are willing to do the > knob tuning to get the threading acceptable for your particular use case. > Perhaps someone else can weigh in. > > Kenn > > On Fri, Mar 9, 2018 at 1:38 PM Josh Ferge <josh.fe...@bounceexchange.com> > wrote: > >> Hello all: >> >> Our team has a pipeline that make external network calls. These pipelines >> are currently super slow, and the hypothesis is that they are slow because >> we are not threading for our network calls. The github issue below provides >> some discussion around this: >> >> https://github.com/apache/beam/pull/957 >> >> In beam 1.0, there was IntraBundleParallelization, which helped with >> this. However, this was removed because it didn't comply with a few BEAM >> paradigms. >> >> Questions going forward: >> >> What is advised for jobs that make blocking network calls? It seems >> bundling the elements into groups of size X prior to passing to the DoFn, >> and managing the threading within the function might work. thoughts? >> Are these types of jobs even suitable for beam? >> Are there any plans to develop features that help with this? >> >> Thanks >> >