Hello all:

Our team has pipelines that make external network calls. These pipelines
are currently very slow, and our hypothesis is that they are slow because
we are not using threads for the network calls. The GitHub pull request
below has some discussion around this:

https://github.com/apache/beam/pull/957

In Beam 1.0, there was IntraBundleParallelization, which helped with this.
However, it was removed because it didn't comply with a few Beam
paradigms.

Questions going forward:

1. What is advised for jobs that make blocking network calls? It seems
   that bundling the elements into groups of size X before passing them to
   the DoFn, and managing the threading within the function, might work.
   Thoughts?
2. Are these types of jobs even suitable for Beam?
3. Are there any plans to develop features that help with this?
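
To make the batching idea concrete, here is a rough sketch of what I have in
mind. It is plain Python using only the standard library, not Beam API code.
The names fetch, process_batch, batches, and max_workers are hypothetical, and
fetch just sleeps to stand in for a blocking network call. In a real pipeline,
I assume process_batch would be the body of the DoFn, called once per
pre-grouped batch.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch(element):
    """Hypothetical stand-in for a blocking network call."""
    time.sleep(0.05)  # simulate network latency
    return element * 2


def process_batch(batch, max_workers=8):
    """Run the blocking call for every element of a batch on a thread pool.

    The pool is shut down before returning, so no work outlives the batch.
    Results come back in the same order as the input elements.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, batch))


def batches(elements, size):
    """Split a list into consecutive groups of at most `size` elements."""
    for i in range(0, len(elements), size):
        yield elements[i:i + size]


if __name__ == "__main__":
    results = []
    for batch in batches(list(range(20)), size=10):
        results.extend(process_batch(batch))
    print(results)
```

Because process_batch waits for all of its calls before returning, no threads
are left running after the function hands back its results, which I assume is
the property that matters for staying within the bundle model.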

Thanks
