Hi Dan,

Very interesting! Thanks for sharing.

Regards
JB

On 05/19/2016 07:09 AM, Dan Halperin wrote:
Hey folks,

This morning, my colleagues Eugene & Malo posted *No shard left behind:
dynamic work rebalancing in Google Cloud Dataflow
<https://cloud.google.com/blog/big-data/2016/05/no-shard-left-behind-dynamic-work-rebalancing-in-google-cloud-dataflow>*.
This article discusses Cloud Dataflow’s solution to the well-known
straggler problem.

In a large batch processing job with many tasks executing in parallel, some
of the tasks – the stragglers – can take a much longer time to complete
than others, perhaps due to imperfect splitting of the work into parallel
chunks when issuing the job. Typically, waiting for stragglers means that
the overall job completes later than it should, and the job may also hold
on to machines that sit underutilized near the end. Cloud Dataflow’s
dynamic work rebalancing can mitigate stragglers in most cases.

What I’d like to highlight for the Apache Beam (incubating) community is
that Cloud Dataflow’s dynamic work rebalancing is implemented using
*runner-specific* control logic on top of Beam’s *runner-independent*
BoundedSource API
<https://github.com/apache/incubator-beam/blob/9fa97fb2491bc784df53fb0f044409dbbc2af3d7/sdks/java/core/src/main/java/org/apache/beam/sdk/io/BoundedSource.java>.
Specifically, to steal work from a straggler, a runner need only call the
reader’s splitAtFraction method. If the split succeeds, it returns a new
source containing the leftover work, which the runner can then hand to an
idle worker. As Beam matures, I hope that other runners are interested in
figuring out whether these APIs can help them improve performance,
implementing dynamic work rebalancing, and collaborating on API changes
that will help solve other pain points.
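
To make the work-stealing step concrete, here is a minimal toy sketch of the
idea. It does not use Beam's classes: ToyRangeReader, SplitDemo, and the
integer-range "source" are hypothetical names made up for illustration, and
only the method names splitAtFraction and getFractionConsumed mirror the
reader API. It assumes a source that is a range of non-negative integers and
a split that is refused when the requested point has already been read.

```java
/** Toy reader over the integer range [start, end). Hypothetical; not a Beam class. */
final class ToyRangeReader {
    private final long start;
    private long end;   // exclusive; shrinks when work is split off
    private long next;  // next element this reader will return

    ToyRangeReader(long start, long end) {
        this.start = start;
        this.end = end;
        this.next = start;
    }

    /** Returns the next element, or -1 once the range is exhausted. */
    synchronized long advance() {
        return next < end ? next++ : -1;
    }

    /** Fraction of the current range that has already been read. */
    synchronized double getFractionConsumed() {
        return end == start ? 1.0 : (double) (next - start) / (end - start);
    }

    /**
     * Tries to keep [start, splitPoint) here and give away [splitPoint, end).
     * Returns the residual range as {from, to}, or null if the split is refused
     * because the split point was already read or would leave nothing to give away.
     */
    synchronized long[] splitAtFraction(double fraction) {
        long splitPoint = start + (long) Math.ceil(fraction * (end - start));
        if (splitPoint < next || splitPoint >= end) {
            return null;
        }
        long[] residual = {splitPoint, end};
        end = splitPoint;
        return residual;
    }
}

final class SplitDemo {
    public static void main(String[] args) {
        ToyRangeReader straggler = new ToyRangeReader(0, 100);
        for (int i = 0; i < 10; i++) {
            straggler.advance(); // the straggler has read 10 elements so far
        }
        // The "runner" steals the second half of the straggler's range.
        long[] residual = straggler.splitAtFraction(0.5);
        if (residual != null) {
            ToyRangeReader helper = new ToyRangeReader(residual[0], residual[1]);
            System.out.println("helper starts at " + helper.advance());
        }
    }
}
```

The point of the sketch is that the reader alone decides whether a split is
safe, so the runner-side logic can stay simple: pick a fraction, call
splitAtFraction, and schedule whatever comes back.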

Dan

(Also posted on the Beam blog:
http://beam.incubator.apache.org/blog/2016/05/18/splitAtFraction-method.html)


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
