[jira] [Commented] (BEAM-68) Support for limiting parallelism of a step

Xu Mingmin (JIRA) Mon, 06 Mar 2017 13:32:53 -0800

    [ 
https://issues.apache.org/jira/browse/BEAM-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898151#comment-15898151
 ]


Xu Mingmin commented on BEAM-68:
--------------------------------

This's required by some runners. With this parameter, runners, like Flink/Storm 
can leverage it, and those, like Dataflow can ignore it.
I'm not sure about the existing implementation of Flink runner, seems like set 
in job level, meaning same parallelism for each step.

FYI Flink parallel 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/parallel.html 
Storm parallel 
http://storm.apache.org/releases/1.0.1/Understanding-the-parallelism-of-a-Storm-topology.html

> Support for limiting parallelism of a step
> ------------------------------------------
>
>                 Key: BEAM-68
>                 URL: https://issues.apache.org/jira/browse/BEAM-68
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Daniel Halperin
>
> Users may want to limit the parallelism of a step. Two classic uses cases are:
> - User wants to produce at most k files, so sets 
> TextIO.Write.withNumShards(k).
> - External API only supports k QPS, so user sets a limit of k/(expected 
> QPS/step) on the ParDo that makes the API call.
> Unfortunately, there is no way to do this effectively within the Beam model. 
> A GroupByKey with exactly k keys will guarantee that only k elements are 
> produced, but runners are free to break fusion in ways that each element may 
> be processed in parallel later.
> To implement this functionaltiy, I believe we need to add this support to the 
> Beam Model.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Commented] (BEAM-68) Support for limiting parallelism of a step

Reply via email to