Re: Individual Parallelism support for Flink Runner

Luke Cwik Mon, 29 Jun 2020 08:59:00 -0700

Check out this thread[1] about adding "runner determined sharding" as a
general concept. This could be used to enhance the reshuffle implementation
significantly and might remove the need for per transform parallelism from
that specific use case and likely from most others.


1:
https://lists.apache.org/thread.html/rfd1ca93268eb215fbbcfe098c1dfb330f1b84fb89673325135dfd9a8%40%3Cdev.beam.apache.org%3E

On Mon, Jun 29, 2020 at 4:03 AM Maximilian Michels <[email protected]> wrote:

> We could allow parameterizing transforms by using transform identifiers
> from the pipeline, e.g.
>
>
>    options = ['--parameterize=MyTransform;parallelism=5']
>    with Pipeline.create(PipelineOptions(options)) as p:
>      p | Create(1, 2, 3) | 'MyTransform' >> ParDo(..)
>
>
> Those hints should always be optional, such that a pipeline continues to
> run on all runners.
>
> -Max
>
> On 28.06.20 14:30, Reuven Lax wrote:
> > However such a parameter would be specific to a single transform,
> > whereas maxNumWorkers is a global parameter today.
> >
> > On Sat, Jun 27, 2020 at 10:31 PM Daniel Collins <[email protected]
> > <mailto:[email protected]>> wrote:
> >
> >     I could imagine for example, a 'parallelismHint' field in the base
> >     parameters that could be set to maxNumWorkers when running on
> >     dataflow or an equivalent parameter when running on flink. It would
> >     be useful to get a default value for the sharding in the Reshuffle
> >     changes here https://github.com/apache/beam/pull/11919, but more
> >     generally to have some decent guess on how to best shard work. Then
> >     it would be runner-agnostic; you could set it to something like
> >     numCpus on the local runner for instance.
> >
> >     On Sat, Jun 27, 2020 at 2:04 AM Reuven Lax <[email protected]
> >     <mailto:[email protected]>> wrote:
> >
> >         It's an interesting question - this parameter is clearly very
> >         runner specific (e.g. it would be meaningless for the Dataflow
> >         runner, where parallelism is not a static constant). How should
> >         we go about passing runner-specific options per transform?
> >
> >         On Fri, Jun 26, 2020 at 1:14 PM Akshay Iyangar
> >         <[email protected] <mailto:[email protected]>> wrote:
> >
> >             Hi beam community,____
> >
> >             __ __
> >
> >             So I had brought this issue in our slack channel but I guess
> >             this warrants a deeper discussion and if we do go about what
> >             is the POA for it.____
> >
> >             __ __
> >
> >             So basically currently for Flink Runner we don’t support
> >             operator level parallelism which native Flink provides OOTB.
> >             So I was wondering what the community feels about having
> >             some way to pass parallelism for individual operators esp.
> >               for some of the existing IO’s ____
> >
> >             __ __
> >
> >             Wanted to know what people think of this.____
> >
> >             __ __
> >
> >             Thanks ____
> >
> >             Akshay I____
> >
>

Re: Individual Parallelism support for Flink Runner

Reply via email to