I could imagine for example, a 'parallelismHint' field in the base
parameters that could be set to maxNumWorkers when running on dataflow or
an equivalent parameter when running on flink. It would be useful to get a
default value for the sharding in the Reshuffle changes here
https://github.com/apache/beam/pull/11919, but more generally to have some
decent guess on how to best shard work. Then it would be runner-agnostic;
you could set it to something like numCpus on the local runner for instance.

On Sat, Jun 27, 2020 at 2:04 AM Reuven Lax <[email protected]> wrote:

> It's an interesting question - this parameter is clearly very runner
> specific (e.g. it would be meaningless for the Dataflow runner, where
> parallelism is not a static constant). How should we go about passing
> runner-specific options per transform?
>
> On Fri, Jun 26, 2020 at 1:14 PM Akshay Iyangar <[email protected]>
> wrote:
>
>> Hi beam community,
>>
>>
>>
>> So I had brought this issue in our slack channel but I guess this
>> warrants a deeper discussion and if we do go about what is the POA for it.
>>
>>
>>
>> So basically currently for Flink Runner we don’t support operator level
>> parallelism which native Flink provides OOTB. So I was wondering what the
>> community feels about having some way to pass parallelism for individual
>> operators esp.  for some of the existing IO’s
>>
>>
>>
>> Wanted to know what people think of this.
>>
>>
>>
>> Thanks
>>
>> Akshay I
>>
>

Reply via email to