I could imagine for example, a 'parallelismHint' field in the base parameters that could be set to maxNumWorkers when running on dataflow or an equivalent parameter when running on flink. It would be useful to get a default value for the sharding in the Reshuffle changes here https://github.com/apache/beam/pull/11919, but more generally to have some decent guess on how to best shard work. Then it would be runner-agnostic; you could set it to something like numCpus on the local runner for instance.
On Sat, Jun 27, 2020 at 2:04 AM Reuven Lax <[email protected]> wrote: > It's an interesting question - this parameter is clearly very runner > specific (e.g. it would be meaningless for the Dataflow runner, where > parallelism is not a static constant). How should we go about passing > runner-specific options per transform? > > On Fri, Jun 26, 2020 at 1:14 PM Akshay Iyangar <[email protected]> > wrote: > >> Hi beam community, >> >> >> >> So I had brought this issue in our slack channel but I guess this >> warrants a deeper discussion and if we do go about what is the POA for it. >> >> >> >> So basically currently for Flink Runner we don’t support operator level >> parallelism which native Flink provides OOTB. So I was wondering what the >> community feels about having some way to pass parallelism for individual >> operators esp. for some of the existing IO’s >> >> >> >> Wanted to know what people think of this. >> >> >> >> Thanks >> >> Akshay I >> >
