Are you using Dataflow Runner v2 [1]? Beam Java still defaults to Dataflow
Runner v1.
Dataflow Runner v2 is the only one that supports autoscaling and dynamic
splitting of splittable DoFns in bounded pipelines.
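To make "dynamic splitting" concrete: while a worker is processing a restriction, a runner that supports it can ask the SDF's restriction tracker to hand back part of the unprocessed work, which can then be scheduled on another worker. The sketch below is a simplified, hypothetical model of that for an offset range. The classes Range, Split and Tracker are illustrative names, not Beam's API; the split rule is only loosely modeled on the idea behind Beam's OffsetRangeTracker.trySplit(double fractionOfRemainder), where the fraction is the share of the remaining work the primary keeps.

```java
import java.util.Optional;

/**
 * Hypothetical, simplified model of dynamic splitting of an offset-range
 * restriction. NOT Beam's API; it only mimics the idea behind
 * OffsetRangeTracker.trySplit(double fractionOfRemainder).
 */
public class DynamicSplitSketch {

  /** Half-open range [from, to), e.g. a span of Parquet row groups (assumption). */
  static final class Range {
    final long from;
    final long to;

    Range(long from, long to) {
      if (from > to) {
        throw new IllegalArgumentException("from must be <= to");
      }
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return "[" + from + ", " + to + ")";
    }
  }

  /** Primary stays with the current worker; residual can go to another worker. */
  static final class Split {
    final Range primary;
    final Range residual;

    Split(Range primary, Range residual) {
      this.primary = primary;
      this.residual = residual;
    }
  }

  /** Tracks which offsets a worker has claimed inside its current range. */
  static final class Tracker {
    private Range range;
    private long lastClaimed;

    Tracker(Range range) {
      this.range = range;
      this.lastClaimed = range.from - 1; // nothing claimed yet
    }

    /** Claims an offset; returns false once it is outside the (possibly shrunk) range. */
    boolean tryClaim(long offset) {
      if (offset <= lastClaimed) {
        throw new IllegalArgumentException("offsets must be claimed in increasing order");
      }
      if (offset >= range.to) {
        return false;
      }
      lastClaimed = offset;
      return true;
    }

    /**
     * Keeps roughly fractionOfRemainder of the work from lastClaimed onward in the
     * primary and returns the rest as a residual, or empty if nothing can be split off.
     */
    Optional<Split> trySplit(double fractionOfRemainder) {
      long splitPos =
          lastClaimed + Math.max(1L, (long) ((range.to - lastClaimed) * fractionOfRemainder));
      if (splitPos >= range.to) {
        return Optional.empty();
      }
      Range residual = new Range(splitPos, range.to);
      range = new Range(range.from, splitPos); // this worker now stops at splitPos
      return Optional.of(new Split(range, residual));
    }
  }

  public static void main(String[] args) {
    Tracker tracker = new Tracker(new Range(0, 100));
    for (long i = 0; i < 10; i++) {
      tracker.tryClaim(i); // the worker has processed offsets 0..9
    }
    // A runner that supports dynamic splitting asks the primary to keep about half of what is left.
    Split split = tracker.trySplit(0.5).orElseThrow();
    System.out.println("primary=" + split.primary + " residual=" + split.residual);
    System.out.println("claim 54 after split: " + tracker.tryClaim(54));
  }
}
```

In this model, offsets at or past the split point are no longer claimable by the original worker, so the residual range can be processed elsewhere without duplicating work.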

[1] https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
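The guide in [1] describes turning Runner v2 on through a pipeline experiment. As a minimal sketch, assuming the job is launched from a Beam Java main method that forwards its command-line args to PipelineOptionsFactory.fromArgs(...), a hypothetical helper like withRunnerV2 below can make sure the experiment is present. The experiment name use_runner_v2 matches the Java instructions in the Dataflow docs around this time, but older Beam versions may need additional experiments, so check [1] for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of Beam or Dataflow. */
public class RunnerV2Args {

  // Experiment name for Dataflow Runner v2 (assumption based on the docs in [1]).
  static final String RUNNER_V2 = "use_runner_v2";
  private static final String PREFIX = "--experiments=";

  /** Returns a copy of args whose --experiments flag includes RUNNER_V2. */
  static String[] withRunnerV2(String[] args) {
    List<String> out = new ArrayList<>(Arrays.asList(args));
    int firstExperimentsFlag = -1;
    for (int i = 0; i < out.size(); i++) {
      String arg = out.get(i);
      if (!arg.startsWith(PREFIX)) {
        continue;
      }
      if (Arrays.asList(arg.substring(PREFIX.length()).split(",")).contains(RUNNER_V2)) {
        return out.toArray(new String[0]); // already enabled, leave args unchanged
      }
      if (firstExperimentsFlag < 0) {
        firstExperimentsFlag = i;
      }
    }
    if (firstExperimentsFlag < 0) {
      out.add(PREFIX + RUNNER_V2); // no experiments flag yet
    } else {
      String arg = out.get(firstExperimentsFlag);
      // Append to the existing comma-separated list rather than adding a second flag.
      out.set(firstExperimentsFlag, arg.equals(PREFIX) ? PREFIX + RUNNER_V2 : arg + "," + RUNNER_V2);
    }
    return out.toArray(new String[0]);
  }

  public static void main(String[] args) {
    // "my_experiment" is a placeholder for whatever experiments the job already uses.
    String[] launched =
        withRunnerV2(new String[] {"--runner=DataflowRunner", "--experiments=my_experiment"});
    System.out.println(String.join(" ", launched));
    // In a real pipeline (Beam SDK and Dataflow runner on the classpath):
    // PipelineOptions options =
    //     PipelineOptionsFactory.fromArgs(withRunnerV2(args)).withValidation().create();
  }
}
```

Passing the same flag directly on the command line works just as well; the helper only guards against launching a job that silently falls back to Runner v1.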

On Fri, Aug 21, 2020 at 10:54 AM Jiadai Xia <[email protected]> wrote:

> Hi,
> As stated in the title, I tried to implement an SDF for reading Parquet
> files and I am running it with the Dataflow runner. The initial split
> outputs a number of ranges, but the number of workers is not scaled up and
> the work is not distributed. Any suggestions on what the problem might be?
> I have tested it with the Direct runner, and the parallelism looks fine on
> small samples there.
> Below is my implementation of the SDF:
> https://github.com/apache/beam/pull/12223
> --
> *Jiadai Xia*
> SWE Intern
