Are you using Dataflow Runner v2 [1]? Beam Java still defaults to Dataflow Runner v1, and Runner v2 is the only one that supports autoscaling and dynamic splitting of splittable DoFns in bounded pipelines.
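If it helps, Runner v2 is normally switched on by passing --experiments=use_runner_v2 when launching the job. Below is a rough sketch of making sure that flag is present in the args before they reach the pipeline options. The RunnerV2Args class and withRunnerV2 method are hypothetical names made up for illustration, not part of Beam, and merging into an existing comma-separated --experiments value is my own assumption about how you pass experiments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Beam): makes sure the Runner v2
// experiment is present in the command-line args.
class RunnerV2Args {
    private static final String PREFIX = "--experiments=";
    private static final String RUNNER_V2 = "use_runner_v2";

    static String[] withRunnerV2(String[] args) {
        List<String> out = new ArrayList<>(Arrays.asList(args));
        int firstExperiments = -1;
        for (int i = 0; i < out.size(); i++) {
            String arg = out.get(i);
            if (!arg.startsWith(PREFIX)) {
                continue;
            }
            if (firstExperiments < 0) {
                firstExperiments = i;
            }
            // Experiments are assumed to be a comma-separated list.
            List<String> values =
                Arrays.asList(arg.substring(PREFIX.length()).split(","));
            if (values.contains(RUNNER_V2)) {
                return out.toArray(new String[0]); // already enabled
            }
        }
        if (firstExperiments < 0) {
            // No experiments flag at all: add one.
            out.add(PREFIX + RUNNER_V2);
        } else {
            // Append to the first existing experiments flag.
            String existing = out.get(firstExperiments);
            String sep = existing.length() == PREFIX.length() ? "" : ",";
            out.set(firstExperiments, existing + sep + RUNNER_V2);
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // In a real pipeline the result would be handed to
        // PipelineOptionsFactory.fromArgs(...) instead of printed.
        System.out.println(String.join(" ", withRunnerV2(args)));
    }
}
```

Passing the flag directly on the command line does the same thing; the helper only matters if you want the job to opt in regardless of how it is launched.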
1: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2

On Fri, Aug 21, 2020 at 10:54 AM Jiadai Xia <[email protected]> wrote:
> Hi,
> As stated in the title, I tried to implement an SDF for reading Parquet
> files and I am trying to run it with the Dataflow runner. The initial split
> outputs a bunch of ranges, but the number of workers is not scaled up and
> the work is not distributed. Any suggestions on what the problem could be?
> I have tested it with the Direct runner, and the parallelism looks fine on
> small samples there.
> Below is my implementation of the SDF:
> https://github.com/apache/beam/pull/12223
> --
> Jiadai Xia
> SWE Intern
