Hi,

As stated in the title, I implemented an SDF for reading Parquet files and am trying to run it with the Dataflow runner. The initial split outputs a number of ranges, but the number of workers does not scale up and the work is not distributed across them. Any suggestions on what the problem could be?

I have tested it with the Direct runner, and the parallelism looks fine there on small samples.

My implementation of the SDF is here: https://github.com/apache/beam/pull/12223

--
Jiadai Xia
SWE Intern
