Hi,
As stated in the title, I tried to implement an SDF for reading Parquet
files, and I am trying to run it with the Dataflow runner. The initial split
outputs a bunch of ranges, but the number of workers is not scaled up and
the work is not distributed. Any suggestions on what the problem could be?
I have tested it with the Direct runner, and the parallelism looks fine on
small samples there.
Below is my implementation of the SDF:
https://github.com/apache/beam/pull/12223
-- 
*Jiadai Xia*

SWE Intern

1 (646) 413 8071

[email protected]
