Are you using Dataflow Runner v2[1]? If so, then you can use: --number_of_worker_harness_threads=X[2]
Do you know where/why the OOM is occurring?

1: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
2: https://github.com/apache/beam/blob/017936f637b119f0b0c0279a226c9f92a2cf4f15/sdks/python/apache_beam/options/pipeline_options.py#L834

On Thu, Aug 20, 2020 at 7:33 AM Kamil Wasilewski <[email protected]> wrote:

> Hi all,
>
> As I stated in the title, is there an equivalent for
> --numberOfWorkerHarnessThreads in Python SDK? I've got a streaming pipeline
> in Python which suffers from OutOfMemory exceptions (I'm using Dataflow).
> Switching to highmem workers solved the issue, but I wonder if I can set a
> limit of threads that will be used in a single worker to decrease memory
> usage.
>
> Regards,
> Kamil
