+user <[email protected]>

On Thu, Aug 20, 2020 at 9:47 AM Luke Cwik <[email protected]> wrote:

> Are you using Dataflow Runner v2 [1]?
>
> If so, then you can use:
> --number_of_worker_harness_threads=X
>
> Do you know where/why the OOM is occurring?
>
> 1: https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#dataflow-runner-v2
> 2: https://github.com/apache/beam/blob/017936f637b119f0b0c0279a226c9f92a2cf4f15/sdks/python/apache_beam/options/pipeline_options.py#L834
>
> On Thu, Aug 20, 2020 at 7:33 AM Kamil Wasilewski <[email protected]> wrote:
>
>> Hi all,
>>
>> As I stated in the title, is there an equivalent for
>> --numberOfWorkerHarnessThreads in the Python SDK? I've got a streaming
>> pipeline in Python that suffers from OutOfMemory exceptions (I'm using
>> Dataflow). Switching to highmem workers solved the issue, but I wonder
>> if I can set a limit on the number of threads used by a single worker
>> to decrease memory usage.
>>
>> Regards,
>> Kamil
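For reference, a minimal sketch of the flags Luke describes, assuming Dataflow Runner v2; the thread cap of 12 is an arbitrary illustrative value, and in a real pipeline the list would be passed to `apache_beam.options.pipeline_options.PipelineOptions`:

```python
# Sketch: capping worker harness threads in the Python SDK on Dataflow
# Runner v2. Fewer concurrent work items per worker generally means
# lower peak memory usage, at some cost in throughput.
pipeline_args = [
    "--runner=DataflowRunner",
    "--experiments=use_runner_v2",          # the flag only applies on Runner v2
    "--number_of_worker_harness_threads=12",  # cap threads per worker
]

# In a real pipeline these args would be handed to
# apache_beam.options.pipeline_options.PipelineOptions(pipeline_args).
thread_flag = next(
    a for a in pipeline_args
    if a.startswith("--number_of_worker_harness_threads")
)
```

Tuning the cap downward is a trade-off: it reduces memory pressure per worker, but Dataflow may then spin up more workers to keep up with a streaming load.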
