[ https://issues.apache.org/jira/browse/BEAM-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavlo Pohrrebnyi closed BEAM-7403.
----------------------------------
    Fix Version/s: 2.11.0
       Resolution: Won't Fix

The Dataflow runner was fixed by Google.

> BigQueryIO.Write does not autoscale correctly (idle workers)
> ------------------------------------------------------------
>
>                 Key: BEAM-7403
>                 URL: https://issues.apache.org/jira/browse/BEAM-7403
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Pavlo Pohrrebnyi
>            Priority: Major
>             Fix For: 2.11.0
>
>
> Apache Beam version:
> 2.10
> JAVA SDK
> Dataflow GCP Staged
> Details:
> We have a streaming Dataflow pipeline that ingests data into BigQuery using
> streaming inserts.
> We deploy the job with the maximum number of workers set to 40, while a large
> backlog has already built up (high watermark).
> When the job starts, it scales from 0 to 3 workers and begins ingesting at a
> rate of 12,000 messages/sec.
> After 2 minutes, it scales from 3 to 40 workers to work through the backlog.
> After scaling up, the rate never exceeds what it was with 3 workers (12,000
> messages/sec).
> Our Stackdriver memory consumption metrics show that the first 3 workers
> consume about 5 GB of RAM, while the remaining 37 workers consume about
> 0.2 GB of RAM. The autoscaled workers appear to be idle.
> Importantly, they do not contribute to the streaming inserts into BigQuery.
> Autoscaling works fine in our other streaming pipelines.
> The problem appears to be related to BigQuery streaming inserts.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
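For readers trying to reproduce the setup in the report, here is a minimal configuration sketch of a streaming job that writes to BigQuery with streaming inserts and a 40-worker autoscaling ceiling. It assumes the Beam 2.10 Java SDK with the Dataflow runner and GCP IO modules on the classpath. The Pub/Sub source, subscription path, table name and `payload` column are hypothetical placeholders (the report does not say where its messages come from), and the table is assumed to already exist. It only shows where `BigQueryIO.Write.Method.STREAMING_INSERTS` and `setMaxNumWorkers(40)` are set; it is not the reporter's actual pipeline.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions.AutoscalingAlgorithmType;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;

public class StreamingInsertsJob {
  public static void main(String[] args) {
    // Project, region, temp location etc. are expected on the command line.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setStreaming(true);
    // Throughput-based autoscaling with the same ceiling as in the report.
    options.setAutoscalingAlgorithm(AutoscalingAlgorithmType.THROUGHPUT_BASED);
    options.setMaxNumWorkers(40);

    Pipeline p = Pipeline.create(options);
    p.apply("ReadMessages",
            // Hypothetical source: the report does not name the input.
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-subscription"))
        .apply("ToTableRow",
            // Hypothetical mapping: store each raw message in a "payload" column.
            MapElements.into(TypeDescriptor.of(TableRow.class))
                .via((String msg) -> new TableRow().set("payload", msg)))
        .apply("WriteToBigQuery",
            BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table") // hypothetical, pre-existing table
                .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

    p.run();
  }
}
```

Comparing per-worker memory and throughput in Stackdriver for a job configured like this is one way to check whether workers added after the initial scale-up receive any BigQuery write work.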