[ https://issues.apache.org/jira/browse/BEAM-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655020#comment-15655020 ]
Raghu Angadi commented on BEAM-958:
-----------------------------------

A change to this policy can break Dataflow job update, depending on the source, since an update requires the number of sources to remain the same across the update. The native Pub/Sub source is not affected.

> desiredNumWorkers in Dataflow is too low
> ----------------------------------------
>
>                 Key: BEAM-958
>                 URL: https://issues.apache.org/jira/browse/BEAM-958
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>    Affects Versions: 0.3.0-incubating
>            Reporter: Raghu Angadi
>            Assignee: Davor Bonaci
>              Labels: breaking_change
>
> {{desiredNumWorkers}} in the [UnboundedSource API|https://github.com/apache/incubator-beam/blob/v0.3.0-incubating-RC1/sdks/java/core/src/main/java/org/apache/beam/sdk/io/UnboundedSource.java#L69] is a suggestion to a source about how many splits it should create. KafkaIO currently takes this literally and creates only up to this many splits.
> The main drawback is that this number is very low in Dataflow. It is calculated as:
> * {{1 * maxNumWorkers}} if {{--maxNumWorkers}} is specified, otherwise
> * {{3 * numWorkers}}.
> That implies there is only a single reader per worker (which is usually a 4-core VM), which can leave CPU underutilized on many pipelines. Even 3x in the case of a fixed number of workers seems low to me.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
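The heuristic described in the issue can be sketched as follows. This is an illustrative Java snippet, not the actual Dataflow runner code; the method and parameter names ({{desiredNumSplits}}, {{numWorkers}}, {{maxNumWorkers}}) are assumptions chosen to mirror the description above.

```java
// Sketch of the split-count hint the issue describes, assuming the stated
// formula: 1 * maxNumWorkers when --maxNumWorkers is set, else 3 * numWorkers.
public class DesiredSplits {

    // Hypothetical computation of the desiredNumSplits hint that would be
    // passed to an UnboundedSource; a source like KafkaIO may treat this as
    // a hard cap on the number of splits (readers) it creates.
    static int desiredNumSplits(int numWorkers, int maxNumWorkers) {
        if (maxNumWorkers > 0) {
            // Autoscaling case: one split per potential worker VM.
            return 1 * maxNumWorkers;
        }
        // Fixed-size pool: three splits per worker.
        return 3 * numWorkers;
    }

    public static void main(String[] args) {
        // --maxNumWorkers=10: at most 10 splits, i.e. roughly one reader
        // per (typically 4-core) worker VM, leaving cores idle.
        System.out.println(desiredNumSplits(10, 10)); // 10
        // Fixed pool of 5 workers: only 15 splits.
        System.out.println(desiredNumSplits(5, 0));   // 15
    }
}
```

Under this sketch, a pipeline on 4-core workers gets at most one reader thread per VM in the autoscaling case, which is the underutilization the issue points out.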