[ https://issues.apache.org/jira/browse/BEAM-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921696#comment-16921696 ]
TJ commented on BEAM-8121:
--------------------------

Guys, thanks for such detailed comments!
I've actually tried a Kafka-to-BigQuery pipeline without any intermediate steps. It had the same throughput as with empty pipeline steps, so I think it's not a BQ sink issue.
As of today I'm on holiday for 2 weeks and wasn't able to bring my laptop with me, but my colleagues will try to provide all the info you need.

> Messages are not distributed per machines when consuming from Kafka topic with 1 partition
> ------------------------------------------------------------------------------------------
>
>                 Key: BEAM-8121
>                 URL: https://issues.apache.org/jira/browse/BEAM-8121
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka
>    Affects Versions: 2.14.0
>            Reporter: TJ
>            Priority: Major
>         Attachments: datalake-dataflow-cleaned.zip
>
>
> Messages are consumed from Kafka using KafkaIO. Each Kafka topic contains only 1 partition. (That means that messages can be consumed by only one consumer per consumer group.)
> When the backlog of a topic grows and the system scales from 1 to X machines, all the messages seem to be processed on the same machine on which they are read.
> Because of that, message throughput doesn't increase when going from 1 machine to X machines. If one machine was reading 2K messages per second, X machines will read the same amount.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)