[ https://issues.apache.org/jira/browse/SPARK-23541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-23541:
------------------------------------

    Assignee: Tathagata Das  (was: Apache Spark)

> Allow Kafka source to read data with greater parallelism than the number of topic-partitions
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-23541
>                 URL: https://issues.apache.org/jira/browse/SPARK-23541
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Currently, when the Kafka source reads from Kafka, it generates as many tasks
> as there are partitions in the topic(s) being read. In some cases, it may
> be beneficial to read the data with greater parallelism, that is, with a larger
> number of partitions/tasks. That means offset ranges must be divided into
> smaller ranges such that the number of records in each partition ~= total records in
> batch / desired partitions. This would also balance out any data skew
> between topic-partitions.
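
The splitting described in the issue can be illustrated with a small sketch. This is not the Spark implementation; the names `OffsetRange` and `split_offset_ranges` are hypothetical, and the proportional-rounding rule is only one plausible way to reach roughly "total records in batch / desired partitions" records per task.

```python
from typing import List, NamedTuple


class OffsetRange(NamedTuple):
    """Hypothetical stand-in for one task's slice of a topic-partition."""
    topic: str
    partition: int
    from_offset: int   # inclusive
    until_offset: int  # exclusive

    @property
    def size(self) -> int:
        return self.until_offset - self.from_offset


def split_offset_ranges(ranges: List[OffsetRange],
                        desired_partitions: int) -> List[OffsetRange]:
    """Split per-topic-partition ranges into roughly equal-sized sub-ranges.

    Assumption: each topic-partition gets a share of the desired tasks
    proportional to its record count, with at least one task each.
    """
    total = sum(r.size for r in ranges)
    # Nothing to split, or the existing ranges already give enough tasks.
    if total == 0 or desired_partitions <= len(ranges):
        return list(ranges)

    result: List[OffsetRange] = []
    for r in ranges:
        if r.size == 0:
            result.append(r)
            continue
        # Proportional share, clamped to [1, number of records in the range].
        parts = max(1, min(r.size, round(r.size * desired_partitions / total)))
        start = r.from_offset
        for i in range(parts):
            # Integer arithmetic spreads any remainder across the sub-ranges.
            end = r.from_offset + (r.size * (i + 1)) // parts
            result.append(OffsetRange(r.topic, r.partition, start, end))
            start = end
    return result


skewed = [OffsetRange("events", 0, 0, 900), OffsetRange("events", 1, 0, 100)]
balanced = split_offset_ranges(skewed, desired_partitions=10)
```

With the skewed input above (900 records in one topic-partition, 100 in the other) and 10 desired partitions, the sketch produces nine sub-ranges for partition 0 and one for partition 1, each covering 100 records, which is the skew balancing the issue describes. Because of rounding, the number of resulting ranges may differ slightly from the requested count for other inputs.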