[ https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904106#comment-16904106 ]
Vinoth Chandar commented on KAFKA-7149:
---------------------------------------

I changed the approach from the original PR after noticing a few corner cases in the TaskId -> TopicPartition translation, specifically in {{partitionsForTask <- DefaultPartitionGrouper::partitionGroups()}}. For each topic group, it creates max(numPartitions of all source topics) tasks. E.g., if topic group A consists of t1 (p0, p1) and t2 (p0, p1, p2), then there are three tasks, and task A_2 will only cater to t2_p2 and have no topic partitions for t1. Thus we cannot simply use TaskId::partition as the topic partition.

I spent some time checking whether we can derive this dynamically inside {{onAssignment()}}. But then we cannot handle the case where the leader has already seen a partition added to one of the topics and computed the assignment based on that.

> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.0.0
>            Reporter: Ashish Surana
>            Assignee: Vinoth Chandar
>            Priority: Major
>
> We observed that when we have high number of partitions, instances or
> stream-threads, assignment-data size grows too fast and we start getting
> below RecordTooLargeException at kafka-broker.
> Workaround of this issue is commented at:
> https://issues.apache.org/jira/browse/KAFKA-6976
> Still it limits the scalability of kafka streams as moving around 100MBs of
> assignment data for each rebalancing affects performance & reliability
> (timeout exceptions starts appearing) as well. Also this limits kafka streams
> scale even with high max.message.bytes setting as data size increases pretty
> quickly with number of partitions, instances or stream-threads.
>
> Solution:
> To address this issue in our cluster, we are sending the compressed
> assignment-data. We saw assignment-data size reduced by 8X-10X. This improved
> the kafka streams scalability drastically for us and we could now run it with
> more than 8,000 partitions.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
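
For readers following the corner case in the comment above, here is a minimal Java sketch of the grouping rule as the comment describes it: one task per partition index up to the largest partition count in the topic group, with each task receiving only the topic partitions that actually exist. The class name PartitionGroupSketch, the method signature, and the "t1_p0" string form are assumptions made for illustration; this is not the actual DefaultPartitionGrouper code and it has no Kafka dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: mimics the grouping rule described in the comment,
// not the real DefaultPartitionGrouper implementation.
final class PartitionGroupSketch {

    // For one topic group, create max(numPartitions) tasks. Task i receives
    // partition i of every source topic that has a partition with index i.
    static Map<Integer, List<String>> partitionGroups(Map<String, Integer> numPartitionsByTopic) {
        int maxPartitions = 0;
        for (int n : numPartitionsByTopic.values()) {
            maxPartitions = Math.max(maxPartitions, n);
        }

        Map<Integer, List<String>> tasks = new TreeMap<>();
        for (int taskPartition = 0; taskPartition < maxPartitions; taskPartition++) {
            List<String> topicPartitions = new ArrayList<>();
            for (Map.Entry<String, Integer> e : numPartitionsByTopic.entrySet()) {
                // Topics with fewer partitions contribute nothing to this task.
                if (taskPartition < e.getValue()) {
                    topicPartitions.add(e.getKey() + "_p" + taskPartition);
                }
            }
            tasks.put(taskPartition, topicPartitions);
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Topic group A from the comment: t1 has 2 partitions, t2 has 3.
        Map<String, Integer> topicGroupA = new LinkedHashMap<>();
        topicGroupA.put("t1", 2);
        topicGroupA.put("t2", 3);

        partitionGroups(topicGroupA).forEach((task, tps) ->
                System.out.println("A_" + task + " -> " + tps));
    }
}
```

In this sketch, task A_2 ends up with t2_p2 only, so mapping a task's partition number back onto every topic in the group would produce a t1_p2 that does not exist.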