[ https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16615271#comment-16615271 ]

Navinder Brar commented on KAFKA-7149:
--------------------------------------

Hi [~guozhang] What I mean is that currently the Assignment shared with the 
Group Coordinator looks like this:
{code:java}
[{consumer1: {activePartitions1, assignmentInfo1}}, {consumer2: 
{activePartitions2, assignmentInfo2}}, ........ ]{code}
where
{code:java}
AssignmentInfo=
{List<TaskId> activeTasks, Map<TaskId, Set<TopicPartition>> standbyTasks, 
Map<HostInfo, Set<TopicPartition>> partitionsByHost}
{code}
Now in the first version, I am changing this AssignmentInfo to:

*V1:*
{code:java}
AssignmentInfo=
{List<TaskId> activeTasks, Map<TaskId, Set<TopicPartition>> standbyTasks, 
Map<HostInfo, Set<TaskId>> tasksByHost}
{code}
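To make the size difference concrete, here is a minimal, self-contained sketch of one way the tasksByHost map could be encoded (the HostInfo and TaskId records below are illustrative stand-ins, not the real Streams classes): each task costs a fixed 8 bytes, whereas a TopicPartition entry has to repeat the full topic-name string for every partition.
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.Set;

public class TasksByHostEncoding {

    // Illustrative stand-ins for org.apache.kafka.streams.state.HostInfo
    // and org.apache.kafka.streams.processor.TaskId.
    record HostInfo(String host, int port) {}
    record TaskId(int topicGroupId, int partition) {}

    static byte[] encode(Map<HostInfo, Set<TaskId>> tasksByHost) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(tasksByHost.size());
            for (Map.Entry<HostInfo, Set<TaskId>> entry : tasksByHost.entrySet()) {
                out.writeUTF(entry.getKey().host());   // host written once per entry
                out.writeInt(entry.getKey().port());
                out.writeInt(entry.getValue().size());
                for (TaskId id : entry.getValue()) {   // 8 bytes per task, vs. a
                    out.writeInt(id.topicGroupId());   // repeated topic string per
                    out.writeInt(id.partition());      // partition in the old map
                }
            }
        }
        return buf.toByteArray();
    }
}
{code}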
But my point is that if there are 500 consumers, the tasksByHost map will be 
the same for all of them, since it contains the global assignment; we are 
unnecessarily sending this same map inside the Assignment array for every 
consumer. Instead, we can send an object like the one below, which is shared 
with the GroupCoordinator.

*V2:* 
{code:java}
Assignment= {Map<HostInfo, Set<TaskId>> tasksByHost, [{consumer1: 
{activePartitions1, assignmentInfo1}}, {consumer2: {activePartitions2, 
assignmentInfo2}}, ........ ]}{code}
where
{code:java}
AssignmentInfo= {List<TaskId> activeTasks, Map<TaskId, Set<TopicPartition>> 
standbyTasks}{code}
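A quick back-of-envelope comparison (the byte counts below are assumptions, purely for illustration) shows what factoring the shared map out of the per-consumer array saves for a 500-consumer group:
{code:java}
public class AssignmentSizeEstimate {
    public static void main(String[] args) {
        int consumers = 500;
        long tasksByHostBytes = 200_000; // assumed size of the encoded global map
        long perConsumerBytes = 2_000;   // assumed activeTasks + standbyTasks per consumer

        // V1: the global map is repeated inside every consumer's AssignmentInfo.
        long v1 = consumers * (tasksByHostBytes + perConsumerBytes);
        // V2: the global map is sent once, alongside the per-consumer array.
        long v2 = tasksByHostBytes + consumers * perConsumerBytes;

        System.out.printf("V1: %,d bytes; V2: %,d bytes (%.0fx smaller)%n",
                v1, v2, (double) v1 / v2);
    }
}
{code}
With these assumed numbers, V1 comes to roughly 100 MB while V2 stays around 1 MB, because the global map is no longer duplicated 500 times.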

> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Ashish Surana
>            Assignee: Ashish Surana
>            Priority: Major
>
> We observed that when we have a high number of partitions, instances, or 
> stream-threads, the assignment-data size grows too fast and we start getting 
> a RecordTooLargeException at the kafka-broker.
> A workaround for this issue is described at: 
> https://issues.apache.org/jira/browse/KAFKA-6976
> Still, this limits the scalability of kafka streams, as moving around 100MBs 
> of assignment data for each rebalance affects performance & reliability 
> (timeout exceptions start appearing) as well. It also limits kafka streams 
> scale even with a high max.message.bytes setting, as the data size increases 
> quickly with the number of partitions, instances, or stream-threads.
>  
> Solution:
> To address this issue in our cluster, we are sending compressed 
> assignment-data. We saw the assignment-data size reduced by 8X-10X. This 
> improved kafka streams scalability drastically for us, and we can now run it 
> with more than 8,000 partitions.
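For reference, the compression workaround described above can be sketched with plain java.util.zip gzip (the ticket does not say which codec was actually used); this is an illustration, not the exact patch from the reporter's cluster:
{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class AssignmentCompression {

    // Compress the encoded assignment before it is sent to the broker.
    static byte[] compress(byte[] assignmentData) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buf)) {
            gzip.write(assignmentData);
        }
        return buf.toByteArray();
    }

    // Decompress on the receiving side before deserializing the AssignmentInfo.
    static byte[] decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes(); // Java 9+
        }
    }
}
{code}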


