[ https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904245#comment-16904245 ]

Vinoth Chandar edited comment on KAFKA-7149 at 8/9/19 11:30 PM:
----------------------------------------------------------------

Pasting some size tests for the old and new assignment information:
{code:java}
// Assumptions: Streams topology with 10 input topics, 4 sub-topologies (2 topics per sub-topology) = ~20 topics
// High number of hosts = 500; high number of partitions = 128
// topicPrefix = "streams_topic_name"; <- gains are very sensitive to topic name length, of course
oldAssignmentInfoBytes: 77684, newAssignmentInfoBytes: 42698{code}
Roughly ~45% savings. For 500 hosts, that takes the aggregate assignment data from ~39 MB down to ~21 MB.
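
For reference, a minimal sketch of the aggregate-size arithmetic above. The per-member byte counts are the measured values from the test; the class and variable names are made up for illustration and are not part of the Streams code base:
{code:java}
// Back-of-the-envelope check of the numbers above (illustrative names only).
public class AssignmentSizeEstimate {

    public static void main(String[] args) {
        int members = 500;                    // "high number of hosts" from the test above
        long oldAssignmentInfoBytes = 77_684; // measured size of one old AssignmentInfo
        long newAssignmentInfoBytes = 42_698; // measured size of one new AssignmentInfo

        long oldTotal = oldAssignmentInfoBytes * members; // ~38.8 MB across all members
        long newTotal = newAssignmentInfoBytes * members; // ~21.3 MB across all members
        double savings = 100.0 * (oldTotal - newTotal) / oldTotal; // ~45%

        System.out.printf("old total = %.1f MB, new total = %.1f MB, savings = %.0f%%%n",
                oldTotal / 1e6, newTotal / 1e6, savings);
    }
}{code}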

(NOTE: this is still a single object only; we still need the protocol 
change/compression on internal topics to ultimately fix the large-message 
problem) 
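
As an illustration of the compression idea mentioned here (and in the issue description below), a minimal sketch of gzip-compressing already-serialized assignment bytes with the JDK's java.util.zip. This is not the actual Streams code path; the class and method names are assumptions for illustration only:
{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Sketch only: compress the encoded AssignmentInfo bytes before they are sent along.
public class AssignmentCompression {

    public static byte[] gzip(byte[] encodedAssignment) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(encodedAssignment); // compressed bytes accumulate in 'out'
        }
        return out.toByteArray();
    }
}{code}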


> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>    Affects Versions: 2.0.0
>            Reporter: Ashish Surana
>            Assignee: Vinoth Chandar
>            Priority: Major
>
> We observed that with a high number of partitions, instances, or 
> stream-threads, the assignment-data size grows very quickly and we start 
> getting a RecordTooLargeException at the kafka-broker.
> A workaround for this issue is described at: 
> https://issues.apache.org/jira/browse/KAFKA-6976
> Even with that workaround, it limits the scalability of kafka streams, as 
> moving around ~100MB of assignment data for each rebalance hurts performance 
> and reliability (timeout exceptions start appearing). It also caps kafka 
> streams scale even with a high max.message.bytes setting, since the data size 
> grows quickly with the number of partitions, instances, or stream-threads.
>  
> Solution:
> To address this issue in our cluster, we are sending compressed 
> assignment-data. We saw the assignment-data size reduced by 8x-10x. This 
> improved kafka streams scalability drastically for us, and we can now run it 
> with more than 8,000 partitions.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
