[ https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539596#comment-16539596 ]
Ashish Surana edited comment on KAFKA-7149 at 7/11/18 6:33 AM:
---------------------------------------------------------------

Made the change here: https://github.com/a-surana/kafka/commit/577992015d3bfc5a23e23b5bf32e40a3f92bc74a

Scenario#1: This is straightforward and works with this change.
Encoded version: 4
Decoder supported latest version: 4

Scenario#2:
Encoded version: <=3 (encoded stream is non-gzip)
Decoder supported latest version: 4 (decoding as gzip stream)

Scenario#3: This is difficult, as the decoder learns the encoded version from the first few bytes of the stream, which might be zipped or non-zipped, and there is no reliable way to infer which.
Encoded version: 4 (encoded stream is a gzip stream)
Decoder supported latest version: 3 (decoding as non-gzip stream)

The change is not backward compatible (Scenarios #2 and #3), but it demonstrates the idea behind this improvement.

was (Author: asurana):
This change is not backward compatible, but it demonstrates the idea behind this improvement.

> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Ashish Surana
>            Assignee: Ashish Surana
>            Priority: Major
>
> We observed that with a high number of partitions, instances, or stream-threads, the assignment-data size grows too fast and we start getting the below exception at the kafka-broker:
> RecordTooLargeException
> A resolution for that exception is explained at https://issues.apache.org/jira/browse/KAFKA-6976.
> Still, it limits the scalability of Kafka Streams, as moving around 100 MB of assignment data for each rebalance affects performance and reliability as well (timeout exceptions start appearing). It also limits Kafka Streams scale even with a high max.message.bytes setting, since the data size grows quickly with the number of partitions, instances, or stream-threads.
>
> Solution:
> To address this issue in our cluster, we are sending compressed assignment-data. We saw the assignment-data size reduced by 8x-10x. This improved Kafka Streams scalability drastically for us, and we can now run with more than 8,000 partitions.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
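The gzip approach described in the comment and the Solution section can be sketched as below. This is a minimal illustration, not Kafka's actual StreamsPartitionAssignor code; the `AssignmentCodec` class and its method names are hypothetical, and only the compression step is shown (the version-negotiation problem from Scenarios #2 and #3 is not addressed here).

```java
// Illustrative sketch only: gzip-compress serialized assignment data before
// sending it in the rebalance protocol, and decompress it on receipt.
// Class/method names are hypothetical, not Kafka's real API.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class AssignmentCodec {

    // Gzip-compress the serialized assignment bytes (the proposed version-4 encoding).
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Inflate the gzip stream back to the original assignment bytes.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive stand-in for assignment metadata: many similar
        // topic-partition entries, which is why gzip helps so much here.
        StringBuilder sb = new StringBuilder();
        for (int p = 0; p < 8000; p++) {
            sb.append("topic-A:").append(p).append(';');
        }
        byte[] plain = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(plain);

        System.out.println("plain=" + plain.length + " bytes, gzip=" + packed.length + " bytes");
        System.out.println("roundTripOk=" + java.util.Arrays.equals(plain, decompress(packed)));
    }
}
```

Repetitive metadata like topic-partition lists compresses very well, which is consistent with the 8x-10x reduction reported above; the exact ratio depends on the payload.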