[ https://issues.apache.org/jira/browse/KAFKA-7149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539596#comment-16539596 ]

Ashish Surana edited comment on KAFKA-7149 at 7/11/18 6:33 AM:
---------------------------------------------------------------

Made the change here: 
https://github.com/a-surana/kafka/commit/577992015d3bfc5a23e23b5bf32e40a3f92bc74a

Scenario#1: This is straightforward and works with this change.

Encoded version: 4

Decoder's latest supported version: 4

 

Scenario#2: An older encoder with the new decoder; the decoder tries to read a 
non-gzip stream as gzip, which fails (see the snippet below).

Encoded version: <=3 (encoded stream is non-gzip)

Decoder's latest supported version: 4 (decoding as gzip stream)
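
A hypothetical snippet of that failure (assuming the whole stream is gzipped 
from v4 on, as in the commit above):

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;

    class Scenario2 {
        public static void main(String[] args) throws IOException {
            // A plain (non-gzip) v3 stream: the first four bytes are the version int.
            byte[] plainV3Bytes = {0, 0, 0, 3};
            // GZIPInputStream verifies the gzip magic bytes in its constructor,
            // so a v4 decoder fails here with ZipException("Not in GZIP format").
            new GZIPInputStream(new ByteArrayInputStream(plainV3Bytes));
        }
    }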

 

Scenario#3: This is difficult, as the decoder learns the encoded version from 
the first few bytes of the stream, which might be zipped or non-zipped, and 
there is no reliable way to infer which.

Encoded version: 4 (encoded stream is gzip stream)

Decoder's latest supported version: 3 (decoding as non-gzip stream)

 

The change is not backward compatible (Scenarios #2 and #3), but it depicts 
the idea behind this improvement.
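
A minimal sketch of the scheme, assuming whole-stream gzip from version 4 on 
(class and method names are mine, not the linked commit's):

    import java.io.*;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class AssignmentCodec {

        static final int LATEST_SUPPORTED_VERSION = 4;

        // From version 4 on the entire stream, including the version header,
        // is gzip-compressed; older versions write a plain stream.
        static byte[] encode(int version, byte[] payload) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(
                    version >= 4 ? new GZIPOutputStream(out) : out);
            data.writeInt(version);
            data.write(payload);
            data.close(); // flushes the gzip trailer for v4+
            return out.toByteArray();
        }

        // A v4 decoder has to assume gzip before it can even read the version
        // (Scenario#2 breaks on plain v<=3 input); a v3 decoder would read the
        // raw bytes instead (Scenario#3 breaks on gzipped v4 input).
        static byte[] decode(byte[] encoded) throws IOException {
            DataInputStream data = new DataInputStream(
                    new GZIPInputStream(new ByteArrayInputStream(encoded)));
            int version = data.readInt();
            if (version > LATEST_SUPPORTED_VERSION)
                throw new IOException("unsupported version " + version);
            return data.readAllBytes();
        }
    }

With this layout the version header sits inside the compressed stream, so 
neither decoder generation can tell up front which format it was handed; that 
is the crux of Scenarios #2 and #3.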


was (Author: asurana):
This change is not backward compatible, but it depicts the idea for this 
improvement.

> Reduce assignment data size to improve kafka streams scalability
> ----------------------------------------------------------------
>
>                 Key: KAFKA-7149
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7149
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Ashish Surana
>            Assignee: Ashish Surana
>            Priority: Major
>
> We observed that with a high number of partitions, instances, or 
> stream-threads, the assignment-data size grows too fast and we start getting 
> the exception below at the Kafka broker:
> RecordTooLargeException
> A resolution for that exception is explained at: 
> https://issues.apache.org/jira/browse/KAFKA-6976
> Still, this limits the scalability of Kafka Streams, as moving around 100 MB 
> of assignment data on each rebalance hurts performance and reliability 
> (timeout exceptions start appearing). Even with a high max.message.bytes 
> setting, the data size grows quickly with the number of partitions, 
> instances, or stream-threads.
>  
> Solution:
> To address this issue in our cluster, we send the assignment-data 
> compressed. We saw the assignment-data size reduced by 8x-10x. This improved 
> Kafka Streams scalability drastically for us, and we can now run with more 
> than 8,000 partitions. A minimal sketch of the compression step follows.
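> A minimal sketch of the compression step (an illustration with java.util.zip; 
> AssignmentCompressor is a hypothetical name, not our actual code):
>
>     import java.io.ByteArrayOutputStream;
>     import java.io.IOException;
>     import java.util.zip.GZIPOutputStream;
>
>     public final class AssignmentCompressor {
>         // Gzip the serialized assignment before sending it to the broker.
>         // Assignment data is highly repetitive (topic names, host:port
>         // strings), which is consistent with the 8x-10x reduction above.
>         static byte[] compress(byte[] assignmentData) throws IOException {
>             ByteArrayOutputStream out = new ByteArrayOutputStream();
>             try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
>                 gzip.write(assignmentData);
>             }
>             return out.toByteArray();
>         }
>     }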



