[ 
https://issues.apache.org/jira/browse/KAFKA-4435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15691169#comment-15691169
 ] 

Jason Gustafson commented on KAFKA-4435:
----------------------------------------

cc [~onurkaraman]

> Improve storage overhead of group metadata
> ------------------------------------------
>
>                 Key: KAFKA-4435
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4435
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: Jason Gustafson
>
> The GroupMetadataManager serializes the full subscriptions and assignments of 
> all consumer group members for each generation as a single message. This is a 
> problem for large consumer groups with a large number of topics since each 
> member's subscription/assignment is serialized separately. So if you have n 
> consumers each subscribing to the same m topics, then the serialized message 
> will contain m*n subscribed topics. At a certain size, you end up exceeding 
> the max message size.
> Some ideas for getting around this have been 1) turning on compression and 2) 
> adding regex support to the protocol. Both of these help, but maybe we should 
> question whether the subscriptions/assignments need to be written at all. The 
> reason to include this information in the log is basically to prevent a 
> rebalance on coordinator failover. After failover, the new coordinator can 
> consume the log and determine the full state of every group. The consumers in 
> the group simply send heartbeats to the new coordinator, once it is found.
> In fact, preventing the rebalance is not really the main issue: it's ensuring 
> that the last generation can commit its offsets. If nothing were written to 
> the log, then the group would be recreated after failover from scratch and 
> existing members would not be able to commit offsets (since their generation 
> would no longer be valid). But the subscription/assignment is opaque to the 
> coordinator and is not actually used when committing offsets. All the 
> coordinator really needs is the generation and the list of memberIds. 
> Supposing then that we removed the subscriptions/assignments from the group, 
> but retained the generation/memberId information, one loose end is servicing 
> the DescribeGroups request. After failover, we would no longer have the 
> subscription/assignment information we need to answer that request. One 
> option would be to trigger a rebalance after failover in order to repopulate 
> it. The previous generation would still be able to commit offsets before 
> rejoining the group. Potentially we could even delay this rebalance until we 
> actually receive a DescribeGroups request.
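
To make the m*n growth in the description concrete, here is a rough Python
sketch. It is only an illustration: FullGroupRecord, SlimGroupRecord and
build_records are hypothetical names, not Kafka classes, and the sketch does
not reproduce the GroupMetadataManager's real serialization format. It counts
the topic entries stored when every member's subscription is written
separately, and shows that a record holding just the generation and the
memberIds is enough to decide whether an offset commit comes from the current
generation (the real coordinator likely performs further checks).

```python
from dataclasses import dataclass


@dataclass
class FullGroupRecord:
    """Hypothetical record that stores every member's subscription."""
    generation: int
    subscriptions: dict  # memberId -> list of subscribed topic names

    def topic_entries(self) -> int:
        # Each member's topic list is stored separately, so the counts add up.
        return sum(len(topics) for topics in self.subscriptions.values())


@dataclass
class SlimGroupRecord:
    """Hypothetical record holding only the generation and the memberIds."""
    generation: int
    member_ids: frozenset

    def can_commit(self, member_id: str, generation: int) -> bool:
        # Assumption: a commit is valid if the generation matches and the
        # memberId is known; subscriptions/assignments are never consulted.
        return generation == self.generation and member_id in self.member_ids


def build_records(n_consumers: int, m_topics: int, generation: int = 1):
    """Build both record shapes for n consumers subscribed to the same m topics."""
    topics = [f"topic-{i}" for i in range(m_topics)]
    members = [f"consumer-{i}" for i in range(n_consumers)]
    full = FullGroupRecord(generation, {m: list(topics) for m in members})
    slim = SlimGroupRecord(generation, frozenset(members))
    return full, slim


full, slim = build_records(n_consumers=100, m_topics=500)
print(full.topic_entries())              # m*n = 100 * 500 topic entries
print(len(slim.member_ids))              # only n = 100 memberIds
print(slim.can_commit("consumer-7", 1))  # current generation, known member
print(slim.can_commit("consumer-7", 2))  # stale or unknown generation
```

The slim record grows with n rather than m*n, which is the saving the
description is after; the cost is that subscriptions and assignments are no
longer available after failover to answer a DescribeGroups request.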



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
