[ https://issues.apache.org/jira/browse/KAFKA-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401433#comment-17401433 ]

A. Sophie Blee-Goldman commented on KAFKA-13217:
------------------------------------------------

This is all the more important given the recent increase in the default session.timeout.ms to 45s, since that's a rather long time to go without noticing that a consumer has actually left the group permanently.

> Reconsider skipping the LeaveGroup on close() or add an overload that does so
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-13217
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13217
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>
> In Kafka Streams, when an instance is shut down via the close() API, we
> intentionally skip sending a LeaveGroup request. This is because the shutdown
> is often not due to a scale-down event but to some transient closure, such as
> during a rolling bounce. In cases where the instance is expected to start up
> again shortly after, we originally wanted to prevent that member's tasks from
> being redistributed across the remaining group members, since this would
> disturb the stable assignment and could cause unnecessary state migration and
> restoration. We also hoped to limit the disruption to a single rebalance,
> rather than forcing the group to rebalance once when the member shuts down and
> again when it comes back up. So it's really an optimization for the case in
> which the shutdown is temporary.
>
> That said, many of those optimizations are no longer necessary, or at least
> much less useful, given recent features and improvements. For example,
> rebalances are now lightweight, so avoiding the second rebalance is much less
> worth optimizing for. And the new assignor takes into account the actual
> underlying state for each task/partition assignment, rather than just the
> previous assignment, so the assignment should be considerably more stable
> across bounces and rolling restarts.
>
> Given that, it might be time to reconsider this optimization. Alternatively,
> we could introduce another form of the close() API that forces the member to
> leave the group, to be used in the event of an actual scale-down rather than
> a transient bounce.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
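
To make the timing argument in the comment concrete, here is a minimal, hypothetical Java model. It is not Kafka's group coordinator or the Kafka Streams API: the class LeaveGroupModel and all of its methods are invented for illustration. It assumes a member is dropped from the group only when it sends a LeaveGroup or when no heartbeat has arrived for longer than the session timeout, taken here as the 45 s default mentioned in the comment. It also sketches the proposed shape of the API: a default close(memberId) that skips the LeaveGroup, plus a hypothetical close(memberId, leaveGroup) overload that forces it.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, heavily simplified model of group membership, written only to
 * illustrate the trade-off discussed in KAFKA-13217. It is NOT Kafka's
 * coordinator and NOT the Kafka Streams API.
 */
class LeaveGroupModel {

    /** Assumed session timeout, matching the 45 s default cited in the comment. */
    static final long SESSION_TIMEOUT_MS = 45_000L;

    /** memberId -> time (ms) of that member's most recent heartbeat. */
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    /** A live member proves it is still alive by heartbeating. */
    void heartbeat(String memberId, long nowMs) {
        lastHeartbeatMs.put(memberId, nowMs);
    }

    /** Default close: mirrors the behavior described above and skips the LeaveGroup. */
    void close(String memberId) {
        close(memberId, false);
    }

    /**
     * Hypothetical overload: with leaveGroup=true the member "sends" a LeaveGroup
     * and is removed at once, so the group can react immediately. With
     * leaveGroup=false the member simply stops heartbeating.
     */
    void close(String memberId, boolean leaveGroup) {
        if (leaveGroup) {
            lastHeartbeatMs.remove(memberId);
        }
        // Otherwise the coordinator keeps the member until its session expires.
    }

    /** Coordinator-side sweep: drop members whose session has timed out. */
    void expireSessions(long nowMs) {
        lastHeartbeatMs.values().removeIf(last -> nowMs - last > SESSION_TIMEOUT_MS);
    }

    boolean isMember(String memberId) {
        return lastHeartbeatMs.containsKey(memberId);
    }

    public static void main(String[] args) {
        for (boolean leave : new boolean[] {false, true}) {
            LeaveGroupModel group = new LeaveGroupModel();
            group.heartbeat("a", 0);
            group.heartbeat("b", 0);
            group.close("b", leave); // member "b" shuts down at t = 0 s

            // Member "a" stays healthy; the coordinator sweeps once per second.
            long noticedAtMs = -1;
            for (long t = 0; t <= 60_000 && noticedAtMs < 0; t += 1_000) {
                group.heartbeat("a", t);
                group.expireSessions(t);
                if (!group.isMember("b")) {
                    noticedAtMs = t;
                }
            }
            System.out.println("leaveGroup=" + leave + ": departure noticed at t = "
                    + noticedAtMs / 1_000 + " s");
        }
    }
}
```

In this model, running main reports that the departure is noticed at t = 0 s with leaveGroup=true, but only after roughly the full session timeout (46 s here, because of the strict comparison and the one-second sweep) when the LeaveGroup is skipped. A real group would then rebalance; the sketch deliberately leaves assignment out and only shows how long the group keeps a departed member.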