[ https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074192#comment-17074192 ]
ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

ableegoldman commented on pull request #8409: KAFKA-6145: KIP-441 Pt. 6 Trigger probing rebalances until group is stable
URL: https://github.com/apache/kafka/pull/8409

This PR is fairly straightforward and does what the title describes: it enforces a rebalance once the configured `probing.rebalance.interval` has elapsed.

However, we have had to modify the original plan in the KIP slightly to handle an edge case with static membership enabled. Since the group leader can crash and restart without triggering a rebalance, we can't rely on a purely in-memory flag/counter to keep track of these probing rebalances.

We can instead rely on the assignment, encoding the upcoming probing rebalance in the `AssignmentInfo`. This is encoded as the time in ms of the next scheduled rebalance, i.e. it is set to `currentTimeMs + probingRebalanceIntervalMs` when the assignment is being generated. This anchors the probing rebalances to wall-clock time and ensures that a pathologically failing member will not prevent the group from ever rebalancing.

We leave it up to a single member to be responsible for triggering the probing rebalances, and encode this for a single consumer on the leader's client (chosen arbitrarily).
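
For illustration only, the scheduling idea in the comment above could be sketched roughly as follows. This is not the code from the pull request: `ProbingAssignment`, its `NONE` sentinel, and the 8-byte wire format are hypothetical stand-ins for the real `AssignmentInfo`. The sketch only shows the deadline being computed as `currentTimeMs + probingRebalanceIntervalMs`, carried in the serialized assignment, and checked against wall-clock time by the member that received it.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for the assignment payload; NOT the real Kafka Streams AssignmentInfo. */
final class ProbingAssignment {
    /** Sentinel meaning "no probing rebalance scheduled" (an assumption of this sketch). */
    static final long NONE = Long.MAX_VALUE;

    /** Wall-clock time in ms at which the next probing rebalance is due. */
    final long nextRebalanceMs;

    ProbingAssignment(long nextRebalanceMs) {
        this.nextRebalanceMs = nextRebalanceMs;
    }

    /** Leader side: anchor the next probing rebalance to wall-clock time. */
    static ProbingAssignment schedule(long currentTimeMs, long probingRebalanceIntervalMs) {
        return new ProbingAssignment(currentTimeMs + probingRebalanceIntervalMs);
    }

    /** Serialize the deadline so it travels with the assignment instead of living in leader memory. */
    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(nextRebalanceMs);
        buf.flip();
        return buf;
    }

    /** Member side: recover the deadline from the received assignment bytes. */
    static ProbingAssignment decode(ByteBuffer buf) {
        // Absolute read, so the caller's buffer position is left untouched.
        return new ProbingAssignment(buf.getLong(buf.position()));
    }

    /** Member side: true once the encoded deadline has been reached. */
    boolean shouldTriggerRebalance(long nowMs) {
        return nextRebalanceMs != NONE && nowMs >= nextRebalanceMs;
    }
}
```

Because the deadline lives in the assignment rather than in a counter on the leader, a member that still holds the assignment can recompute `shouldTriggerRebalance` from the current clock, which is the property the static-membership edge case needs.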
> Warm up new KS instances before migrating tasks - potentially a two phase rebalance
> -----------------------------------------------------------------------------------
>
> Key: KAFKA-6145
> URL: https://issues.apache.org/jira/browse/KAFKA-6145
> Project: Kafka
> Issue Type: New Feature
> Components: streams
> Reporter: Antony Stubbs
> Assignee: Sophie Blee-Goldman
> Priority: Major
> Labels: needs-kip
>
> Currently when expanding the KS cluster, the new node's partitions will be unavailable during the rebalance. For large state this can take a very long time, and for small state stores even more than a few ms can be a deal breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then rebalance the tasks - there will still be a rebalance pause, but it would be greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during rebalance

--
This message was sent by Atlassian Jira
(v8.3.4#803005)