[ https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076870#comment-17076870 ]
ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

ableegoldman commented on pull request #8436: KAFKA-6145: KIP-441 avoid unnecessary movement of standbys
URL: https://github.com/apache/kafka/pull/8436

Currently warmup and standby tasks are additive: we first assign up to max.warmup.replicas warmup tasks, and then attempt to assign num.standby.replicas copies of each stateful task. This can cause unnecessary transient standbys to pop up for the lifetime of the warmup task, which are presumably not what the user wanted.

Note that we don’t want to simply count all warmups against the configured num.standby.replicas, as this may cause the opposite problem, where a standby we intend to keep is temporarily unassigned (which may lead to the cleanup thread deleting it). We should only count a warmup as a standby if the destination client already had this task as a standby; otherwise, the standby already exists on some other client, so we should aim to give it back.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Warm up new KS instances before migrating tasks - potentially a two phase rebalance
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-6145
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6145
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip
>
> Currently when expanding the KS cluster, the new node's partitions will be
> unavailable during the rebalance, which for large states can take a very long
> time; for small state stores, even a pause of more than a few ms can be a deal
> breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then
> rebalance the tasks - there will still be a rebalance pause, but it would be
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during
> rebalance

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
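
The counting rule described in the pull request comment above (a warmup counts toward the configured standby total only if the destination client already held that task as a standby) can be illustrated with a minimal sketch. This is not the code from the pull request: the class StandbyCountSketch, its method names, and the map of previous standbys per client are all hypothetical and exist only to show the rule in isolation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the actual Kafka Streams assignor code.
final class StandbyCountSketch {

    // A warmup counts as a standby only if the destination client
    // previously held this task as a standby.
    static boolean warmupCountsAsStandby(String taskId,
                                         String destinationClient,
                                         Map<String, Set<String>> previousStandbysByClient) {
        return previousStandbysByClient
                .getOrDefault(destinationClient, Collections.<String>emptySet())
                .contains(taskId);
    }

    // Number of standbys still to assign for a task that has one warmup placed
    // on warmupClient, given the configured number of standby replicas.
    static int remainingStandbys(String taskId,
                                 String warmupClient,
                                 int numStandbyReplicas,
                                 Map<String, Set<String>> previousStandbysByClient) {
        int covered = warmupCountsAsStandby(taskId, warmupClient, previousStandbysByClient) ? 1 : 0;
        return Math.max(0, numStandbyReplicas - covered);
    }
}
```

With one configured standby, a warmup placed on a client that already had the task as a standby leaves nothing further to assign, so no transient extra standby appears. A warmup placed on any other client leaves the full standby count to assign, so the existing standby elsewhere can be handed back rather than left unassigned.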