[ 
https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076870#comment-17076870
 ] 

ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

ableegoldman commented on pull request #8436: KAFKA-6145: KIP-441 avoid 
unnecessary movement of standbys
URL: https://github.com/apache/kafka/pull/8436
 
 
   Currently we add warmup and standby tasks, meaning we first assign up to 
max.warmup.replicas warmup tasks and then attempt to assign num.standby.replicas 
copies of each stateful task. This can cause unnecessary transient standbys to 
pop up for the lifetime of the warmup task, which is presumably not what the 
user wanted.
   
   Note that we don’t want to simply count every warmup against the configured 
num.standby.replicas, as this may cause the opposite problem: a standby we 
intend to keep is temporarily unassigned (which may lead the cleanup thread to 
delete its state). We should count a warmup as a standby only if the destination 
client already had this task as a standby; otherwise, the standby already exists 
on some other client, and we should aim to give it back to that client.
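   A minimal sketch of the counting rule above, assuming a map from client to 
its previously hosted standby tasks (the plain String task/client ids and the 
class name are illustrative, not the real Kafka Streams types):

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hedged sketch: decide whether a warmup replica should count against
// num.standby.replicas. Not the actual KIP-441 assignor code.
public class StandbyCounting {

    // previousStandbys: client id -> tasks it already hosted as standbys
    static boolean countsAsStandby(String task,
                                   String warmupDestination,
                                   Map<String, Set<String>> previousStandbys) {
        // Count the warmup as a standby only if the destination client
        // already had this task as a standby; otherwise the existing
        // standby lives on some other client and keeps its assignment.
        return previousStandbys
                .getOrDefault(warmupDestination, Collections.emptySet())
                .contains(task);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> prev = Map.of("client-1", Set.of("0_0"));

        // Warmup of 0_0 lands on client-1, which already had it as a standby:
        System.out.println(countsAsStandby("0_0", "client-1", prev)); // true
        // Warmup of 0_0 on client-2: the standby on client-1 is kept instead:
        System.out.println(countsAsStandby("0_0", "client-2", prev)); // false
    }
}
```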
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Warm up new KS instances before migrating tasks - potentially a two phase 
> rebalance
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-6145
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6145
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance, which for large state stores can take a 
> very long time; even for small state stores, a pause of more than a few ms 
> can be a deal breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then 
> rebalance the tasks - there will still be a rebalance pause, but would be 
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during 
> rebalance
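The two-phase hand-off proposed above can be sketched as a simple phase 
transition driven by restoration lag; all names here (Phase, nextPhase, 
ACCEPTABLE_LAG) are illustrative and not real Kafka Streams APIs:

```java
// Hedged sketch of the two-phase rebalance: the new node warms up its
// state store in the background, and the active task is only migrated
// once the store is (nearly) caught up, shrinking the pause to the
// final catch-up.
public class TwoPhaseHandoff {
    enum Phase { WARMING_UP, MIGRATED }

    // Illustrative threshold: how far behind the store may be, in records,
    // before we are willing to move the active task.
    static final long ACCEPTABLE_LAG = 10_000L;

    static Phase nextPhase(long lagInRecords) {
        return lagInRecords <= ACCEPTABLE_LAG ? Phase.MIGRATED : Phase.WARMING_UP;
    }

    public static void main(String[] args) {
        System.out.println(nextPhase(1_000_000L)); // WARMING_UP
        System.out.println(nextPhase(500L));       // MIGRATED
    }
}
```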



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
