[jira] [Commented] (KAFKA-6145) Warm up new KS instances before migrating tasks - potentially a two phase rebalance

ASF GitHub Bot (Jira) Thu, 02 Apr 2020 18:34:41 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-6145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074192#comment-17074192
 ]


ASF GitHub Bot commented on KAFKA-6145:
---------------------------------------

ableegoldman commented on pull request #8409: KAFKA-6145: KIP-441 Pt. 6 Trigger 
probing rebalances until group is stable
URL: https://github.com/apache/kafka/pull/8409
 
 
   This PR is fairly straightforward: as the title describes, it enforces a 
rebalance once the configured `probing.rebalance.interval` has elapsed. 
However, we have had to modify the original plan in the KIP slightly to handle 
an edge case with static membership enabled:
   
   Since the group leader can crash and restart without triggering a rebalance, 
we can't rely on a purely in-memory flag/counter to keep track of these probing 
rebalances. We can instead rely on the assignment, encoding the upcoming 
probing rebalance in the `AssignmentInfo`. This is encoded as the time in ms of 
the next scheduled rebalance, i.e. it is set to `currentTimeMs + 
probingRebalanceIntervalMs` when the assignment is being generated. This 
anchors the probing rebalances to wall clock time, and ensures a pathologically 
failing member will not prevent the group from ever rebalancing.
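
   As a rough illustration of the wall-clock anchoring described above, the 
deadline arithmetic could look like the sketch below. This is not the code in 
the PR: the class name, the `NONE` sentinel, and the `groupIsStable` parameter 
are assumptions made for the example, and the actual `AssignmentInfo` 
serialization is not shown.

```java
// Hypothetical sketch; names are illustrative and not taken from the PR.
final class ProbingRebalanceSchedule {
    // Assumed sentinel meaning "no probing rebalance is scheduled".
    static final long NONE = Long.MAX_VALUE;

    // Leader side: value to encode while the assignment is being generated.
    // The groupIsStable flag is an assumption standing in for whatever
    // condition ends the probing rebalances.
    static long nextScheduledRebalanceMs(long currentTimeMs,
                                         long probingRebalanceIntervalMs,
                                         boolean groupIsStable) {
        return groupIsStable ? NONE : currentTimeMs + probingRebalanceIntervalMs;
    }

    // Member side: compare the decoded deadline against the wall clock.
    // Because the deadline travels with the assignment, nothing needs to be
    // kept in an in-memory counter on the leader.
    static boolean shouldTriggerRebalance(long nowMs, long nextScheduledRebalanceMs) {
        return nowMs >= nextScheduledRebalanceMs;
    }
}
```

   The point of the sketch is only that the deadline is an absolute timestamp 
rather than a countdown held in memory.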
   
   We leave it up to a single member to trigger the probing rebalances, and 
encode the scheduled time for only one consumer on the leader's client (chosen 
arbitrarily). 
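
   One way to picture the "single member" choice is sketched below, under 
stated assumptions: the consumer ids, the `NONE` sentinel, and the method name 
are hypothetical, and picking the first consumer is just one arbitrary choice, 
not necessarily what the PR does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class ProbingRebalanceOwner {
    // Assumed sentinel meaning "this consumer does not trigger the rebalance".
    static final long NONE = Long.MAX_VALUE;

    // Gives the scheduled time to exactly one of the leader client's
    // consumers (here simply the first one) and NONE to all the others.
    static Map<String, Long> encodeForLeaderConsumers(List<String> leaderConsumers,
                                                      long nextRebalanceMs) {
        Map<String, Long> encoded = new LinkedHashMap<>();
        boolean assigned = false;
        for (String consumer : leaderConsumers) {
            encoded.put(consumer, assigned ? NONE : nextRebalanceMs);
            assigned = true;
        }
        return encoded;
    }
}
```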
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Warm up new KS instances before migrating tasks - potentially a two phase 
> rebalance
> -----------------------------------------------------------------------------------
>
>                 Key: KAFKA-6145
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6145
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Antony Stubbs
>            Assignee: Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip
>
> Currently when expanding the KS cluster, the new node's partitions will be 
> unavailable during the rebalance. For large state stores this can take a very 
> long time, and even for small state stores more than a few ms can be a deal 
> breaker for microservice use cases.
> One workaround would be to execute the rebalance in two phases:
> 1) start running state store building on the new node
> 2) once the state store is fully populated on the new node, only then 
> rebalance the tasks - there will still be a rebalance pause, but would be 
> greatly reduced
> Relates to: KAFKA-6144 - Allow state stores to serve stale reads during 
> rebalance



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
