[jira] [Commented] (KAFKA-6645) Host Affinity to facilitate faster restarts of kafka streams applications

Guozhang Wang (JIRA) Wed, 14 Mar 2018 11:35:15 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-6645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399068#comment-16399068
 ]


Guozhang Wang commented on KAFKA-6645:
--------------------------------------

Hi Giridhar,

I think I understand your issue now. What you observed may not be completely 
resolved by the stickiness behavior: when you (re-)start an application that 
has multiple instances, the coordinator may not wait long enough for every 
instance to join the consumer group before issuing the rebalance. More 
specifically, in your case, although you may restart the 10 machines at roughly 
the same time, there is still a window in which some machines start up earlier 
than others. If only, say, 5 machines are recognized by the coordinator in the 
first rebalance, it has to reassign some of the tasks of the other 5 machines 
to these 5, causing long restoration latency. Only after that will a new 
rebalance be triggered with all 10 machines up and running, and the tasks will 
be reassigned back.

To remedy this issue, in 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
 we introduced a new config on the broker side that waits longer before 
triggering a rebalance for a new group. You can try increasing that config 
value (it defaults to 3 seconds) and see whether the longer wait lets all 
instances join the group, so that the sticky assignor can make the ideal 
assignment.
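
For reference, a minimal sketch of that broker-side change. The property 
introduced by KIP-134 is group.initial.rebalance.delay.ms, set in each broker's 
server.properties; the 10-second value below is only an illustrative 
assumption, not a recommendation, and should be sized to how far apart your 
instances actually start up.

```properties
# Broker config (server.properties), introduced by KIP-134.
# How long the group coordinator waits for more members to join a new,
# empty group before performing the first rebalance. Default: 3000 (3 s).
# 10000 is an assumed example value, chosen here only for illustration.
group.initial.rebalance.delay.ms=10000
```

A larger value delays the first assignment for every new consumer group on 
that broker, so it trades start-up latency for fewer intermediate rebalances.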

> Host Affinity to facilitate faster restarts of kafka streams applications
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6645
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6645
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Giridhar Addepalli
>            Priority: Major
>
> Since Kafka Streams applications generally have a lot of state in their 
> stores, it would be good to remember the assignment of partitions to 
> machines, so that when the whole application is restarted for some reason, 
> there is a way to reuse the past assignment of partitions to machines and 
> there is no need to rebuild the whole state by reading from the changelog 
> Kafka topic. This would result in faster start-up.
> Samza has support for Host Affinity 
> ([https://samza.apache.org/learn/documentation/0.14/yarn/yarn-host-affinity.html])
> KIP-54 
> ([https://cwiki.apache.org/confluence/display/KAFKA/KIP-54+-+Sticky+Partition+Assignment+Strategy])
> handles cases where some members of a consumer group go down / come up, 
> and KIP-54 ensures there is a minimal diff between the assignments before and 
> after a rebalance. 
> But to handle the whole-restart use case, we need to remember the past 
> assignment somewhere and use it after the restart.
> Please let us know if this is an already solved problem, or if there is some 
> cleaner way of achieving this objective.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
