[ https://issues.apache.org/jira/browse/KAFKA-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193865#comment-16193865 ]
ASF GitHub Bot commented on KAFKA-5152:
---------------------------------------

GitHub user guozhangwang closed the pull request at:

    https://github.com/apache/kafka/pull/3607

> Kafka Streams keeps restoring state after shutdown is initiated during startup
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-5152
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5152
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Xavier Léauté
>            Assignee: Damian Guy
>            Priority: Blocker
>             Fix For: 0.11.0.1, 1.0.0
>
>
> If streams shutdown is initiated during state restore (e.g. an uncaught
> exception is thrown), streams will not shut down until all stores have
> finished restoring.
> As restore progresses, stream threads appear to be taken out of service as
> part of the shutdown sequence, causing rebalancing of tasks. This compounds
> the problem by slowing down the restore process even further, since the
> remaining threads now also have to restore the reassigned tasks before they
> can shut down.
> A more severe issue arises if a new rebalance is triggered near the end of
> the waitingSync phase (e.g. due to a new member joining the group, or some
> members timing out on the SyncGroup response). Some consumer clients of the
> group may then already have proceeded to {{onPartitionsAssigned}} and be
> blocked trying to grab the file dir lock not yet released by other clients.
> Meanwhile, the clients holding the lock keep re-sending {{JoinGroup}}
> requests, but the rebalance cannot complete, because the clients blocked on
> the file dir lock are never kicked out of the group: their heartbeat threads
> keep sending HBRequest. Hence this is a deadlock caused by not releasing the
> file dir locks on task suspension.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
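
For illustration only, the behaviour the issue asks for (stop restoring once shutdown is requested, and never keep holding the state directory lock afterwards) could be sketched roughly as below. This is not the Kafka Streams implementation and not the change made in the pull request: the class `RestoreShutdownSketch`, its methods, and the use of an in-process `ReentrantLock` as a stand-in for the file dir lock are all assumptions made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntConsumer;

// Hypothetical sketch; none of these names come from Kafka Streams.
final class RestoreShutdownSketch {
    // Stand-in for the per-task file dir lock mentioned in the issue.
    private final ReentrantLock stateDirLock = new ReentrantLock();
    // Set when shutdown is initiated; checked between restore batches.
    private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

    void requestShutdown() {
        shutdownRequested.set(true);
    }

    boolean isStateDirLocked() {
        return stateDirLock.isLocked();
    }

    /**
     * Restores up to totalBatches batches and returns how many were applied.
     * afterBatch is called with the 1-based index of each applied batch.
     */
    int restore(int totalBatches, IntConsumer afterBatch) {
        stateDirLock.lock();
        try {
            int restored = 0;
            while (restored < totalBatches) {
                // Bail out early instead of finishing the whole restore.
                if (shutdownRequested.get()) {
                    break;
                }
                restored++; // stand-in for applying one batch of changelog records
                afterBatch.accept(restored);
            }
            return restored;
        } finally {
            // Always release the lock, whether the restore finished or was cut short.
            stateDirLock.unlock();
        }
    }

    public static void main(String[] args) {
        RestoreShutdownSketch sketch = new RestoreShutdownSketch();
        // Simulate shutdown being initiated while the third batch is restored.
        int restored = sketch.restore(100, batch -> {
            if (batch == 3) {
                sketch.requestShutdown();
            }
        });
        System.out.println("restored " + restored + " of 100 batches; lock held: "
                + sketch.isStateDirLocked());
    }
}
```

The point of the sketch is the pair of choices in `restore`: the shutdown flag is checked on every batch rather than only after the restore completes, and the lock release sits in a `finally` block so that no exit path leaves another client waiting on it.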