Elias Levy created KAFKA-4748:
---------------------------------

             Summary: Need a way to shutdown all workers in a Streams 
application at the same time
                 Key: KAFKA-4748
                 URL: https://issues.apache.org/jira/browse/KAFKA-4748
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 0.10.1.1
            Reporter: Elias Levy


If you have a fleet of Streams workers for an application and attempt to shut 
them down simultaneously (e.g. via SIGTERM, with a hook registered through 
Runtime.getRuntime().addShutdownHook() that calls streams.close()), a large 
number of the workers fail to shut down.
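
For reference, the shutdown path described above can be sketched roughly as 
follows. This is a minimal illustration, not the actual application code: the 
class name ShutdownHookSketch and the helper closeOnShutdown are hypothetical, 
and a plain Runnable stands in for the call to streams.close() so the sketch 
runs without the Kafka libraries.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {

    /**
     * Registers a JVM shutdown hook that runs the given close action and
     * returns the hook thread. In a real Streams worker the action would be
     * streams::close on the application's KafkaStreams instance (assumption:
     * the worker holds such an instance in a variable named streams).
     */
    static Thread closeOnShutdown(Runnable closeAction) {
        Thread hook = new Thread(closeAction, "streams-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // Stand-in for the worker: records that close was requested.
        AtomicBoolean closed = new AtomicBoolean(false);
        closeOnShutdown(() -> closed.set(true));

        // The hook runs when the JVM exits, e.g. on SIGTERM or when main returns.
        System.out.println("shutdown hook registered");
    }
}
```

When every worker in the fleet receives SIGTERM at about the same time, each 
JVM runs its own hook independently, which is the situation in which the 
failure is observed.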

The problem appears to be a race condition between the shutdown signal and the 
consumer rebalancing that is triggered by some of the workers exiting before 
others.  Workers that receive the signal later apparently fail to exit because 
they are caught in the rebalance.

Terminating workers in a rolling fashion is not advisable in some situations.  
A rolling shutdown results in many unnecessary rebalances and may fail, 
as the application may have a large amount of local state that a smaller number 
of nodes may not be able to store.

It would appear that there is a need for a protocol change to allow the 
coordinator to signal a consumer group to shut down without triggering a 
rebalance.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)