[ https://issues.apache.org/jira/browse/KAFKA-5571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eno Thereska resolved KAFKA-5571.
---------------------------------
    Resolution: Fixed

> Possible deadlock during shutdown in setState in kafka streams 10.2
> -------------------------------------------------------------------
>
>                 Key: KAFKA-5571
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5571
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1
>            Reporter: Greg Fodor
>            Assignee: Eno Thereska
>         Attachments: kafka-streams.deadlock.log
>
>
> I'm running a 0.10.2 job across 5 nodes with 32 stream threads on each node
> and find that when I gracefully shut down all of them at once via an Ansible
> script, some of the nodes end up freezing -- at a glance, the attached thread
> dump implies a deadlock between stream threads trying to update their state
> via setState. We haven't had this problem before, but it may or may not be
> related to changes in 0.10.2 (we are upgrading from 0.10.0 to 0.10.2).
> When we gracefully shut down all nodes simultaneously, what typically happens
> is that some subset of the nodes does not shut down completely but goes
> through a rebalance first. It seems this deadlock requires the rebalance to
> occur simultaneously with the graceful shutdown. If we shut the nodes down
> and no rebalance happens, I don't believe this deadlock is triggered.
> The deadlock appears related to the state change handlers being subscribed
> across threads and the fact that StreamThread#setState and
> StreamStateListener#onChange are both synchronized methods.
> Another thing worth mentioning is that one of the transformers used in the
> job has a close() method that can take 10-15 seconds to finish, since it
> needs to flush some data to a database. Having a long close() method combined
> with a rebalance during a shutdown across many threads may be necessary for
> reproduction.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
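
For readers unfamiliar with the failure mode the report describes, here is a minimal sketch of how two synchronized methods on different objects can deadlock when threads take the two monitors in opposite order. It is not the Kafka Streams code and does not claim to reproduce the exact interleaving in the attached thread dump: Worker, Listener, inspect() and the latch are hypothetical stand-ins, and the "PENDING_SHUTDOWN" string is only a label. The latch exists solely to force the unlucky timing that, per the report, seems to need a rebalance during shutdown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-ins; these are NOT the real Kafka Streams classes.
class SetStateDeadlockSketch {

    static class Listener {
        // Plays the role of a synchronized onChange callback.
        synchronized void onChange(Worker worker, String newState) {
            // Intentionally empty: only acquiring the monitor matters here.
        }

        // Assumed code path: holds the listener monitor, then queries a worker.
        synchronized String inspect(Worker worker, CountDownLatch bothHoldFirstLock)
                throws InterruptedException {
            bothHoldFirstLock.countDown();
            bothHoldFirstLock.await();      // wait until the other thread holds its lock
            return worker.state();          // needs the worker monitor -> blocks
        }
    }

    static class Worker {
        private final Listener listener;
        private String state = "RUNNING";

        Worker(Listener listener) {
            this.listener = listener;
        }

        // Plays the role of a synchronized setState that notifies a listener.
        synchronized void setState(String newState, CountDownLatch bothHoldFirstLock)
                throws InterruptedException {
            state = newState;
            bothHoldFirstLock.countDown();
            bothHoldFirstLock.await();      // wait until the other thread holds its lock
            listener.onChange(this, newState); // needs the listener monitor -> blocks
        }

        synchronized String state() {
            return state;
        }
    }

    // Starts the two conflicting threads and reports how many threads the JVM
    // sees as deadlocked on object monitors (0 if none is found within ~5 s).
    static int deadlockedThreadCount() throws InterruptedException {
        Listener listener = new Listener();
        Worker worker = new Worker(listener);
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread a = new Thread(() -> {
            try {
                worker.setState("PENDING_SHUTDOWN", bothHoldFirstLock);
            } catch (InterruptedException ignored) {
                // sketch only: nothing to clean up
            }
        }, "worker-setState");
        Thread b = new Thread(() -> {
            try {
                listener.inspect(worker, bothHoldFirstLock);
            } catch (InterruptedException ignored) {
                // sketch only: nothing to clean up
            }
        }, "listener-inspect");
        a.setDaemon(true);                  // daemon threads let the JVM exit anyway
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + deadlockedThreadCount());
    }
}
```

Running main should report both threads as deadlocked. The usual ways out of this pattern are to invoke listener callbacks after releasing the caller's own monitor, or to acquire the two monitors in one agreed order everywhere; which approach the actual fix took is not stated in this message.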