Greg Fodor created KAFKA-5571:
---------------------------------
Summary: Possible deadlock during shutdown in setState in kafka
streams 10.2
Key: KAFKA-5571
URL: https://issues.apache.org/jira/browse/KAFKA-5571
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 0.10.2.1
Reporter: Greg Fodor
Attachments: kafka-streams.deadlock.log

I'm running a 0.10.2 job across 5 nodes with 32 stream threads on each node,
and I find that when I gracefully shut down all of them at once via Ansible
scripts, some of the nodes end up freezing. At a glance, the attached thread
dump implies a deadlock between stream threads trying to update their state via
setState. We haven't had this problem before, but it may or may not be related
to changes in 0.10.2 (we are upgrading from 0.10.0 to 0.10.2).
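
For anyone trying to confirm a suspected monitor deadlock like this from inside
a running JVM rather than from a thread dump, a minimal sketch using the
standard java.lang.management API follows. The class and method names
(DeadlockProbe, describeDeadlockedThreads) are hypothetical and are not part of
Kafka Streams; only the ThreadMXBean calls are real JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kafka Streams.
public class DeadlockProbe {

    // Returns one line per thread that the JVM reports as deadlocked on an
    // object monitor (i.e. on a synchronized method or block).
    static List<String> describeDeadlockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Returns null when no monitor deadlock is detected.
        long[] ids = mx.findMonitorDeadlockedThreads();
        List<String> lines = new ArrayList<>();
        if (ids == null) {
            return lines;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            lines.add(info.getThreadName()
                    + " waits for " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = describeDeadlockedThreads();
        if (lines.isEmpty()) {
            System.out.println("no monitor deadlock detected");
        } else {
            lines.forEach(System.out::println);
        }
    }
}
```

Calling this periodically from a watchdog thread during shutdown would show
which stream threads are waiting on which monitors, which is the same
information the attached thread dump provides.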

When we gracefully shut down all nodes simultaneously, some subset of the nodes
typically does not shut down completely and instead goes through a rebalance
first. The deadlock seems to require this rebalance to occur at the same time
as the graceful shutdown: if we shut the nodes down and no rebalance happens, I
don't believe the deadlock is triggered.

The deadlock appears to be related to the state change handlers being
subscribed across threads, combined with the fact that StreamThread#setState
and StreamStateListener#onChange are both synchronized methods.
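
To illustrate the kind of lock-ordering cycle that two synchronized methods on
different objects can produce, here is a simplified sketch. It is not the Kafka
Streams code: the two ReentrantLocks are hypothetical stand-ins for the
per-thread monitor (taken by setState) and the shared listener monitor (taken
by onChange), and timed tryLock calls are used so the demo terminates instead
of hanging. Whether the real deadlock follows exactly this path is an
assumption based on the thread dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; names do not correspond to Kafka classes.
public class LockOrderSketch {

    // Takes `first`, waits until the peer thread also holds its first lock,
    // then tries to take `second`. Returns true only if `second` was acquired.
    static boolean tryNested(ReentrantLock first, ReentrantLock second,
                             CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(100, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            // Keep holding `first` until the peer has finished its attempt,
            // mimicking a monitor that is never released.
            bothTried.countDown();
            bothTried.await();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    // Runs two threads that take the same two locks in opposite order and
    // returns how many of them failed to get their second lock.
    static int countBlockedThreads() {
        ReentrantLock threadMonitor = new ReentrantLock();   // stand-in for setState's monitor
        ReentrantLock listenerMonitor = new ReentrantLock(); // stand-in for onChange's monitor
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        Thread a = new Thread(() -> {
            // Order 1: thread monitor, then listener monitor.
            if (!tryNested(threadMonitor, listenerMonitor, bothHoldFirst, bothTried)) {
                blocked.incrementAndGet();
            }
        });
        Thread b = new Thread(() -> {
            // Order 2: listener monitor, then thread monitor.
            if (!tryNested(listenerMonitor, threadMonitor, bothHoldFirst, bothTried)) {
                blocked.incrementAndGet();
            }
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    public static void main(String[] args) {
        System.out.println("threads that could not take their second lock: "
                + countBlockedThreads());
    }
}
```

Both threads fail to take their second lock here. With plain synchronized
methods there is no timeout, so the same ordering would leave both threads
blocked indefinitely.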

Another thing worth mentioning is that one of the transformers used in the job
has a close() method that can take 10-15 seconds to finish, since it needs to
flush some data to a database. A long-running close() method, combined with a
rebalance during a shutdown across many threads, may be necessary for
reproduction.