I'm running a 10.2 job across 5 nodes with 32 stream threads on each node
and find that when I gracefully shut down all of them at once via an Ansible
script, some of the nodes end up freezing. At a glance, the attached
thread dump suggests a deadlock between stream threads trying to update
their state via setState. We haven't had this problem before, but it may or
may not be related to changes in 10.2 (we are upgrading from 10.0 to 10.2).

When we gracefully shut down all nodes simultaneously, what typically
happens is that some subset of the nodes do not shut down completely but
go through a rebalance first. It seems this deadlock requires
the rebalance to occur simultaneously with the graceful shutdown. If we
shut them down and no rebalance happens, I don't believe this
deadlock is triggered.

The deadlock appears to be related to the state change handlers being subscribed
across threads and the fact that StreamThread#setState and
StreamStateListener#onChange are both synchronized methods.
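
To make the suspected lock ordering concrete, here is a minimal sketch of how two synchronized methods can deadlock when a shared listener calls back into other threads' objects. It is not the Kafka Streams code: the Worker and Listener classes, the getState call inside onChange, and the latch are all hypothetical stand-ins, and the state names are just illustrative strings. It assumes the listener reads other workers' state while holding its own monitor, which I have not confirmed against the 10.2 source.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class ListenerDeadlockSketch {

    /** Hypothetical shared listener; NOT the Kafka Streams class. */
    static class Listener {
        final List<Worker> workers = new ArrayList<>();

        // Holds the listener monitor, then needs each other worker's monitor.
        synchronized void onChange(Worker source) {
            for (Worker w : workers) {
                if (w != source) {
                    w.getState();
                }
            }
        }
    }

    /** Hypothetical worker standing in for a stream thread. */
    static class Worker {
        private final Listener listener;
        private final CountDownLatch gate;
        private String state = "RUNNING";

        Worker(Listener listener, CountDownLatch gate) {
            this.listener = listener;
            this.gate = gate;
        }

        synchronized String getState() {
            return state;
        }

        // Holds this worker's monitor, then needs the listener monitor.
        synchronized void setState(String newState) {
            state = newState;
            gate.countDown();
            try {
                // Demo only: wait until both workers hold their own monitor,
                // so the unlucky interleaving happens every time.
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            listener.onChange(this);
        }
    }

    /** Starts two workers that change state at once; returns how many threads the JVM reports as deadlocked. */
    static int countDeadlockedThreads() throws InterruptedException {
        CountDownLatch gate = new CountDownLatch(2);
        Listener listener = new Listener();
        Worker a = new Worker(listener, gate);
        Worker b = new Worker(listener, gate);
        listener.workers.add(a);
        listener.workers.add(b);

        Thread ta = new Thread(() -> a.setState("PENDING_SHUTDOWN"), "worker-a");
        Thread tb = new Thread(() -> b.setState("PARTITIONS_REVOKED"), "worker-b");
        ta.setDaemon(true); // daemon threads so the JVM can still exit
        tb.setDaemon(true);
        ta.start();
        tb.start();

        // Poll the JVM's monitor deadlock detector for up to about 5 seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

In this sketch each thread takes its own monitor in setState and then asks for the listener's monitor, while whichever thread wins the listener then asks for the other worker's monitor, so neither can proceed. If the real code has the same shape, the usual ways out would be to invoke the listener outside the synchronized block or to have the listener avoid calling synchronized methods on other threads.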
