[ https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147838#comment-16147838 ]
Matthias J. Sax commented on KAFKA-5786:
----------------------------------------

Thanks for the logs.

If I read them correctly, some of your threads miss a rebalance because state recreation in a previous rebalance takes too long. As a result, they drop out of the consumer group without noticing. When the next rebalance happens, they try to commit but fail, because they are no longer part of the group.

This issue should be mitigated by KAFKA-5152 -- nevertheless, a proper fix would be to not let the thread die in the first place. We already have a JIRA for this: KAFKA-5541.

I am going to close this as a duplicate. In 0.11.0.1, the probability of hitting this issue should be reduced (via KAFKA-5152), and I hope to get KAFKA-5541 into 1.0, which should deliver the proper fix.

Thanks for reporting the issue!

Btw: you can also follow KAFKA-5156 for further improvements on internal exception handling.

> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
>                 Key: KAFKA-5786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5786
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Seweryn Habdank-Wojewodzki
>         Attachments: fatal-errors-by-rebalancing.zip
>
>
> An unhandled exception in the streaming app causes a zombie state of the process.
> {code}
> 2017-08-24 15:17:40 WARN StreamThread:978 - stream-thread
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3]
> Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception:
> stream-thread
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed
> to rebalance.;
> [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589),
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553),
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
> in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception and
> its location are different.
> The exception should be handled so that the application either tries to
> continue working or quits completely if the error is not recoverable.
> The current situation, where the application is left as a zombie, is not good.
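
Until the proper fix lands, the "completely quit" behaviour requested in the report can be approximated on the application side. What follows is only a minimal sketch of that pattern, not the fix tracked in KAFKA-5541. It uses a plain Java thread so that it runs without Kafka on the classpath; the names ZombieGuard, QuitOnDeath and runFailingWorker are hypothetical. In a real Streams app the same handler type could be registered via KafkaStreams#setUncaughtExceptionHandler(Thread.UncaughtExceptionHandler), which exists in 0.11.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class ZombieGuard {

    // Hypothetical handler: records the first fatal error and releases the
    // latch that the main thread is waiting on.
    static final class QuitOnDeath implements Thread.UncaughtExceptionHandler {
        final CountDownLatch died = new CountDownLatch(1);
        final AtomicReference<Throwable> cause = new AtomicReference<>();

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            cause.compareAndSet(null, e);
            died.countDown();
        }
    }

    // Starts a stand-in worker that dies with an unhandled exception, similar
    // to the StreamThread in the log, and waits up to 5 seconds for the handler.
    static QuitOnDeath runFailingWorker() throws InterruptedException {
        QuitOnDeath handler = new QuitOnDeath();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("Failed to rebalance.");
        }, "StreamThread-3");
        worker.setUncaughtExceptionHandler(handler);
        worker.start();
        handler.died.await(5, TimeUnit.SECONDS);
        return handler;
    }

    public static void main(String[] args) throws InterruptedException {
        QuitOnDeath handler = runFailingWorker();
        Throwable cause = handler.cause.get();
        if (cause != null) {
            System.err.println("Worker died: " + cause.getMessage());
            // In a Streams app, KafkaStreams#close() would be called here
            // before exiting, so the process does not linger as a zombie.
            System.exit(1);
        }
    }
}
```

The handler only records the error and signals. The decision to shut down is left to the main thread, so no cleanup work runs on the thread that is already dying.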