[ https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143406#comment-16143406 ]
Seweryn Habdank-Wojewodzki commented on KAFKA-5786:
---------------------------------------------------

IMHO no. It does not solve the problem.

I have a question. Is the rebalancing error really fatal for the whole streaming application? If yes, the application shall be terminated immediately. If no, the exception shall be handled.

I have a feeling that the rebalancing error is not really a fatal error; it may be only temporary. So in my opinion it is a bug in the API, and I saw similar errors in normal producer/consumer apps, where the rebalancing error was not really fatal either.

I will try to take a deeper look directly at the Kafka code. Perhaps I can suggest a patch to explain in practice what I mean.

> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
>                 Key: KAFKA-5786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5786
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Critical
>
> An unhandled exception in a streaming app leaves the process in a zombie state.
> {code}
> 2017-08-24 15:17:40 WARN  StreamThread:978 - stream-thread
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3]
> Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception:
> stream-thread
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed
> to rebalance.;
> [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589),
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553),
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
> in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception is
> different and is thrown from a different place.
> The exception shall be handled in such a way that the application either
> tries to continue working or quits completely if the error is not recoverable.
> The current situation, in which the application is left as a zombie, is not good.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
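
The behaviour requested above (quit completely when a worker dies, rather than linger as a zombie) can be sketched in plain Java. This is only an illustration under stated assumptions: `ZombieGuard` and all of its methods are hypothetical names, not part of Kafka or Kafka Streams, and an ordinary thread that throws stands in for the StreamThread that failed to rebalance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; none of these names come from the Kafka code base. */
class ZombieGuard {
    private final List<Thread> workers = new ArrayList<>();
    private final CountDownLatch stopped = new CountDownLatch(1);
    private final AtomicBoolean failed = new AtomicBoolean(false);

    /** Starts a worker. If it dies from an uncaught exception, every worker is stopped. */
    public synchronized void start(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setUncaughtExceptionHandler((thread, error) -> {
            System.err.println("Worker " + thread.getName() + " died: " + error);
            failed.set(true);
            shutdown(); // fail fast instead of leaving the other workers running
        });
        workers.add(t);
        t.start();
    }

    /** Interrupts all workers and releases anyone blocked in awaitStop. */
    public synchronized void shutdown() {
        for (Thread t : workers) {
            t.interrupt();
        }
        stopped.countDown();
    }

    /** Returns true if shutdown was requested within the timeout. */
    public boolean awaitStop(long timeoutMs) {
        try {
            return stopped.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** True if the shutdown was caused by an uncaught exception in a worker. */
    public boolean hasFailed() {
        return failed.get();
    }
}
```

A caller would start its workers through `start(...)`, block on `awaitStop(...)`, and then exit with a non-zero status when `hasFailed()` is true, so that a supervisor can restart the process. If the error turned out to be recoverable, the handler could restart the failed worker instead of calling `shutdown()`; which of the two applies to a rebalance failure is exactly the open question in this issue.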