[ 
https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146056#comment-16146056
 ] 

Matthias J. Sax commented on KAFKA-5786:
----------------------------------------

There was a second PR for the 0.11.0 branch 
(https://github.com/apache/kafka/pull/3622). This fix will be included in the 
0.11.0.1 release, which should be available shortly (I assume next week).

Otherwise, you can get the code of a release on GitHub via tags (i.e., in the 
"switch branch" drop-down you can also select "tag"). For 0.11.0.0 you end up 
here: https://github.com/apache/kafka/tree/0.11.0.0

In general, I agree that a rebalance error should not be fatal. However, the 
log does not show why the rebalance failed (i.e., the root cause). Note that a 
rebalance in Kafka Streams is more heavyweight than for a plain consumer, as it 
also involves state handling/migration. There are some cases that we do treat 
as fatal, though. If a rebalance error is fatal, all threads will die 
eventually and your whole application will stop (i.e., become a zombie). Note 
that you should register an uncaught exception handler to get notified about 
this in your application code, so that you can take appropriate action.
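
As a minimal sketch of the handler idea, the plain-Java example below uses only 
the JDK and no Kafka dependency. The class name StreamThreadWatchdog, the 
method runFailingThread, and the thread name are hypothetical and exist only 
for illustration. In a Streams application of this era the same kind of 
Thread.UncaughtExceptionHandler would be passed to 
KafkaStreams#setUncaughtExceptionHandler before calling start(). What to do 
inside the handler (log, alert, shut down) is up to the application.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: shows how an uncaught exception handler is
// notified when a worker thread dies, using a plain thread as a stand-in
// for a Kafka Streams StreamThread.
class StreamThreadWatchdog {

    // Starts a thread that dies with the given exception and returns the
    // Throwable that the handler observed (null if it was never called).
    static Throwable runFailingThread(RuntimeException failure)
            throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);

        // The handler runs on the dying thread. A real application might
        // log the error and trigger a clean shutdown or restart here.
        Thread.UncaughtExceptionHandler handler = (thread, error) -> {
            seen.set(error);
            handled.countDown();
        };

        Thread worker = new Thread(() -> { throw failure; },
                "demo-StreamThread-1");
        worker.setUncaughtExceptionHandler(handler);
        worker.start();

        // Wait until the handler has run, with a timeout as a safety net.
        handled.await(5, TimeUnit.SECONDS);
        return seen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = runFailingThread(
                new RuntimeException("Failed to rebalance."));
        System.out.println("handler saw: " + t.getMessage());
    }
}
```

With a handler like this in place, the application learns that a thread has 
died and can decide to exit, instead of staying up as a zombie.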

Btw: We have a current dev roadmap to make Streams more robust and 
self-healing for these kinds of exceptions, so stay tuned for the 1.0 release 
in October. If you can provide more detailed logs, we can dig deeper into this 
issue to see why we treated the error as fatal in your specific case.

> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
>                 Key: KAFKA-5786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5786
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Critical
>
> An unhandled exception in the streaming app leaves the process in a zombie state.
> {code}
> 2017-08-24 15:17:40 WARN  StreamThread:978 - stream-thread 
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] 
> Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception: 
> stream-thread 
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed 
> to rebalance.; 
> [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
>  in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception is 
> different and is thrown from a different location.
> The exception should be handled so that the application either tries to 
> continue working or quits completely if the error is not recoverable.
> The current situation, in which the application becomes a zombie, is not good.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
