[ 
https://issues.apache.org/jira/browse/KAFKA-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147838#comment-16147838
 ] 

Matthias J. Sax commented on KAFKA-5786:
----------------------------------------

Thanks for the logs. If I read them correctly, some of your threads miss a 
rebalance because state recreation in a previous rebalance took too long. As a 
result, they drop out of the consumer group without noticing at first. When the 
next rebalance happens, they try to commit but fail, because they are no longer 
part of the group. This issue should be mitigated by KAFKA-5152 -- 
nevertheless, a proper fix would be to not let the thread die in the first 
place. We already have a JIRA for this: KAFKA-5541

I am going to close this as a duplicate. In 0.11.0.1, the probability that you 
hit this issue should be reduced (via KAFKA-5152), and I hope to get 
KAFKA-5541, which should deliver the proper fix, into 1.0.

Thanks for reporting the issue! Btw: you can also follow KAFKA-5156 for further 
improvements on internal exception handling.

> Yet another exception is causing that streamming app is zombie
> --------------------------------------------------------------
>
>                 Key: KAFKA-5786
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5786
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Seweryn Habdank-Wojewodzki
>         Attachments: fatal-errors-by-rebalancing.zip
>
>
> An unhandled exception in the streaming app leaves the process in a zombie state.
> {code}
> 2017-08-24 15:17:40 WARN  StreamThread:978 - stream-thread 
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] 
> Unexpected state transition from RUNNING to DEAD.
> 2017-08-24 15:17:40 FATAL StreamProcessor:67 - Caught unhandled exception: 
> stream-thread 
> [kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3] Failed 
> to rebalance.; 
> [org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:589),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:553),
>  
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:527)]
>  in thread kafka-endpoint-1236e6d5-75f0-4c14-b025-78e632484a26-StreamThread-3
> {code}
> The final state of the app is similar to KAFKA-5779, but the exception and 
> its location are different.
> The exception should be handled so that the application either tries to 
> continue working or quits completely if the error is not recoverable.
> The current situation, in which the application becomes a zombie, is not good.
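
Until the proper fix lands, the "quit completely" option requested above can 
be approximated on the application side. The following is a minimal sketch in 
plain Java, not Kafka code: `DeadThreadWatcher` and its methods are 
hypothetical names, and the worker thread only simulates a stream thread that 
dies with "Failed to rebalance.". It records the exception that killed the 
thread so that the caller can decide to shut down instead of lingering. Kafka 
Streams offers a comparable hook for its own threads via 
`KafkaStreams#setUncaughtExceptionHandler`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of Kafka: remembers the exception that
// killed a worker thread so the caller can quit instead of becoming a zombie.
final class DeadThreadWatcher {
    // Exception that terminated the worker, or null if none did.
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    // Starts the work on a new thread with an uncaught-exception handler attached.
    Thread start(String name, Runnable work) {
        Thread t = new Thread(work, name);
        t.setUncaughtExceptionHandler((thread, e) -> failure.set(e));
        t.start();
        return t;
    }

    // Returns the exception that killed the worker, or null if it ended cleanly.
    Throwable failure() {
        return failure.get();
    }

    public static void main(String[] args) throws InterruptedException {
        DeadThreadWatcher watcher = new DeadThreadWatcher();
        // Simulated stream thread that dies during a rebalance.
        Thread worker = watcher.start("StreamThread-3", () -> {
            throw new IllegalStateException("Failed to rebalance.");
        });
        // join() returns only after the handler has run on the dying thread.
        worker.join();
        if (watcher.failure() != null) {
            System.err.println("Worker died: " + watcher.failure().getMessage());
            // A real application would release resources and exit here.
        }
    }
}
```

Whether to exit or to restart the worker depends on whether the error is 
recoverable; the sketch only makes the failure visible to the caller.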



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
