[ 
https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17151456#comment-17151456
 ] 

Raman Gupta commented on KAFKA-10229:
-------------------------------------

[~guozhang] It's the latter case -- the app itself is running fine. I have both 
an uncaught exception handler and a streams state change listener defined. 
While the stream stops consuming, there is no state transition in the stream, 
nor are any exceptions logged. Everything else in the app continues to run 
just fine, including other Kafka consumers. It's just the single stream that 
stops consuming.

The stream is not stateful -- it's just a simple read, write some data to an 
external system, transform, and write to another topic. I suppose it's possible 
the write to the external system is hanging for some reason. Unfortunately that 
topic is "caught up" now, so I'm not seeing the problem currently. However, 
next time it happens I'll take a thread dump and we can see what the stream 
threads are doing.
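To illustrate why a hanging external write would produce exactly these symptoms, here is a minimal JDK-only sketch (the `HangDemo` class, the thread name, and the timings are hypothetical, not taken from the actual app): a thread that is merely blocked throws nothing, so an uncaught exception handler never fires, and from the outside the stream just silently stops making progress.

```java
public class HangDemo {
    // Simulates a stream thread whose call to an external system hangs;
    // returns the worker's thread state observed after a short wait.
    static Thread.State observeHungWorker() throws InterruptedException {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stand-in for a hanging external write
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // unblock when interrupted
            }
        }, "stream-worker");

        // The handler is registered, but a *blocked* thread throws nothing,
        // so this never fires -- matching the silent stall described above.
        worker.setUncaughtExceptionHandler((t, e) ->
                System.out.println("handler fired: " + e));

        worker.start();
        Thread.sleep(500);                      // let the worker block
        Thread.State state = worker.getState(); // TIMED_WAITING while hung
        worker.interrupt();
        worker.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker state while hung: " + observeHungWorker());
    }
}
```

A thread dump is the right tool here precisely because of this: the blocked stack frame shows up even though no exception or state change ever does. Assuming `jstack` is available in the container image, something like `kubectl exec <pod> -- jstack <pid>` next time it stalls should show where the stream threads are parked.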

> Kafka stream dies for no apparent reason, no errors logged on client or server
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-10229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.4.1
>            Reporter: Raman Gupta
>            Priority: Major
>
> My broker and clients are 2.4.1. I'm currently running a single broker. I 
> have a Kafka stream with exactly once processing turned on. I also have an 
> uncaught exception handler defined on the client. I have a stream which I 
> noticed was lagging. Upon investigation, I see that the consumer group was 
> empty.
> On restarting the consumers, the consumer group re-established itself, but 
> after about 8 minutes, the group became empty again. There is nothing logged 
> on the client side about any stream errors, despite the existence of an 
> uncaught exception handler.
> In the broker logs, I see that about 8 minutes after the clients restart / 
> the stream goes to RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group 
> produs-cisFileIndexer-stream has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group produs-cisFileIndexer-stream in state PreparingRebalance with old 
> generation 228 (__consumer_offsets-3) (reason: removing member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat 
> expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> So according to this, the consumer heartbeat has expired. I don't know why 
> that would be; logging shows that the stream was running and processing 
> messages normally, and then just stopped processing anything about 4 minutes 
> before it died, with no apparent errors or issues, and nothing logged via 
> the uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill message: 
> restarting the stream causes it to reprocess a bunch more messages from the 
> backlog, and then die again approximately 8 minutes later. At the time of the 
> last message consumed by the stream, there are no `INFO`-level or above logs 
> either on the client or the broker, nor any errors whatsoever. The stream 
> consumption simply stops.
> There are two consumers -- even if I limit consumption to only a single 
> consumer, the same thing happens.
> The runtime environment is Kubernetes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
