[ 
https://issues.apache.org/jira/browse/KAFKA-17040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904232#comment-17904232
 ] 

Lianet Magrans edited comment on KAFKA-17040 at 12/9/24 6:02 PM:
-----------------------------------------------------------------

Hey [~apoorvmittal10], in case it helps, I believe this issue happens when the 
consumer close cannot wait for the network thread to close (e.g. close with a low 
timeout, or interrupted). The flow is:
 # async consumer app thread triggers the action to close the network thread and 
blocks until it completes (it won't wait if interrupted or if the timeout is low) 
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1323]
 # async consumer app thread moves on and closes the telemetry reporter 
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L1335]
 # the background thread, still running the network thread close, reaches the 
point where it polls the client to send any unsent requests it has before 
closing 
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerNetworkThread.java#L309]
 

With that sequence, I expect we end up trying to update a telemetry reporter 
that is already TERMINATED, so this line would throw when we poll the 
network client:

[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L643]
 

Does that make sense? I was looking at a flaky test of ours that uses interrupt 
and saw this error a lot; it may help here: 
[https://github.com/apache/kafka/blob/3a9777a667620c5f926176452744c751df4dac17/core/src/test/scala/integration/kafka/api/PlaintextConsumerTest.scala#L834]
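
To make the ordering described above concrete, here is a minimal, single-threaded sketch that replays the steps in both orders. It is not Kafka code: {{CloseRaceSketch}}, {{Reporter}}, {{State}} and {{replay}} are hypothetical names, and the two-value state enum is a deliberate simplification of the real telemetry state machine. It only illustrates why closing the reporter before the network thread's final poll leads to the exception.

```java
import java.util.ArrayList;
import java.util.List;

final class CloseRaceSketch {
    // Hypothetical, simplified stand-in for the telemetry states.
    enum State { PUSH_NEEDED, TERMINATED }

    // Hypothetical reporter whose switch does not expect TERMINATED.
    static final class Reporter {
        private State state = State.PUSH_NEEDED;

        void close() {
            state = State.TERMINATED;
        }

        long timeToNextUpdate() {
            switch (state) {
                case PUSH_NEEDED:
                    return 0L;
                default:
                    throw new IllegalStateException("Unknown telemetry state: " + state);
            }
        }
    }

    // Replays the close sequence in a fixed order and records what happened.
    // appThreadWaits == true models a close that waits for the network thread;
    // false models a close that was interrupted or timed out.
    static List<String> replay(boolean appThreadWaits) {
        List<String> events = new ArrayList<>();
        Reporter reporter = new Reporter();

        // Stand-in for the network thread cleanup polling the client.
        Runnable networkThreadCleanup = () -> {
            try {
                reporter.timeToNextUpdate();
                events.add("poll ok");
            } catch (IllegalStateException e) {
                events.add("poll failed: " + e.getMessage());
            }
        };

        if (appThreadWaits) {
            networkThreadCleanup.run();
            reporter.close();
            events.add("reporter closed");
        } else {
            reporter.close();
            events.add("reporter closed");
            networkThreadCleanup.run();
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(replay(true));
        System.out.println(replay(false));
    }
}
```

In this sketch the second ordering is the one that produces the "Unknown telemetry state: TERMINATED" message; in the real consumer the two steps run on different threads, so which ordering occurs depends on timing.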
  



> Unknown telemetry state: TERMINATED thrown when closing AsyncKafkaConsumer
> --------------------------------------------------------------------------
>
>                 Key: KAFKA-17040
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17040
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, metrics
>    Affects Versions: 3.9.0
>            Reporter: Kirk True
>            Assignee: Apoorv Mittal
>            Priority: Major
>
> An error is occasionally thrown when closing the {{AsyncKafkaConsumer}}:
> {noformat}
> [ERROR] 2024-06-20 17:13:54,121 [consumer_background_thread] org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread lambda$configureThread$0 - Uncaught exception in thread 'consumer_background_thread':
> java.lang.IllegalStateException: Unknown telemetry state: TERMINATED
>         at org.apache.kafka.common.telemetry.internals.ClientTelemetryReporter$DefaultClientTelemetrySender.timeToNextUpdate(ClientTelemetryReporter.java:363)
>         at org.apache.kafka.clients.NetworkClient$TelemetrySender.maybeUpdate(NetworkClient.java:1392)
>         at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:668)
>         at org.apache.kafka.clients.consumer.internals.NetworkClientDelegate.poll(NetworkClientDelegate.java:143)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.sendUnsentRequests(ConsumerNetworkThread.java:299)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.cleanup(ConsumerNetworkThread.java:318)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkThread.run(ConsumerNetworkThread.java:105)
> {noformat}
> The issue appears to be that the {{TERMINATED}} state is not expected in the 
> switch statement inside 
> [timeToNextUpdate()|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/telemetry/internals/ClientTelemetryReporter.java#L307].
> As an aside, the error message might make more sense written as 
> "_Unexpected_ telemetry state" instead of "_Unknown_ telemetry state", 
> since {{TERMINATED}} is a known state that was simply unexpected until now.
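
As an illustration of the point about the switch statement, the sketch below shows one possible way a {{timeToNextUpdate}}-style method could treat {{TERMINATED}} as an expected state. This is an assumption for discussion, not the project's actual fix: {{TimeToNextUpdateSketch}}, the three-value {{TelemetryState}} enum, and the {{intervalMs}}/{{elapsedMs}} parameters are hypothetical and much simpler than the real {{ClientTelemetryReporter}}.

```java
final class TimeToNextUpdateSketch {
    // Hypothetical, reduced set of states; the real enum has more values.
    enum TelemetryState { SUBSCRIPTION_NEEDED, PUSH_NEEDED, TERMINATED }

    // Returns how long (in ms) the caller may wait before the next telemetry update.
    static long timeToNextUpdate(TelemetryState state, long intervalMs, long elapsedMs) {
        switch (state) {
            case SUBSCRIPTION_NEEDED:
            case PUSH_NEEDED:
                // Remaining time in the current interval, never negative.
                return Math.max(0L, intervalMs - elapsedMs);
            case TERMINATED:
                // Assumed handling: once terminated there is nothing left to send,
                // so report "no update needed" instead of throwing.
                return Long.MAX_VALUE;
            default:
                // Wording follows the suggestion in the description above.
                throw new IllegalStateException("Unexpected telemetry state: " + state);
        }
    }

    public static void main(String[] args) {
        System.out.println(timeToNextUpdate(TelemetryState.PUSH_NEEDED, 1000L, 400L));
        System.out.println(timeToNextUpdate(TelemetryState.TERMINATED, 1000L, 400L));
    }
}
```

Returning {{Long.MAX_VALUE}} is only one option; whether a late poll after termination should be silently tolerated or prevented by reordering the close steps is a design choice for the actual fix.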



