[ 
https://issues.apache.org/jira/browse/KAFKA-19529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-19529:
------------------------------------
    Priority: Critical  (was: Major)

> State updater does not always seem to remove all metrics
> --------------------------------------------------------
>
>                 Key: KAFKA-19529
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19529
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 4.2.0
>            Reporter: Lucas Brutschy
>            Priority: Critical
>
> In long-running tests, streams sometimes fails with this exception:
> {code:java}
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz:[2025-06-27 
> 02:06:22,384] ERROR [i-0285945bbb1aa2330-StreamThread-3] stream-client 
> [i-0285945bbb1aa2330] Encountered the following exception during processing 
> and sent shutdown request for the entire application. 
> (org.apache.kafka.streams.KafkaStreams)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-java.lang.IllegalArgumentException:
>  A metric named 'MetricName [name=restore-records-rate, 
> group=stream-state-updater-metrics, description=The average per-second number 
> of records restored, tags={thread-id=i-0285945bbb1aa2330-StateUpdater-3}]' 
> already exists, can't register another one.
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.common.metrics.Sensor.add(Sensor.java:344)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.common.metrics.Sensor.add(Sensor.java:317)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterMetrics.<init>(DefaultStateUpdater.java:1121)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterThread.<init>(DefaultStateUpdater.java:96)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.streams.processor.internals.DefaultStateUpdater.start(DefaultStateUpdater.java:794)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.streams.processor.internals.TaskManager.init(TaskManager.java:154)
> /mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz-    at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:885)
>  {code}
>  
>  * This seems to happen, when we restart a stream thread, e.g. the stream 
> thread is removed at some point and then restarted at a later point.
>  * We see log lines before / after the code that removes metrics from a state 
> updater thread
>  * Every time, it is a different state updater metric that seems to crash, so 
> it seems to be some kind of race condition
>  * I found a recent change in state updater initialization: 
> [https://github.com/apache/kafka/pull/19889] . I have digged into it, and 
> also asked Matthias/Bruno to dig into it, but we couldn’t find a reason why 
> this could cause any problem.
>  * What Matthias found out is that the problem seems to be on the 
> removal-side. I.e., the metric indeed doesn't seem to be removed correctly 
> when stopping the state updater.
> We have reverted PR #19889 in 4.1 and this seems to have resolved the problem 
> there. The problem still persists on trunk. We should either revert #19889 on 
> trunk as well, or fix the PR.
>  
> cc [~cadonna] [~bbejeck]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to