Lucas Brutschy created KAFKA-19529:
--------------------------------------

             Summary: State updater does not always seem to remove all metrics
                 Key: KAFKA-19529
                 URL: https://issues.apache.org/jira/browse/KAFKA-19529
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 4.2.0
            Reporter: Lucas Brutschy


In long-running tests, streams sometimes fails with this exception:
{code:java}
/mnt/run/streams/logs/streams.log.2025-06-27-02:59.gz:[2025-06-27 02:06:22,384] ERROR [i-0285945bbb1aa2330-StreamThread-3] stream-client [i-0285945bbb1aa2330] Encountered the following exception during processing and sent shutdown request for the entire application. (org.apache.kafka.streams.KafkaStreams)
java.lang.IllegalArgumentException: A metric named 'MetricName [name=restore-records-rate, group=stream-state-updater-metrics, description=The average per-second number of records restored, tags={thread-id=i-0285945bbb1aa2330-StateUpdater-3}]' already exists, can't register another one.
    at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:344)
    at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:317)
    at org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterMetrics.<init>(DefaultStateUpdater.java:1121)
    at org.apache.kafka.streams.processor.internals.DefaultStateUpdater$StateUpdaterThread.<init>(DefaultStateUpdater.java:96)
    at org.apache.kafka.streams.processor.internals.DefaultStateUpdater.start(DefaultStateUpdater.java:794)
    at org.apache.kafka.streams.processor.internals.TaskManager.init(TaskManager.java:154)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:885)
 {code}
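The exception comes from a duplicate-name check: the metric identity printed in the message combines name, group and tags, and the tags include the state updater's thread-id. If a restarted stream thread reuses the same index (here presumably {{StateUpdater-3}}), its new state updater produces exactly the same identity, so registration fails unless the old metrics were removed first. The following is a minimal sketch of only that check, assuming a registry keyed by the full identity; {{DuplicateMetricDemo}}, {{MetricId}}, {{Registry}} and {{restoreRecordsRate}} are hypothetical names and this is not Kafka's {{Metrics}}/{{Sensor}} implementation.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical, stripped-down model of a metrics registry; this is NOT Kafka's Metrics/Sensor code.
public class DuplicateMetricDemo {

    // Metric identity = name + group + tags, mirroring the fields shown in the exception message.
    record MetricId(String name, String group, Map<String, String> tags) {
        MetricId {
            tags = Map.copyOf(tags); // immutable copy so equals/hashCode stay stable
        }
    }

    static final class Registry {
        private final Map<MetricId, Double> metrics = new HashMap<>();

        // Rejects a second registration of an identical identity, like the error in the log.
        void register(MetricId id) {
            if (metrics.putIfAbsent(id, 0.0) != null) {
                throw new IllegalArgumentException(
                        "A metric named '" + id + "' already exists, can't register another one.");
            }
        }

        void remove(MetricId id) {
            metrics.remove(id);
        }

        int size() {
            return metrics.size();
        }
    }

    // Builds the identity of the metric from the report for a given (hypothetical) thread-id tag.
    static MetricId restoreRecordsRate(String threadId) {
        return new MetricId("restore-records-rate", "stream-state-updater-metrics",
                Map.of("thread-id", threadId));
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.register(restoreRecordsRate("client-StateUpdater-3"));

        // A new state updater with the same thread-id collides if the old metric is still registered.
        try {
            registry.register(restoreRecordsRate("client-StateUpdater-3"));
        } catch (IllegalArgumentException e) {
            System.out.println("collision: " + e.getMessage());
        }

        // Once the old metric has been removed, the same identity can be registered again.
        registry.remove(restoreRecordsRate("client-StateUpdater-3"));
        registry.register(restoreRecordsRate("client-StateUpdater-3"));
        System.out.println("registered metrics: " + registry.size());
    }
}
{code}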
 
 * This seems to happen when we restart a stream thread, for example when a stream thread is removed at some point and added back later.
 * The logs contain the lines emitted immediately before and after the code that removes the metrics of a state updater thread, so that code does appear to run.
 * The state updater metric that fails to register is different every time, which suggests some kind of race condition.
 * I found a recent change to the state updater initialization: [https://github.com/apache/kafka/pull/19889]. I have dug into it and also asked Matthias and Bruno to look into it, but none of us could find a reason why it would cause this problem.
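The race suspected above can be modeled as follows. This is a minimal sketch under two assumptions that are not confirmed from the Kafka source: that the old state updater thread removes its own metrics while it shuts down, and that the restarted stream thread can create a new state updater with the same thread-id before that removal has finished. {{RestartRaceDemo}}, {{Updater}}, {{requestShutdown}}, {{awaitTermination}} and {{restart}} are hypothetical names, and the latch only exists to force the bad ordering deterministically.

{code:java}
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical model of the suspected restart race; names do not correspond to Kafka internals.
public class RestartRaceDemo {

    // Shared registry keyed by "metric|thread-id"; a duplicate add is rejected.
    static final Set<String> REGISTRY = ConcurrentHashMap.newKeySet();

    static void register(String key) {
        if (!REGISTRY.add(key)) {
            throw new IllegalArgumentException("A metric named '" + key + "' already exists");
        }
    }

    // Registers its metric on construction; its own thread removes the metric while shutting down.
    static final class Updater {
        private final CountDownLatch stopRequested = new CountDownLatch(1);
        private final Thread thread;

        Updater(String threadId, CountDownLatch allowCleanup) {
            String key = "restore-records-rate|" + threadId;
            register(key);
            thread = new Thread(() -> {
                try {
                    stopRequested.await();
                    allowCleanup.await(); // lets the demo hold back cleanup to force the bad ordering
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    REGISTRY.remove(key); // metric removal happens on the updater thread itself
                }
            }, threadId);
            thread.start();
        }

        void requestShutdown() {
            stopRequested.countDown();
        }

        void awaitTermination() {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for updater", e);
            }
        }
    }

    // Replaces an updater with a new one using the same thread-id; returns false on a collision.
    static boolean restart(boolean waitForOldUpdater) {
        REGISTRY.clear();
        CountDownLatch allowCleanup = new CountDownLatch(1);
        Updater old = new Updater("StateUpdater-3", allowCleanup);
        try {
            old.requestShutdown();
            if (waitForOldUpdater) {
                allowCleanup.countDown();
                old.awaitTermination(); // old metrics are gone once the thread has terminated
            }
            Updater fresh = new Updater("StateUpdater-3", new CountDownLatch(0));
            fresh.requestShutdown();
            fresh.awaitTermination();
            return true;
        } catch (IllegalArgumentException e) {
            return false; // same kind of duplicate-registration failure as in the report
        } finally {
            allowCleanup.countDown();
            old.awaitTermination(); // never leave the old thread running past this call
        }
    }

    public static void main(String[] args) {
        System.out.println("restart without waiting: " + restart(false));
        System.out.println("restart after waiting:   " + restart(true));
    }
}
{code}

In this model, waiting for the old updater thread to terminate before creating the new one rules out the collision. That is only one way to remove the ordering problem in the sketch; it says nothing about what #19889 actually changed in the real initialization path.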

We reverted PR #19889 on the 4.1 branch, and this seems to have resolved the problem there. The problem still persists on trunk. We should either revert #19889 on trunk as well or fix the PR.

 

cc [~cadonna] [~bbejeck]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
