[
https://issues.apache.org/jira/browse/KAFKA-19678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18031538#comment-18031538
]
Matthias J. Sax commented on KAFKA-19678:
-----------------------------------------
{quote}ideally there would be a fix outside of each individual processor having
workarounds
{quote}
I never disagreed about this – just wanted to get you out of the ditch, until
we find a fix :)
Glad you figured out the memory/metric leak thing, and happy to hear that the
fix improves the situation... AK 4.1.1 should be out soon....
Interesting idea about making it a DEBUG level metric – could be a good
solution in case we cannot figure out anything better. But it would require a
KIP, I assume? [~bbejeck] wanted to work on this ticket. Let's hear from him. –
Personally, I would hope that we just find a good fix, even if I am not 100%
sure what it could be – maybe something like a lazy/delayed removal of the
metric, which we would cancel if a new iterator comes in again?
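Just to make that idea concrete, here is a rough, purely illustrative sketch of such a delayed/cancellable removal (the {{DelayedMetricRemover}} class and its wiring are hypothetical; only the notion of deferring the {{Metrics.removeMetric}} call and cancelling the removal when a new iterator shows up is from the discussion above):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: debounce removal of the "open iterators" metric so that the
// open/close-per-record pattern does not hit the global Metrics lock on every record.
public class DelayedMetricRemover {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long delayMs;
    private ScheduledFuture<?> pendingRemoval; // guarded by the synchronized methods below

    public DelayedMetricRemover(final long delayMs) {
        this.delayMs = delayMs;
    }

    // Called when the last open iterator of a store closes: instead of removing the metric
    // immediately (which goes through Metrics.removeMetric and the shared lock), schedule it.
    public synchronized void scheduleRemoval(final Runnable removeMetric) {
        cancelPending();
        pendingRemoval = scheduler.schedule(removeMetric, delayMs, TimeUnit.MILLISECONDS);
    }

    // Called when a new iterator is opened: if a removal is still pending, cancel it so the
    // existing metric is kept and no new Metrics.registerMetric call is needed.
    public synchronized boolean cancelPending() {
        final boolean cancelled = pendingRemoval != null && pendingRemoval.cancel(false);
        pendingRemoval = null;
        return cancelled;
    }
}
{code}
Whether a time-based delay, a grace period tied to the task lifecycle, or something else is the right trigger for the removal is exactly the open question, of course.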
> Streams open iterator tracking has high contention on metrics lock
> ------------------------------------------------------------------
>
> Key: KAFKA-19678
> URL: https://issues.apache.org/jira/browse/KAFKA-19678
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 4.1.0
> Reporter: Steven Schlansker
> Assignee: Bill Bejeck
> Priority: Major
> Attachments: image-2025-09-05-12-13-24-910.png,
> image-2025-10-20-13-36-54-857.png, image-2025-10-21-09-24-02-505.png
>
>
> We run Kafka Streams 4.1.0 with custom processors that heavily use state
> store range iterators.
> While investigating disappointing performance, we found a surprising source
> of lock contention.
> Over the course of a roughly 1-minute profiler sample, the
> {{org.apache.kafka.common.metrics.Metrics}} lock is taken approximately
> 40,000 times and blocks threads for about 1 minute.
> This appears to be because our state stores generally have no iterators open,
> except when their processor is processing a record, in which case it opens an
> iterator (taking the lock through {{OpenIterators.add}} into
> {{Metrics.registerMetric}}), does a tiny bit of work, and then closes the
> iterator (again taking the lock through {{OpenIterators.remove}} into
> {{Metrics.removeMetric}}).
> So, stream processing threads take a globally shared lock twice per record
> for this subset of our data (see the sketch after this description). I've
> attached a profiler thread state visualization with our findings - the red
> bar indicates the thread was blocked on this lock during the sample. As you
> can see, this lock seems to be severely hampering our performance.
>
> !image-2025-09-05-12-13-24-910.png!
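For illustration, a minimal sketch of the access pattern the report describes (the processor, the "my-store" name, and the prefix-scan logic are invented; only the one-short-lived-range-iterator-per-record shape and the resulting {{Metrics.registerMetric}} / {{Metrics.removeMetric}} calls are taken from the description above):
{code:java}
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Illustrative only; the store name and prefix-scan logic are invented. The point is the
// shape of the access pattern: one short-lived range iterator per processed record.
public class RangeScanProcessor implements Processor<String, String, String, String> {

    private ProcessorContext<String, String> context;
    private KeyValueStore<String, String> store;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
        this.store = context.getStateStore("my-store");
    }

    @Override
    public void process(final Record<String, String> record) {
        // Opening the iterator registers the open-iterators metric for this store
        // (OpenIterators.add -> Metrics.registerMetric, taking the shared Metrics lock).
        try (final KeyValueIterator<String, String> iter =
                 store.range(record.key(), record.key() + Character.MAX_VALUE)) {
            while (iter.hasNext()) {
                context.forward(record.withValue(iter.next().value));
            }
        }
        // Closing it removes the metric again (OpenIterators.remove -> Metrics.removeMetric),
        // so every processed record takes the globally shared lock twice.
    }
}
{code}
With this shape, the work done under the {{Metrics}} lock sits directly on the per-record hot path, which matches the profile attached above.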