[ 
https://issues.apache.org/jira/browse/KAFKA-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-17954:
------------------------------------
    Affects Version/s: 3.8.0
                           (was: 3.8.1)

> Error getting oldest-iterator-open-since-ms from JMX
> ----------------------------------------------------
>
>                 Key: KAFKA-17954
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17954
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.8.0
>            Reporter: Nicholas Telford
>            Assignee: Nicholas Telford
>            Priority: Minor
>             Fix For: 3.8.2, 3.9.1, 4.0.0
>
>
> In 
> [KIP-989|https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+Improved+StateStore+Iterator+metrics+for+detecting+leaks]
>  we introduced a new metric, {{{}oldest-iterator-open-since-ms{}}}, which 
> reports the timestamp that the oldest currently open KeyValueIterator was 
> opened at.
> On-scrape, we sometimes see this {{WARN}} log message:
> {noformat}
> Error getting JMX attribute 'oldest-iterator-open-since-ms'
> java.util.NoSuchElementException
> at 
> java.base/java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:1859)
> at 
> java.base/java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
> at 
> org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$registerMetrics$5(MeteredKeyValueStore.java:179){noformat}
> -However, if no iterators are currently open, this Gauge returns 
> {{{}null{}}}.-
> -When using the Prometheus {{JmxScraper}} to scrape this metric, its value is 
> added to a {{{}ConcurrentHashMap{}}}, which does _not_ permit {{null}} 
> values.-
> -We should find some other way to report the absence of this metric that does 
> not cause problems with {{{}ConcurrentHashMap{}}}.-
> My initial analysis was incorrect. The problem appears to be caused by the 
> {{openIterators}} Set in {{{}MeteredKeyValueStore{}}}:
> {noformat}
>     protected NavigableSet<MeteredIterator> openIterators = new 
> ConcurrentSkipListSet<>(Comparator.comparingLong(MeteredIterator::startTimestamp));
>  {noformat}
> This is used by the Gauge to report the metric:
> {noformat}
> openIterators.isEmpty() ? null : openIterators.first().startTimestamp() 
> {noformat}
> The source of the exception is the right-hand side of this ternary 
> expression, specifically {{{}openIterators.first(){}}}.
> The condition of this expression should ensure that there is at least one 
> element to retrieve by the right-hand side. *However, if the last Iterator is 
> removed from this Set concurrently to the Gauge being reported, after the 
> emptiness check, but before retrieving the element, we can throw the above 
> exception here.*
> This can happen because interactive queries and stream threads operate 
> concurrently from the thread that reads the Gauge to report metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to