[
https://issues.apache.org/jira/browse/KAFKA-17954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias J. Sax updated KAFKA-17954:
------------------------------------
Affects Version/s: 3.8.0
(was: 3.8.1)
> Error getting oldest-iterator-open-since-ms from JMX
> ----------------------------------------------------
>
> Key: KAFKA-17954
> URL: https://issues.apache.org/jira/browse/KAFKA-17954
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.8.0
> Reporter: Nicholas Telford
> Assignee: Nicholas Telford
> Priority: Minor
> Fix For: 3.8.2, 3.9.1, 4.0.0
>
>
> In
> [KIP-989|https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+Improved+StateStore+Iterator+metrics+for+detecting+leaks]
> we introduced a new metric, {{oldest-iterator-open-since-ms}}, which
> reports the timestamp at which the oldest currently open KeyValueIterator was
> opened.
> On scrape, we sometimes see this {{WARN}} log message:
> {noformat}
> Error getting JMX attribute 'oldest-iterator-open-since-ms'
> java.util.NoSuchElementException
>     at java.base/java.util.concurrent.ConcurrentSkipListMap.firstKey(ConcurrentSkipListMap.java:1859)
>     at java.base/java.util.concurrent.ConcurrentSkipListSet.first(ConcurrentSkipListSet.java:396)
>     at org.apache.kafka.streams.state.internals.MeteredKeyValueStore.lambda$registerMetrics$5(MeteredKeyValueStore.java:179)
> {noformat}
> -However, if no iterators are currently open, this Gauge returns
> {{{}null{}}}.-
> -When using the Prometheus {{JmxScraper}} to scrape this metric, its value is
> added to a {{{}ConcurrentHashMap{}}}, which does _not_ permit {{null}}
> values.-
> -We should find some other way to report the absence of this metric that does
> not cause problems with {{{}ConcurrentHashMap{}}}.-
> My initial analysis was incorrect. The problem appears to be caused by the
> {{openIterators}} Set in {{MeteredKeyValueStore}}:
> {noformat}
> protected NavigableSet<MeteredIterator> openIterators =
>     new ConcurrentSkipListSet<>(Comparator.comparingLong(MeteredIterator::startTimestamp));
> {noformat}
> This is used by the Gauge to report the metric:
> {noformat}
> openIterators.isEmpty() ? null : openIterators.first().startTimestamp()
> {noformat}
> The source of the exception is the right-hand (else) branch of this ternary
> expression, specifically {{openIterators.first()}}.
> The condition is meant to ensure that there is at least one element for the
> right-hand side to retrieve. *However, the emptiness check and the read are
> two separate operations: if the last Iterator is removed from this Set
> concurrently with the Gauge being read, after the emptiness check but before
> the element is retrieved, {{first()}} throws the above exception.*
> This can happen because interactive queries and stream threads run
> concurrently with the thread that reads the Gauge to report metrics.
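
One possible way to close this window is to read the oldest element with a single set operation and treat an empty set as "no value", instead of checking and then reading. The sketch below is only an illustration and is not necessarily the fix that was merged for this ticket. {{OldestIteratorGauge}}, {{TimedIterator}}, {{newOpenIteratorSet}} and {{oldestOpenSince}} are hypothetical names that stand in for the Kafka internals. Only the {{ConcurrentSkipListSet}} declaration mirrors the snippet quoted above.

```java
import java.util.Comparator;
import java.util.NavigableSet;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

public class OldestIteratorGauge {

    // Hypothetical stand-in for Kafka's MeteredIterator; only the timestamp matters here.
    interface TimedIterator {
        long startTimestamp();
    }

    // Builds a set ordered by start timestamp, mirroring the declaration quoted in the issue.
    static NavigableSet<TimedIterator> newOpenIteratorSet() {
        return new ConcurrentSkipListSet<>(Comparator.comparingLong(TimedIterator::startTimestamp));
    }

    // Reads the oldest start timestamp without a separate emptiness check.
    // Returns null when no iterator is open, including the case where another
    // thread removes the last element just before this read.
    static Long oldestOpenSince(NavigableSet<TimedIterator> openIterators) {
        try {
            return openIterators.first().startTimestamp();
        } catch (NoSuchElementException e) {
            // first() throws on an empty set; report "no open iterator" instead.
            return null;
        }
    }

    public static void main(String[] args) {
        NavigableSet<TimedIterator> open = newOpenIteratorSet();
        TimedIterator older = () -> 100L;
        TimedIterator newer = () -> 200L;
        open.add(newer);
        open.add(older);
        System.out.println(oldestOpenSince(open)); // 100

        // Simulate the last iterators being closed before the gauge is read.
        open.remove(older);
        open.remove(newer);
        System.out.println(oldestOpenSince(open)); // null
    }
}
```

Catching the exception keeps the gauge's existing contract of returning {{null}} when nothing is open, so whatever consumes the metric sees the same values as before and the race only changes which branch produces the {{null}}.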
--
This message was sent by Atlassian Jira
(v8.20.10#820010)