Hi Lucas, Hmm, I'm not sure how we could reliably identify such leaked Iterators. If we tried to include open iterators when calculating iterator-duration, we'd need some kind of registry of all the open iterator creation timestamps, wouldn't we?
In general, if you have a leaky Iterator, it should manifest as a persistently climbing "open-iterators" metric, even on a busy node, because each time that Iterator is used, it will leak another one. So even in the presence of many non-leaky Iterators on a busy instance, the metric should still consistently climb. Regards, Nick On Mon, 16 Oct 2023 at 14:24, Lucas Brutschy <lbruts...@confluent.io.invalid> wrote: > Hi Nick! > > thanks for the KIP! I think this could be quite useful, given the > problems that we had with leaks due to RocksDB resources not being > closed. > > I don't have any pressing issues why we can't accept it like it is, > just one minor point for discussion: would it also make sense to make > it possible to identify a few very long-running / leaked iterators? I > can imagine on a busy node, it would be hard to spot that 1% of the > iterators never close when looking only at closed iterator or the > number of iterators. But it could still be good to identify those > leaks early. One option would be to add `iterator-duration-max` and > take open iterators into account when computing the metric. > > Cheers, > Lucas > > > On Thu, Oct 5, 2023 at 3:50 PM Nick Telford <nick.telf...@gmail.com> > wrote: > > > > Hi Colt, > > > > I kept the details out of the KIP, because KIPs generally document > > high-level design, but the way I'm doing it is like this: > > > > final ManagedKeyValueIterator<Bytes, byte[]> > > rocksDbPrefixSeekIterator = cf.prefixScan(accessor, prefixBytes); > > --> final long startedAt = System.nanoTime(); > > openIterators.add(rocksDbPrefixSeekIterator); > > rocksDbPrefixSeekIterator.onClose(() -> { > > --> metricsRecorder.recordIteratorDuration(System.nanoTime() - > > startedAt); > > openIterators.remove(rocksDbPrefixSeekIterator); > > }); > > > > The lines with the arrow are the new code. This pattern is repeated > > throughout RocksDBStore, wherever a new RocksDbIterator is created. > > > > Regards, > > Nick > > > > On Thu, 5 Oct 2023 at 12:32, Colt McNealy <c...@littlehorse.io> wrote: > > > > > Thank you for the KIP, Nick! > > > > > > This would be highly useful for many reasons. Much more sane than > checking > > > for leaked iterators by profiling memory usage while running 100's of > > > thousands of range scans via interactive queries (: > > > > > > One small question: > > > > > > >The iterator-duration metrics will be updated whenever an Iterator's > > > close() method is called > > > > > > Does the Iterator have its own "createdAt()" or equivalent field, or > do we > > > need to keep track of the Iterator's start time upon creation? > > > > > > Cheers, > > > Colt McNealy > > > > > > *Founder, LittleHorse.dev* > > > > > > > > > On Wed, Oct 4, 2023 at 9:07 AM Nick Telford <nick.telf...@gmail.com> > > > wrote: > > > > > > > Hi everyone, > > > > > > > > KIP-989 is a small Kafka Streams KIP to add a few new metrics around > the > > > > creation and use of RocksDB Iterators, to aid users in identifying > > > > "Iterator leaks" that could cause applications to leak native memory. > > > > > > > > Let me know what you think! > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics > > > > > > > > P.S. I'm not too sure about the formatting of the "New Metrics" > table, > > > any > > > > advice there would be appreciated. > > > > > > > > Regards, > > > > Nick > > > > > > > >