Hi Nick, I did not think in detail about how to implement it, just about what metrics would be nice to have. You are right, we'd have to register/deregister the iterators during open/close. This would be more complicated to implement than the other metrics, but I do not see a fundamental problem with it.
As far as I understand, even a low number of leaked iterators can hurt RocksDB compaction significantly. So we may even want to detect if the iterators are opened by one-time / rare queries against the state store. But, as I said, that would be an addition and not a change of the current contents of the KIP, so I'd support the KIP moving forward even without this extension. Cheers, Lucas On Tue, Oct 17, 2023 at 3:45 PM Nick Telford <nick.telf...@gmail.com> wrote: > > Hi Lucas, > > Hmm, I'm not sure how we could reliably identify such leaked Iterators. If > we tried to include open iterators when calculating iterator-duration, we'd > need some kind of registry of all the open iterator creation timestamps, > wouldn't we? > > In general, if you have a leaky Iterator, it should manifest as a > persistently climbing "open-iterators" metric, even on a busy node, because > each time that Iterator is used, it will leak another one. So even in the > presence of many non-leaky Iterators on a busy instance, the metric should > still consistently climb. > > Regards, > > Nick > > On Mon, 16 Oct 2023 at 14:24, Lucas Brutschy <lbruts...@confluent.io.invalid> > wrote: > > > Hi Nick! > > > > thanks for the KIP! I think this could be quite useful, given the > > problems that we had with leaks due to RocksDB resources not being > > closed. > > > > I don't have any pressing issues why we can't accept it like it is, > > just one minor point for discussion: would it also make sense to make > > it possible to identify a few very long-running / leaked iterators? I > > can imagine on a busy node, it would be hard to spot that 1% of the > > iterators never close when looking only at closed iterator or the > > number of iterators. But it could still be good to identify those > > leaks early. One option would be to add `iterator-duration-max` and > > take open iterators into account when computing the metric. > > > > Cheers, > > Lucas > > > > > > On Thu, Oct 5, 2023 at 3:50 PM Nick Telford <nick.telf...@gmail.com> > > wrote: > > > > > > Hi Colt, > > > > > > I kept the details out of the KIP, because KIPs generally document > > > high-level design, but the way I'm doing it is like this: > > > > > > final ManagedKeyValueIterator<Bytes, byte[]> > > > rocksDbPrefixSeekIterator = cf.prefixScan(accessor, prefixBytes); > > > --> final long startedAt = System.nanoTime(); > > > openIterators.add(rocksDbPrefixSeekIterator); > > > rocksDbPrefixSeekIterator.onClose(() -> { > > > --> metricsRecorder.recordIteratorDuration(System.nanoTime() - > > > startedAt); > > > openIterators.remove(rocksDbPrefixSeekIterator); > > > }); > > > > > > The lines with the arrow are the new code. This pattern is repeated > > > throughout RocksDBStore, wherever a new RocksDbIterator is created. > > > > > > Regards, > > > Nick > > > > > > On Thu, 5 Oct 2023 at 12:32, Colt McNealy <c...@littlehorse.io> wrote: > > > > > > > Thank you for the KIP, Nick! > > > > > > > > This would be highly useful for many reasons. Much more sane than > > checking > > > > for leaked iterators by profiling memory usage while running 100's of > > > > thousands of range scans via interactive queries (: > > > > > > > > One small question: > > > > > > > > >The iterator-duration metrics will be updated whenever an Iterator's > > > > close() method is called > > > > > > > > Does the Iterator have its own "createdAt()" or equivalent field, or > > do we > > > > need to keep track of the Iterator's start time upon creation? > > > > > > > > Cheers, > > > > Colt McNealy > > > > > > > > *Founder, LittleHorse.dev* > > > > > > > > > > > > On Wed, Oct 4, 2023 at 9:07 AM Nick Telford <nick.telf...@gmail.com> > > > > wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > KIP-989 is a small Kafka Streams KIP to add a few new metrics around > > the > > > > > creation and use of RocksDB Iterators, to aid users in identifying > > > > > "Iterator leaks" that could cause applications to leak native memory. > > > > > > > > > > Let me know what you think! > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics > > > > > > > > > > P.S. I'm not too sure about the formatting of the "New Metrics" > > table, > > > > any > > > > > advice there would be appreciated. > > > > > > > > > > Regards, > > > > > Nick > > > > > > > > > > >