Hi Nick, you are still referring to oldest-open-iterator-age-ms in the `Proposed Changes` section.
Cheers, Lucas On Thu, May 2, 2024 at 4:00 PM Lucas Brutschy <lbruts...@confluent.io> wrote: > > Hi Nick! > > I agree, the age variant is a bit nicer since the semantics are very > clear from the name. If you'd rather go for the simple implementation, > how about calling it `oldest-iterator-open-since-ms`? I believe this > could be understood without docs. Either way, I think we should be > able to open the vote for this KIP because nobody raised any major / > blocking concerns. > > Looking forward to getting this voted on soon! > > Cheers > Lucas > > On Sun, Mar 31, 2024 at 5:23 PM Nick Telford <nick.telf...@gmail.com> wrote: > > > > Hi Matthias, > > > > > For the oldest iterator metric, I would propose something simple like > > > `iterator-opened-ms` and it would just be the actual timestamp when the > > > iterator was opened. I don't think we need to compute the actual age, > > > but user can to this computation themselves? > > > > That works for me; it's easier to implement like that :-D I'm a little > > concerned that the name "iterator-opened-ms" may not be obvious enough > > without reading the docs. > > > > > If we think reporting the age instead of just the timestamp is better, I > > > would propose `iterator-max-age-ms`. I should be sufficient to call out > > > (as it's kinda "obvious" anyway) that the metric applies to open > > > iterator only. > > > > While I think it's preferable to record the timestamp, rather than the age, > > this does have the benefit of a more obvious metric name. > > > > > Nit: the KIP says it's a store-level metric, but I think it would be > > > good to say explicitly that it's recorded with DEBUG level only? > > > > Yes, I've already updated the KIP with this information in the table. > > > > Regards, > > > > Nick > > > > On Sun, 31 Mar 2024 at 10:53, Matthias J. Sax <mj...@apache.org> wrote: > > > > > The time window thing was just an idea. Happy to drop it. > > > > > > For the oldest iterator metric, I would propose something simple like > > > `iterator-opened-ms` and it would just be the actual timestamp when the > > > iterator was opened. I don't think we need to compute the actual age, > > > but user can to this computation themselves? > > > > > > If we think reporting the age instead of just the timestamp is better, I > > > would propose `iterator-max-age-ms`. I should be sufficient to call out > > > (as it's kinda "obvious" anyway) that the metric applies to open > > > iterator only. > > > > > > And yes, I was hoping that the code inside MetereXxxStore might already > > > be setup in a way that custom stores would inherit the iterator metrics > > > automatically -- I am just not sure, and left it as an exercise for > > > somebody to confirm :) > > > > > > > > > Nit: the KIP says it's a store-level metric, but I think it would be > > > good to say explicitly that it's recorded with DEBUG level only? > > > > > > > > > > > > -Matthias > > > > > > > > > On 3/28/24 2:52 PM, Nick Telford wrote: > > > > Quick addendum: > > > > > > > > My suggested metric "oldest-open-iterator-age-seconds" should be > > > > "oldest-open-iterator-age-ms". Milliseconds is obviously a better > > > > granularity for such a metric. > > > > > > > > Still accepting suggestions for a better name. > > > > > > > > On Thu, 28 Mar 2024 at 13:41, Nick Telford <nick.telf...@gmail.com> > > > wrote: > > > > > > > >> Hi everyone, > > > >> > > > >> Sorry for leaving this for so long. So much for "3 weeks until KIP > > > freeze"! > > > >> > > > >> On Sophie's comments: > > > >> 1. Would Matthias's suggestion of a separate metric tracking the age of > > > >> the oldest open iterator (within the tag set) satisfy this? That way we > > > can > > > >> keep iterator-duration-(avg|max) for closed iterators, which can be > > > useful > > > >> for performance debugging for iterators that don't leak. I'm not sure > > > what > > > >> we'd call this metric, maybe: "oldest-open-iterator-age-seconds"? Seems > > > >> like a mouthful. > > > >> > > > >> 2. You're right, it makes more sense to provide > > > >> iterator-duration-(avg|max). Honestly, I can't remember why I had > > > "total" > > > >> before, or why I was computing a rate-of-change over it. > > > >> > > > >> 3, 4, 5, 6. Agreed, I'll make all those changes as suggested. > > > >> > > > >> 7. Combined with Matthias's point about RocksDB, I'm convinced that > > > >> this > > > >> is the wrong KIP for these. I'll introduce the additional Rocks metrics > > > in > > > >> another KIP. > > > >> > > > >> On Matthias's comments: > > > >> A. Not sure about the time window. I'm pretty sure all existing avg/max > > > >> metrics are since the application was started? Any other suggestions > > > here > > > >> would be appreciated. > > > >> > > > >> B. Agreed. See point 1 above. > > > >> > > > >> C. Good point. My focus was very much on Rocks memory leaks when I > > > >> wrote > > > >> the first draft. I can generalise it. My only concern is that it might > > > make > > > >> it more difficult to detect Rocks iterator leaks caused *within* our > > > >> high-level iterator, e.g. RocksJNI, RocksDB, RocksDBStore, etc. But we > > > >> could always provide a RocksDB-specific metric for this, as you > > > suggested. > > > >> > > > >> D. Hmm, we do already have MeteredKeyValueIterator, which automatically > > > >> wraps the iterator from inner-stores of MeteredKeyValueStore. If we > > > >> implemented these metrics there, then custom stores would automatically > > > >> gain the functionality, right? This seems like a pretty logical place > > > >> to > > > >> implement these metrics, since MeteredKeyValueStore is all about adding > > > >> metrics to state stores. > > > >> > > > >>> I imagine the best way to implement this would be to do so at the > > > >>> high-level iterator rather than implementing it separately for each > > > >>> specific iterator implementation for every store type. > > > >> > > > >> Sophie, does MeteredKeyValueIterator fit with your recommendation? > > > >> > > > >> Thanks for your thoughts everyone, I'll update the KIP now. > > > >> > > > >> Nick > > > >> > > > >> On Thu, 14 Mar 2024 at 03:37, Sophie Blee-Goldman < > > > sop...@responsive.dev> > > > >> wrote: > > > >> > > > >>> About your last two points: I completely agree that we should try to > > > >>> make this independent of RocksDB, and should probably adopt a > > > >>> general philosophy of being store-implementation agnostic unless > > > >>> there is good reason to focus on a particular store type: eg if it was > > > >>> only possible to implement for certain stores, or only made sense in > > > >>> the context of a certain store type but not necessarily stores in > > > general. > > > >>> > > > >>> While leaking memory due to unclosed iterators on RocksDB stores is > > > >>> certainly the most common issue, I think Matthias sufficiently > > > >>> demonstrated that the problem of leaking iterators is not actually > > > >>> unique to RocksDB, and we should consider including in-memory > > > >>> stores at the very least. I also think that at this point, we may as > > > well > > > >>> just implement the metrics for *all* store types, whether rocksdb or > > > >>> in-memory or custom. Not just because it probably applies to all > > > >>> store types (leaking iterators are rarely a good thing!) but because > > > >>> I imagine the best way to implement this would be to do so at the > > > >>> high-level iterator rather than implementing it separately for each > > > >>> specific iterator implementation for every store type. > > > >>> > > > >>> That said, I haven't thought all that carefully about the > > > implementation > > > >>> yet -- it just strikes me as easiest to do at the top level of the > > > store > > > >>> hierarchy rather than at the bottom. My gut instinct may very well be > > > >>> wrong, but that's what it's saying > > > >>> > > > >>> On Thu, Mar 7, 2024 at 10:43 AM Matthias J. Sax <mj...@apache.org> > > > wrote: > > > >>> > > > >>>> Seems I am late to this party. Can we pick this up again aiming for > > > 3.8 > > > >>>> release? I think it would be a great addition. Few comments: > > > >>>> > > > >>>> > > > >>>> - I think it does make sense to report `iterator-duration-avg` and > > > >>>> `iterator-duration-max` for all *closed* iterators -- it just seems > > > >>>> to > > > >>>> be a useful metric (wondering if this would be _overall_ or bounded > > > >>>> to > > > >>>> some time window?) > > > >>>> > > > >>>> - About the duration iterators are currently open, I believe the only > > > >>>> useful way is to report the "oldest iterator", ie, the smallest > > > iterator > > > >>>> open-time, of all currently open-iterator? We all agree that in > > > general, > > > >>>> leaking iterator would bump the count metric, and if there is a few > > > ones > > > >>>> which are not closed and open for a long time, it seem sufficient to > > > >>>> detect the single oldest one for alerting purpose? > > > >>>> > > > >>>> - What I don't like about the KIP is it focus on RocksDB. I don't > > > think > > > >>>> we should build on the internal RocksDB counters as proposed (I > > > >>>> guess, > > > >>>> we could still expose them, similar to other RocksDB metrics which we > > > >>>> expose already). However, for this new metric, we should track it > > > >>>> ourselves and thus make it independent of RocksDB -- in the end, an > > > >>>> in-memory store could also leak memory (and kill a JVM with an > > > >>>> out-of-memory error) and we should be able to track it. > > > >>>> > > > >>>> - Not sure if we would like to add support for custom stores, to > > > >>>> allow > > > >>>> them to register their iterators with this metric? Or would this not > > > be > > > >>>> necessary, because custom stores could just register a custom metric > > > >>>> about it to begin with? > > > >>>> > > > >>>> > > > >>>> > > > >>>> -Matthias > > > >>>> > > > >>>> On 10/25/23 4:41 PM, Sophie Blee-Goldman wrote: > > > >>>>>> > > > >>>>>> If we used "iterator-duration-max", for > > > >>>>>> example, would it not be confusing that it includes Iterators that > > > >>> are > > > >>>>>> still open, and therefore the duration is not yet known? > > > >>>>> > > > >>>>> > > > >>>>> 1. Ah, I think I understand your concern better now -- I totally > > > agree > > > >>>> that > > > >>>>> a > > > >>>>> "iterator-duration-max" metric would be confusing/misleading. I > > > was > > > >>>>> thinking about it a bit differently, something more akin to the > > > >>>>> "last-rebalance-seconds-ago" consumer metric. As the name suggests, > > > >>>>> that basically just tracks how long the consumer has gone without > > > >>>>> rebalancing -- it doesn't purport to represent the actual duration > > > >>>> between > > > >>>>> rebalances, just the current time since the last one. The hard part > > > >>> is > > > >>>>> really > > > >>>>> in choosing a name that reflects this -- maybe you have some better > > > >>> ideas > > > >>>>> but off the top of my head, perhaps something like > > > >>>> "iterator-lifetime-max"? > > > >>>>> > > > >>>>> 2. I'm not quite sure how to interpret the "iterator-duration-total" > > > >>>> metric > > > >>>>> -- what exactly does it mean to add up all the iterator durations? > > > For > > > >>>>> some context, while this is not a hard-and-fast rule, in general > > > >>> you'll > > > >>>>> find that Kafka/Streams metrics tend to come in pairs of avg/max or > > > >>>>> rate/total. Something that you might measure the avg for usually is > > > >>>>> also useful to measure the max, whereas a total metric is probably > > > >>>>> also useful as a rate but not so much as an avg. I actually think > > > this > > > >>>>> is part of why it feels like it makes so much sense to include a > > > "max" > > > >>>>> version of this metric, as Lucas suggested, even if the name of > > > >>>>> "iterator-duration-max" feels misleading. Ultimately the metric > > > >>>>> names > > > >>>>> are up to you, but for this reason, I would personally advocate for > > > >>>>> just going with an "iterator-duration-avg" and > > > "iterator-duration-max" > > > >>>>> > > > >>>>> I did see your example in which you mention one could monitor the > > > >>>>> rate of change of the "-total" metric. While this does make sense to > > > >>>>> me, if the only way to interpret a metric is by computing another > > > >>>>> function over it, then why not just make that computation the metric > > > >>>>> and cut out the middle man? And in this case, to me at least, it > > > feels > > > >>>>> much easier to understand a metric like "iterator-duration-max" vs > > > >>>>> something like "iterator-duration-total-rate" > > > >>>>> > > > >>>>> 3. By the way, can you add another column to the table with the new > > > >>>> metrics > > > >>>>> that lists the recording level? My suggestion would be to put the > > > >>>>> "number-open-iterators" at INFO and the other two at DEBUG. See > > > >>>>> the following for my reasoning behind this recommendation > > > >>>>> > > > >>>>> 4. I would change the "Type" entry for the "number-open-iterators" > > > >>> from > > > >>>>> "Value" to "Gauge". This helps justify the "INFO" level for this > > > >>> metric, > > > >>>>> since unlike the other metrics which are "Measurables", the current > > > >>>>> timestamp won't need to be retrieved on each recording > > > >>>>> > > > >>>>> 5. Can you list the tags that would be associated with each of these > > > >>>>> metrics (either in the table, or separately above/below if they will > > > >>> be > > > >>>>> the same for all) > > > >>>>> > > > >>>>> 6. Do you have a strong preference for the name > > > >>> "number-open-iterators" > > > >>>>> or would you be alright in shortening this to "num-open-iterators"? > > > >>> The > > > >>>>> latter is more in line with the naming scheme used elsewhere in > > > >>>>> Kafka > > > >>>>> for similar kinds of metrics, and a shorter name is always nice. > > > >>>>> > > > >>>>> 7. With respect to the rocksdb cache metrics, those sound useful but > > > >>>>> if it was me, I would probably save them for a separate KIP mainly > > > >>> just > > > >>>>> because the KIP freeze deadline is in a few weeks, and I wouldn't > > > want > > > >>>>> to end up blocking all the new metrics just because there was > > > >>>>> ongoing > > > >>>>> debate about a subset of them. That said, you do have 3 full weeks, > > > so > > > >>>>> I would hope that you could get both sets of metrics agreed upon in > > > >>>>> that timeframe! > > > >>>>> > > > >>>>> > > > >>>>> On Tue, Oct 24, 2023 at 6:35 AM Nick Telford <nick.telf...@gmail.com > > > > > > > >>>> wrote: > > > >>>>> > > > >>>>>> I don't really have a problem with adding such a metric, I'm just > > > not > > > >>>>>> entirely sure how it would work. If we used > > > >>>>>> "iterator-duration-max", > > > >>> for > > > >>>>>> example, would it not be confusing that it includes Iterators that > > > >>> are > > > >>>>>> still open, and therefore the duration is not yet known? When > > > >>> graphing > > > >>>> that > > > >>>>>> over time, I suspect it would be difficult to understand. > > > >>>>>> > > > >>>>>> 3. > > > >>>>>> FWIW, this would still be picked up by "open-iterators", since that > > > >>>> metric > > > >>>>>> is only decremented when Iterator#close is called (via the > > > >>>>>> ManagedKeyValueIterator#onClose hook). > > > >>>>>> > > > >>>>>> I'm actually considering expanding the scope of this KIP slightly > > > >>>>>> to > > > >>>>>> include improved Block Cache metrics, as my own memory leak > > > >>>> investigations > > > >>>>>> have trended in that direction. Do you think the following metrics > > > >>>> should > > > >>>>>> be included in this KIP, or should I create a new KIP? > > > >>>>>> > > > >>>>>> - block-cache-index-usage (number of bytes occupied by index > > > >>> blocks) > > > >>>>>> - block-cache-filter-usage (number of bytes occupied by filter > > > >>>> blocks) > > > >>>>>> > > > >>>>>> Regards, > > > >>>>>> Nick > > > >>>>>> > > > >>>>>> On Tue, 24 Oct 2023 at 07:09, Sophie Blee-Goldman < > > > >>>> sop...@responsive.dev> > > > >>>>>> wrote: > > > >>>>>> > > > >>>>>>> I actually think we could implement Lucas' suggestion pretty > > > >>>>>>> easily > > > >>> and > > > >>>>>>> without too much additional effort. We have full control over the > > > >>>>>> iterator > > > >>>>>>> that is returned by the various range queries, so it would be easy > > > >>> to > > > >>>>>>> register a gauge metric for how long it has been since the > > > >>>>>>> iterator > > > >>> was > > > >>>>>>> created. Then we just deregister the metric when the iterator is > > > >>>> closed. > > > >>>>>>> > > > >>>>>>> With respect to how useful this metric would be, both Nick and > > > Lucas > > > >>>> have > > > >>>>>>> made good points: I would agree that in general, leaking iterators > > > >>>> would > > > >>>>>>> mean an ever-increasing iterator count that should be possible to > > > >>> spot > > > >>>>>>> without this. However, a few things to consider: > > > >>>>>>> > > > >>>>>>> 1. it's really easy to set up an alert based on some maximum > > > >>> threshold > > > >>>> of > > > >>>>>>> how long an iterator should remain open for. It's relatively more > > > >>>> tricky > > > >>>>>> to > > > >>>>>>> set up alerts based on the current count of open iterators and how > > > >>> it > > > >>>>>>> changes over time. > > > >>>>>>> 2. As Lucas mentioned, it only takes a few iterators to wreak > > > >>>>>>> havoc > > > >>> in > > > >>>>>>> extreme cases. Sometimes more advanced applications end up with > > > >>> just a > > > >>>>>> few > > > >>>>>>> leaking iterators despite closing the majority of them. I've seen > > > >>> this > > > >>>>>>> happen just once personally, but it was driving everyone crazy > > > >>> until we > > > >>>>>>> figured it out. > > > >>>>>>> 3. A metric for how long the iterator has been open would help to > > > >>>>>> identify > > > >>>>>>> hanging iterators due to some logic where the iterator is properly > > > >>>> closed > > > >>>>>>> but for whatever reason just isn't being advanced to the end, and > > > >>> thus > > > >>>>>> not > > > >>>>>>> reached the iterator#close line of the user code. This case seems > > > >>>>>> difficult > > > >>>>>>> to spot without the specific metric for iterator lifetime > > > >>>>>>> 4. Lastly, I think you could argue that all of the above are > > > >>>>>>> fairly > > > >>>>>>> advanced use cases, but this seems like a fairly advanced feature > > > >>>>>> already, > > > >>>>>>> so it doesn't seem unreasonable to try and cover all the bases. > > > >>>>>>> > > > >>>>>>> All that said, my philosophy is that the KIP author gets the final > > > >>> word > > > >>>>>> on > > > >>>>>>> what to pull into scope as long as the proposal isn't harming > > > anyone > > > >>>>>>> without the extra feature/changes. So it's up to you Nick -- just > > > >>>> wanted > > > >>>>>>> to add some context on how it could work, and why it would be > > > >>> helpful > > > >>>>>>> > > > >>>>>>> Thanks for the KIP! > > > >>>>>>> > > > >>>>>>> On Wed, Oct 18, 2023 at 7:04 AM Lucas Brutschy > > > >>>>>>> <lbruts...@confluent.io.invalid> wrote: > > > >>>>>>> > > > >>>>>>>> Hi Nick, > > > >>>>>>>> > > > >>>>>>>> I did not think in detail about how to implement it, just about > > > >>> what > > > >>>>>>>> metrics would be nice to have. You are right, we'd have to > > > >>>>>>>> register/deregister the iterators during open/close. This would > > > >>>>>>>> be > > > >>>>>>>> more complicated to implement than the other metrics, but I do > > > >>>>>>>> not > > > >>> see > > > >>>>>>>> a fundamental problem with it. > > > >>>>>>>> > > > >>>>>>>> As far as I understand, even a low number of leaked iterators can > > > >>> hurt > > > >>>>>>>> RocksDB compaction significantly. So we may even want to detect > > > >>>>>>>> if > > > >>> the > > > >>>>>>>> iterators are opened by one-time / rare queries against the state > > > >>>>>>>> store. > > > >>>>>>>> > > > >>>>>>>> But, as I said, that would be an addition and not a change of the > > > >>>>>>>> current contents of the KIP, so I'd support the KIP moving > > > >>>>>>>> forward > > > >>>>>>>> even without this extension. > > > >>>>>>>> > > > >>>>>>>> Cheers, Lucas > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> On Tue, Oct 17, 2023 at 3:45 PM Nick Telford < > > > >>> nick.telf...@gmail.com> > > > >>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>> Hi Lucas, > > > >>>>>>>>> > > > >>>>>>>>> Hmm, I'm not sure how we could reliably identify such leaked > > > >>>>>> Iterators. > > > >>>>>>>> If > > > >>>>>>>>> we tried to include open iterators when calculating > > > >>>>>> iterator-duration, > > > >>>>>>>> we'd > > > >>>>>>>>> need some kind of registry of all the open iterator creation > > > >>>>>>> timestamps, > > > >>>>>>>>> wouldn't we? > > > >>>>>>>>> > > > >>>>>>>>> In general, if you have a leaky Iterator, it should manifest as > > > >>>>>>>>> a > > > >>>>>>>>> persistently climbing "open-iterators" metric, even on a busy > > > >>> node, > > > >>>>>>>> because > > > >>>>>>>>> each time that Iterator is used, it will leak another one. So > > > >>> even in > > > >>>>>>> the > > > >>>>>>>>> presence of many non-leaky Iterators on a busy instance, the > > > >>> metric > > > >>>>>>>> should > > > >>>>>>>>> still consistently climb. > > > >>>>>>>>> > > > >>>>>>>>> Regards, > > > >>>>>>>>> > > > >>>>>>>>> Nick > > > >>>>>>>>> > > > >>>>>>>>> On Mon, 16 Oct 2023 at 14:24, Lucas Brutschy < > > > >>> lbruts...@confluent.io > > > >>>>>>>> .invalid> > > > >>>>>>>>> wrote: > > > >>>>>>>>> > > > >>>>>>>>>> Hi Nick! > > > >>>>>>>>>> > > > >>>>>>>>>> thanks for the KIP! I think this could be quite useful, given > > > the > > > >>>>>>>>>> problems that we had with leaks due to RocksDB resources not > > > >>> being > > > >>>>>>>>>> closed. > > > >>>>>>>>>> > > > >>>>>>>>>> I don't have any pressing issues why we can't accept it like it > > > >>> is, > > > >>>>>>>>>> just one minor point for discussion: would it also make sense > > > >>>>>>>>>> to > > > >>>>>> make > > > >>>>>>>>>> it possible to identify a few very long-running / leaked > > > >>>>>> iterators? I > > > >>>>>>>>>> can imagine on a busy node, it would be hard to spot that 1% of > > > >>> the > > > >>>>>>>>>> iterators never close when looking only at closed iterator or > > > the > > > >>>>>>>>>> number of iterators. But it could still be good to identify > > > those > > > >>>>>>>>>> leaks early. One option would be to add `iterator-duration-max` > > > >>> and > > > >>>>>>>>>> take open iterators into account when computing the metric. > > > >>>>>>>>>> > > > >>>>>>>>>> Cheers, > > > >>>>>>>>>> Lucas > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> On Thu, Oct 5, 2023 at 3:50 PM Nick Telford < > > > >>>>>> nick.telf...@gmail.com> > > > >>>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>> Hi Colt, > > > >>>>>>>>>>> > > > >>>>>>>>>>> I kept the details out of the KIP, because KIPs generally > > > >>>>>> document > > > >>>>>>>>>>> high-level design, but the way I'm doing it is like this: > > > >>>>>>>>>>> > > > >>>>>>>>>>> final ManagedKeyValueIterator<Bytes, byte[]> > > > >>>>>>>>>>> rocksDbPrefixSeekIterator = cf.prefixScan(accessor, > > > >>> prefixBytes); > > > >>>>>>>>>>> --> final long startedAt = System.nanoTime(); > > > >>>>>>>>>>> openIterators.add(rocksDbPrefixSeekIterator); > > > >>>>>>>>>>> rocksDbPrefixSeekIterator.onClose(() -> { > > > >>>>>>>>>>> --> > > > >>>>>>> metricsRecorder.recordIteratorDuration(System.nanoTime() > > > >>>>>>>> - > > > >>>>>>>>>>> startedAt); > > > >>>>>>>>>>> openIterators.remove(rocksDbPrefixSeekIterator); > > > >>>>>>>>>>> }); > > > >>>>>>>>>>> > > > >>>>>>>>>>> The lines with the arrow are the new code. This pattern is > > > >>>>>> repeated > > > >>>>>>>>>>> throughout RocksDBStore, wherever a new RocksDbIterator is > > > >>>>>> created. > > > >>>>>>>>>>> > > > >>>>>>>>>>> Regards, > > > >>>>>>>>>>> Nick > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Thu, 5 Oct 2023 at 12:32, Colt McNealy <c...@littlehorse.io > > > > > > > >>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> Thank you for the KIP, Nick! > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> This would be highly useful for many reasons. Much more sane > > > >>>>>> than > > > >>>>>>>>>> checking > > > >>>>>>>>>>>> for leaked iterators by profiling memory usage while running > > > >>>>>>> 100's > > > >>>>>>>> of > > > >>>>>>>>>>>> thousands of range scans via interactive queries (: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> One small question: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> The iterator-duration metrics will be updated whenever an > > > >>>>>>>> Iterator's > > > >>>>>>>>>>>> close() method is called > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Does the Iterator have its own "createdAt()" or equivalent > > > >>>>>> field, > > > >>>>>>>> or > > > >>>>>>>>>> do we > > > >>>>>>>>>>>> need to keep track of the Iterator's start time upon > > > >>>>>>>>>>>> creation? > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Cheers, > > > >>>>>>>>>>>> Colt McNealy > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> *Founder, LittleHorse.dev* > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On Wed, Oct 4, 2023 at 9:07 AM Nick Telford < > > > >>>>>>>> nick.telf...@gmail.com> > > > >>>>>>>>>>>> wrote: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>>> Hi everyone, > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> KIP-989 is a small Kafka Streams KIP to add a few new > > > >>>>>>>>>>>>> metrics > > > >>>>>>>> around > > > >>>>>>>>>> the > > > >>>>>>>>>>>>> creation and use of RocksDB Iterators, to aid users in > > > >>>>>>>> identifying > > > >>>>>>>>>>>>> "Iterator leaks" that could cause applications to leak > > > >>>>>>>>>>>>> native > > > >>>>>>>> memory. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Let me know what you think! > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>> > > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> P.S. I'm not too sure about the formatting of the "New > > > >>>>>> Metrics" > > > >>>>>>>>>> table, > > > >>>>>>>>>>>> any > > > >>>>>>>>>>>>> advice there would be appreciated. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Regards, > > > >>>>>>>>>>>>> Nick > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>> > > > >> > > > > > > >