SGTM to use nano-seconds across the board.

On 5/16/24 7:12 AM, Nick Telford wrote:
Actually, one other point: our existing state store operation metrics are
measured in nanoseconds[1].

Should iterator-duration-(avg|max) also be measured in nanoseconds, for
consistency, or should we keep them milliseconds, as the KIP currently
states?

1:
https://docs.confluent.io/platform/current/streams/monitoring.html#state-store-metrics

On Thu, 16 May 2024 at 12:15, Nick Telford <nick.telf...@gmail.com> wrote:

Good point! I've updated it to "Improved StateStore Iterator metrics for
detecting leaks" - let me know if you have a better suggestion.

This should affect the voting imo, as nothing of substance has changed.

Regards,
Nick

On Thu, 16 May 2024 at 01:39, Sophie Blee-Goldman <sop...@responsive.dev>
wrote:

One quick thing -- can you update the title of this KIP to reflect the
decision to implement these metrics for all state stores implementations
rather than just RocksDB?


On Tue, May 14, 2024 at 1:36 PM Nick Telford <nick.telf...@gmail.com>
wrote:

Woops! Thanks for the catch Lucas. Given this was just a typo, I don't
think this affects the voting.

Cheers,
Nick

On Tue, 14 May 2024 at 18:06, Lucas Brutschy <lbruts...@confluent.io
.invalid>
wrote:

Hi Nick,

you are still referring to oldest-open-iterator-age-ms in the
`Proposed Changes` section.

Cheers,
Lucas

On Thu, May 2, 2024 at 4:00 PM Lucas Brutschy <lbruts...@confluent.io

wrote:

Hi Nick!

I agree, the age variant is a bit nicer since the semantics are very
clear from the name. If you'd rather go for the simple
implementation,
how about calling it `oldest-iterator-open-since-ms`? I believe this
could be understood without docs. Either way, I think we should be
able to open the vote for this KIP because nobody raised any major /
blocking concerns.

Looking forward to getting this voted on soon!

Cheers
Lucas

On Sun, Mar 31, 2024 at 5:23 PM Nick Telford <
nick.telf...@gmail.com>
wrote:

Hi Matthias,

For the oldest iterator metric, I would propose something simple
like
`iterator-opened-ms` and it would just be the actual timestamp
when
the
iterator was opened. I don't think we need to compute the actual
age,
but user can to this computation themselves?

That works for me; it's easier to implement like that :-D I'm a
little
concerned that the name "iterator-opened-ms" may not be obvious
enough
without reading the docs.

If we think reporting the age instead of just the timestamp is
better, I
would propose `iterator-max-age-ms`. I should be sufficient to
call
out
(as it's kinda "obvious" anyway) that the metric applies to open
iterator only.

While I think it's preferable to record the timestamp, rather than
the
age,
this does have the benefit of a more obvious metric name.

Nit: the KIP says it's a store-level metric, but I think it
would
be
good to say explicitly that it's recorded with DEBUG level only?

Yes, I've already updated the KIP with this information in the
table.

Regards,

Nick

On Sun, 31 Mar 2024 at 10:53, Matthias J. Sax <mj...@apache.org>
wrote:

The time window thing was just an idea. Happy to drop it.

For the oldest iterator metric, I would propose something simple
like
`iterator-opened-ms` and it would just be the actual timestamp
when
the
iterator was opened. I don't think we need to compute the actual
age,
but user can to this computation themselves?

If we think reporting the age instead of just the timestamp is
better, I
would propose `iterator-max-age-ms`. I should be sufficient to
call
out
(as it's kinda "obvious" anyway) that the metric applies to open
iterator only.

And yes, I was hoping that the code inside MetereXxxStore might
already
be setup in a way that custom stores would inherit the iterator
metrics
automatically -- I am just not sure, and left it as an exercise
for
somebody to confirm :)


Nit: the KIP says it's a store-level metric, but I think it
would
be
good to say explicitly that it's recorded with DEBUG level only?



-Matthias


On 3/28/24 2:52 PM, Nick Telford wrote:
Quick addendum:

My suggested metric "oldest-open-iterator-age-seconds" should
be
"oldest-open-iterator-age-ms". Milliseconds is obviously a
better
granularity for such a metric.

Still accepting suggestions for a better name.

On Thu, 28 Mar 2024 at 13:41, Nick Telford <
nick.telf...@gmail.com

wrote:

Hi everyone,

Sorry for leaving this for so long. So much for "3 weeks
until
KIP
freeze"!

On Sophie's comments:
1. Would Matthias's suggestion of a separate metric tracking
the
age of
the oldest open iterator (within the tag set) satisfy this?
That
way we
can
keep iterator-duration-(avg|max) for closed iterators, which
can
be
useful
for performance debugging for iterators that don't leak. I'm
not
sure
what
we'd call this metric, maybe:
"oldest-open-iterator-age-seconds"?
Seems
like a mouthful.

2. You're right, it makes more sense to provide
iterator-duration-(avg|max). Honestly, I can't remember why I
had
"total"
before, or why I was computing a rate-of-change over it.

3, 4, 5, 6. Agreed, I'll make all those changes as suggested.

7. Combined with Matthias's point about RocksDB, I'm
convinced
that this
is the wrong KIP for these. I'll introduce the additional
Rocks
metrics
in
another KIP.

On Matthias's comments:
A. Not sure about the time window. I'm pretty sure all
existing
avg/max
metrics are since the application was started? Any other
suggestions
here
would be appreciated.

B. Agreed. See point 1 above.

C. Good point. My focus was very much on Rocks memory leaks
when
I wrote
the first draft. I can generalise it. My only concern is
that it
might
make
it more difficult to detect Rocks iterator leaks caused
*within*
our
high-level iterator, e.g. RocksJNI, RocksDB, RocksDBStore,
etc.
But we
could always provide a RocksDB-specific metric for this, as
you
suggested.

D. Hmm, we do already have MeteredKeyValueIterator, which
automatically
wraps the iterator from inner-stores of
MeteredKeyValueStore. If
we
implemented these metrics there, then custom stores would
automatically
gain the functionality, right? This seems like a pretty
logical
place to
implement these metrics, since MeteredKeyValueStore is all
about
adding
metrics to state stores.

I imagine the best way to implement this would be to do so
at
the
high-level iterator rather than implementing it separately
for
each
specific iterator implementation for every store type.

Sophie, does MeteredKeyValueIterator fit with your
recommendation?

Thanks for your thoughts everyone, I'll update the KIP now.

Nick

On Thu, 14 Mar 2024 at 03:37, Sophie Blee-Goldman <
sop...@responsive.dev>
wrote:

About your last two points: I completely agree that we
should
try to
make this independent of RocksDB, and should probably adopt
a
general philosophy of being store-implementation agnostic
unless
there is good reason to focus on a particular store type:
eg if
it was
only possible to implement for certain stores, or only made
sense in
the context of a certain store type but not necessarily
stores
in
general.

While leaking memory due to unclosed iterators on RocksDB
stores
is
certainly the most common issue, I think Matthias
sufficiently
demonstrated that the problem of leaking iterators is not
actually
unique to RocksDB, and we should consider including
in-memory
stores at the very least. I also think that at this point,
we
may as
well
just implement the metrics for *all* store types, whether
rocksdb or
in-memory or custom. Not just because it probably applies to
all
store types (leaking iterators are rarely a good thing!) but
because
I imagine the best way to implement this would be to do so
at
the
high-level iterator rather than implementing it separately
for
each
specific iterator implementation for every store type.

That said, I haven't thought all that carefully about the
implementation
yet -- it just strikes me as easiest to do at the top level
of
the
store
hierarchy rather than at the bottom. My gut instinct may
very
well be
wrong, but that's what it's saying

On Thu, Mar 7, 2024 at 10:43 AM Matthias J. Sax <
mj...@apache.org>
wrote:

Seems I am late to this party. Can we pick this up again
aiming
for
3.8
release? I think it would be a great addition. Few
comments:


- I think it does make sense to report
`iterator-duration-avg`
and
`iterator-duration-max` for all *closed* iterators -- it
just
seems to
be a useful metric (wondering if this would be _overall_ or
bounded to
some time window?)

- About the duration iterators are currently open, I
believe
the only
useful way is to report the "oldest iterator", ie, the
smallest
iterator
open-time, of all currently open-iterator? We all agree
that
in
general,
leaking iterator would bump the count metric, and if there
is
a
few
ones
which are not closed and open for a long time, it seem
sufficient to
detect the single oldest one for alerting purpose?

- What I don't like about the KIP is it focus on RocksDB. I
don't
think
we should build on the internal RocksDB counters as
proposed
(I
guess,
we could still expose them, similar to other RocksDB
metrics
which we
expose already). However, for this new metric, we should
track
it
ourselves and thus make it independent of RocksDB -- in the
end, an
in-memory store could also leak memory (and kill a JVM
with an
out-of-memory error) and we should be able to track it.

- Not sure if we would like to add support for custom
stores,
to allow
them to register their iterators with this metric? Or would
this not
be
necessary, because custom stores could just register a
custom
metric
about it to begin with?



-Matthias

On 10/25/23 4:41 PM, Sophie Blee-Goldman wrote:

    If we used "iterator-duration-max", for
example, would it not be confusing that it includes
Iterators
that
are
still open, and therefore the duration is not yet known?


1. Ah, I think I understand your concern better now -- I
totally
agree
that
a
    "iterator-duration-max" metric would be
confusing/misleading. I
was
thinking about it a bit differently, something more akin
to
the
"last-rebalance-seconds-ago" consumer metric. As the name
suggests,
that basically just tracks how long the consumer has gone
without
rebalancing -- it doesn't purport to represent the actual
duration
between
rebalances, just the current time since the last one.  The
hard part
is
really
in choosing a name that reflects this -- maybe you have
some
better
ideas
but off the top of my head, perhaps something like
"iterator-lifetime-max"?

2. I'm not quite sure how to interpret the
"iterator-duration-total"
metric
-- what exactly does it mean to add up all the iterator
durations?
For
some context, while this is not a hard-and-fast rule, in
general
you'll
find that Kafka/Streams metrics tend to come in pairs of
avg/max or
rate/total. Something that you might measure the avg for
usually is
also useful to measure the max, whereas a total metric is
probably
also useful as a rate but not so much as an avg. I
actually
think
this
is part of why it feels like it makes so much sense to
include
a
"max"
version of this metric, as Lucas suggested, even if the
name
of
"iterator-duration-max" feels misleading. Ultimately the
metric names
are up to you, but for this reason, I would personally
advocate for
just going with an "iterator-duration-avg" and
"iterator-duration-max"

I did see your example in which you mention one could
monitor
the
rate of change of the "-total" metric. While this does
make
sense to
me, if the only way to interpret a metric is by computing
another
function over it, then why not just make that computation
the
metric
and cut out the middle man? And in this case, to me at
least,
it
feels
much easier to understand a metric like
"iterator-duration-max" vs
something like "iterator-duration-total-rate"

3. By the way, can you add another column to the table
with
the new
metrics
that lists the recording level? My suggestion would be to
put
the
"number-open-iterators" at INFO and the other two at
DEBUG.
See
the following for my reasoning behind this recommendation

4. I would change the "Type" entry for the
"number-open-iterators"
from
"Value" to "Gauge". This helps justify the "INFO" level
for
this
metric,
since unlike the other metrics which are "Measurables",
the
current
timestamp won't need to be retrieved on each recording

5. Can you list the tags that would be associated with
each
of
these
metrics (either in the table, or separately above/below if
they will
be
the same for all)

6. Do you have a strong preference for the name
"number-open-iterators"
or would you be alright in shortening this to
"num-open-iterators"?
The
latter is more in line with the naming scheme used
elsewhere
in Kafka
for similar kinds of metrics, and a shorter name is always
nice.

7. With respect to the rocksdb cache metrics, those sound
useful but
if it was me, I would probably save them for a separate
KIP
mainly
just
because the KIP freeze deadline is in a few weeks, and I
wouldn't
want
to end up blocking all the new metrics just because there
was
ongoing
debate about a subset of them. That said, you do have 3
full
weeks,
so
I would hope that you could get both sets of metrics
agreed
upon in
that timeframe!


On Tue, Oct 24, 2023 at 6:35 AM Nick Telford <
nick.telf...@gmail.com

wrote:

I don't really have a problem with adding such a metric,
I'm
just
not
entirely sure how it would work. If we used
"iterator-duration-max",
for
example, would it not be confusing that it includes
Iterators
that
are
still open, and therefore the duration is not yet known?
When
graphing
that
over time, I suspect it would be difficult to understand.

3.
FWIW, this would still be picked up by "open-iterators",
since that
metric
is only decremented when Iterator#close is called (via
the
ManagedKeyValueIterator#onClose hook).

I'm actually considering expanding the scope of this KIP
slightly to
include improved Block Cache metrics, as my own memory
leak
investigations
have trended in that direction. Do you think the
following
metrics
should
be included in this KIP, or should I create a new KIP?

      - block-cache-index-usage (number of bytes occupied
by
index
blocks)
      - block-cache-filter-usage (number of bytes
occupied by
filter
blocks)

Regards,
Nick

On Tue, 24 Oct 2023 at 07:09, Sophie Blee-Goldman <
sop...@responsive.dev>
wrote:

I actually think we could implement Lucas' suggestion
pretty
easily
and
without too much additional effort. We have full control
over the
iterator
that is returned by the various range queries, so it
would
be easy
to
register a gauge metric for how long it has been since
the
iterator
was
created. Then we just deregister the metric when the
iterator is
closed.

With respect to how useful this metric would be, both
Nick
and
Lucas
have
made good points: I would agree that in general, leaking
iterators
would
mean an ever-increasing iterator count that should be
possible to
spot
without this. However, a few things to consider:

1. it's really easy to set up an alert based on some
maximum
threshold
of
how long an iterator should remain open for. It's
relatively
more
tricky
to
set up alerts based on the current count of open
iterators
and how
it
changes over time.
2. As Lucas mentioned, it only takes a few iterators to
wreak havoc
in
extreme cases. Sometimes more advanced applications end
up
with
just a
few
leaking iterators despite closing the majority of them.
I've
seen
this
happen just once personally, but it was driving everyone
crazy
until we
figured it out.
3. A metric for how long the iterator has been open
would
help to
identify
hanging iterators due to some logic where the iterator
is
properly
closed
but for whatever reason just isn't being advanced to the
end, and
thus
not
reached the iterator#close line of the user code. This
case
seems
difficult
to spot without the specific metric for iterator
lifetime
4. Lastly, I think you could argue that all of the above
are
fairly
advanced use cases, but this seems like a fairly
advanced
feature
already,
so it doesn't seem unreasonable to try and cover all the
bases.

All that said, my philosophy is that the KIP author gets
the
final
word
on
what to pull into scope as long as the proposal isn't
harming
anyone
without the extra feature/changes. So it's up to you
Nick
--  just
wanted
to add some context on how it could work, and why it
would
be
helpful

Thanks for the KIP!

On Wed, Oct 18, 2023 at 7:04 AM Lucas Brutschy
<lbruts...@confluent.io.invalid> wrote:

Hi Nick,

I did not think in detail about how to implement it,
just
about
what
metrics would be nice to have. You are right, we'd
have to
register/deregister the iterators during open/close.
This
would be
more complicated to implement than the other metrics,
but
I
do not
see
a fundamental problem with it.

As far as I understand, even a low number of leaked
iterators can
hurt
RocksDB compaction significantly. So we may even want
to
detect if
the
iterators are opened by one-time / rare queries against
the
state
store.

But, as I said, that would be an addition and not a
change
of the
current contents of the KIP, so I'd support the KIP
moving
forward
even without this extension.

Cheers, Lucas



On Tue, Oct 17, 2023 at 3:45 PM Nick Telford <
nick.telf...@gmail.com>
wrote:

Hi Lucas,

Hmm, I'm not sure how we could reliably identify such
leaked
Iterators.
If
we tried to include open iterators when calculating
iterator-duration,
we'd
need some kind of registry of all the open iterator
creation
timestamps,
wouldn't we?

In general, if you have a leaky Iterator, it should
manifest as a
persistently climbing "open-iterators" metric, even
on a
busy
node,
because
each time that Iterator is used, it will leak another
one.
So
even in
the
presence of many non-leaky Iterators on a busy
instance,
the
metric
should
still consistently climb.

Regards,

Nick

On Mon, 16 Oct 2023 at 14:24, Lucas Brutschy <
lbruts...@confluent.io
.invalid>
wrote:

Hi Nick!

thanks for the KIP! I think this could be quite
useful,
given
the
problems that we had with leaks due to RocksDB
resources
not
being
closed.

I don't have any pressing issues why we can't accept
it
like it
is,
just one minor point for discussion: would it also
make
sense to
make
it possible to identify a few very long-running /
leaked
iterators? I
can imagine on a busy node, it would be hard to spot
that
1% of
the
iterators never close when looking only at closed
iterator or
the
number of iterators. But it could still be good to
identify
those
leaks early. One option would be to add
`iterator-duration-max`
and
take open iterators into account when computing the
metric.

Cheers,
Lucas


On Thu, Oct 5, 2023 at 3:50 PM Nick Telford <
nick.telf...@gmail.com>
wrote:

Hi Colt,

I kept the details out of the KIP, because KIPs
generally
document
high-level design, but the way I'm doing it is like
this:

           final ManagedKeyValueIterator<Bytes,
byte[]>
rocksDbPrefixSeekIterator = cf.prefixScan(accessor,
prefixBytes);
-->     final long startedAt = System.nanoTime();

  openIterators.add(rocksDbPrefixSeekIterator);
           rocksDbPrefixSeekIterator.onClose(() -> {
-->

metricsRecorder.recordIteratorDuration(System.nanoTime()
-
startedAt);

  openIterators.remove(rocksDbPrefixSeekIterator);
           });

The lines with the arrow are the new code. This
pattern
is
repeated
throughout RocksDBStore, wherever a new
RocksDbIterator
is
created.

Regards,
Nick

On Thu, 5 Oct 2023 at 12:32, Colt McNealy <
c...@littlehorse.io

wrote:

Thank you for the KIP, Nick!

This would be highly useful for many reasons. Much
more
sane
than
checking
for leaked iterators by profiling memory usage
while
running
100's
of
thousands of range scans via interactive queries (:

One small question:

The iterator-duration metrics will be updated
whenever
an
Iterator's
close() method is called

Does the Iterator have its own "createdAt()" or
equivalent
field,
or
do we
need to keep track of the Iterator's start time
upon
creation?

Cheers,
Colt McNealy

*Founder, LittleHorse.dev*


On Wed, Oct 4, 2023 at 9:07 AM Nick Telford <
nick.telf...@gmail.com>
wrote:

Hi everyone,

KIP-989 is a small Kafka Streams KIP to add a few
new
metrics
around
the
creation and use of RocksDB Iterators, to aid
users
in
identifying
"Iterator leaks" that could cause applications to
leak
native
memory.

Let me know what you think!













https://cwiki.apache.org/confluence/display/KAFKA/KIP-989%3A+RocksDB+Iterator+Metrics

P.S. I'm not too sure about the formatting of the
"New
Metrics"
table,
any
advice there would be appreciated.

Regards,
Nick

















Reply via email to