Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

Guozhang Wang Tue, 21 May 2019 09:01:07 -0700

Hello Bruno,

Thanks for the KIP, I have a few minor comments and a meta one which are
relatively aligned with other folks:


Minor:

1) Regarding the "rocksdb-state-id = [store ID]", to be consistent with
other state store metrics (see
https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams),
this tag should be either "rocksdb-window-state-id",
"rocksdb-session-state-id" or "rocksdb-state-id". For window / session
store backed by rocksDB, the tags should not be "rocksdb-state-id".

2) Also for window / session store, my take is that you plan to have
multiple rocksDB behind the scene to report to the same set of metrics, is
that right? My concern is that for such types of state stores, most of the
access would be on the first segment rocksDB instance, and hence coalescing
all of instances as a single set of metrics may dilute it.

3) I agree with @[email protected] <[email protected]> that we should
better have some documentation educating users what to do when see what
anomalies in metrics; though I think this is a long endeavoring effort that
may go beyond this KIP's scope, let's keep that as a separate umbrella JIRA
to accumulate knowledge over time.


Meta:

4) Instead of trying to enumerate all the ones that might be helpful, I'd
recommend that we expose a user-friendly API in StreamsMetrics to allow
users to add more metrics from RocksDB they'd like to have, while only
keeping a small set of most-meaningful ones that are ubiquitously useful
out-of-the-box. WDYT?


Guozhang



On Tue, May 21, 2019 at 8:04 AM Dongjin Lee <[email protected]> wrote:

> Hi Bruno,
>
> I just read the KIP. I think this feature is great. As far as I know, most
> Kafka users monitor the host resources, JVM resources, and Kafka metrics
> only, not RocksDB for configuring the statistics feature is a little bit
> tiresome. Since RocksDB impacts the performance of Kafka Streams, I believe
> providing these metrics with other metrics in one place is much better.
>
> However, I am a little bit not assured for how much information should be
> provided to the users with the documentation - how the user can control the
>  RocksDB may on the boundary of the scope. How do you think?
>
> +1. I entirely agree to Bill's comments; I also think `rocksdb-store-id` is
> better than `rocksdb-state-id` and metrics on total compactions and an
> average number of compactions is also needed.
>
> Regards,
> Dongjin
>
> On Tue, May 21, 2019 at 2:48 AM John Roesler <[email protected]> wrote:
>
> > Hi Bruno,
> >
> > Looks really good overall. This is going to be an awesome addition.
> >
> > My only thought was that we have "bytes-flushed-(rate|total) and
> > flush-time-(avg|min|max)" metrics, and the description states that
> > these are specifically for Memtable flush operations. What do you
> > think about calling it "memtable-bytes-flushed... (etc)"? On one hand,
> > I could see this being redundant, since that's the only thing that
> > gets "flushed" inside of Rocks. But on the other, we have an
> > independent "flush" operation in streams, which might be confusing.
> > Plus, it might help people who are looking at the full "menu" of
> > hundreds of metrics. They can't read and remember every description
> > while trying to understand the full list of metrics, so going for
> > maximum self-documentation in the name seems nice.
> >
> > But that's a minor thought. Modulo the others' comments, this looks good
> > to me.
> >
> > Thanks,
> > -John
> >
> > On Mon, May 20, 2019 at 11:07 AM Bill Bejeck <[email protected]> wrote:
> > >
> > > Hi Bruno,
> > >
> > > Thanks for the KIP, this will be a useful addition.
> > >
> > > Overall the KIP looks good to me, and I have two minor comments.
> > >
> > > 1. For the tags should, I'm wondering if rocksdb-state-id should be
> > > rocksdb-store-id
> > > instead?
> > >
> > > 2. With the compaction metrics, would it be possible to add total
> > > compactions and an average number of compactions?  I've taken a look at
> > the
> > > available RocksDB metrics, and I'm not sure.  But users can control how
> > > many L0 files it takes to trigger compaction so if it is possible; it
> may
> > > be useful.
> > >
> > > Thanks,
> > > Bill
> > >
> > >
> > > On Mon, May 20, 2019 at 9:15 AM Bruno Cadonna <[email protected]>
> > wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > Thank you for your comments.
> > > >
> > > > It's a good idea to supplement the metrics with configuration option
> > > > to change the metrics. I also had some thoughts about it. However, I
> > > > think I need some experimentation to get this right.
> > > >
> > > > I added the block cache hit rates for index and filter blocks to the
> > > > KIP. As far as I understood, they should stay at zero, if users do
> not
> > > > configure RocksDB to include index and filter blocks into the block
> > > > cache. Did you also understand this similarly? I guess also in this
> > > > case some experimentation would be good to be sure.
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > >
> > > > On Sat, May 18, 2019 at 2:29 AM Sophie Blee-Goldman <
> > [email protected]>
> > > > wrote:
> > > > >
> > > > > Actually I wonder if it might be useful to users to be able to
> break
> > up
> > > > the
> > > > > cache hit stats by type? Some people may choose to store index and
> > filter
> > > > > blocks alongside data blocks, and it would probably be very helpful
> > for
> > > > > them to know who is making more effective use of the cache in order
> > to
> > > > tune
> > > > > how much of it is allocated to each. I'm not sure how common this
> > really
> > > > is
> > > > > but I think it would be invaluable to those who do. RocksDB
> > performance
> > > > can
> > > > > be quite opaque..
> > > > >
> > > > > Cheers,
> > > > > Sophie
> > > > >
> > > > > On Fri, May 17, 2019 at 5:01 PM Sophie Blee-Goldman <
> > [email protected]
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hey Bruno!
> > > > > >
> > > > > > This all looks pretty good to me, but one suggestion I have is to
> > > > > > supplement each of the metrics with some info on how the user can
> > > > control
> > > > > > them. In other words, which options could/should they set in
> > > > > > RocksDBConfigSetter should they discover a particular bottleneck?
> > > > > >
> > > > > > I don't think this necessarily needs to go into the KIP, but I do
> > > > think it
> > > > > > should be included in the docs somewhere (happy to help build up
> > the
> > > > list
> > > > > > of associated options when the time comes)
> > > > > >
> > > > > > Thanks!
> > > > > > Sophie
> > > > > >
> > > > > > On Fri, May 17, 2019 at 2:54 PM Bruno Cadonna <
> [email protected]>
> > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> this KIP describes the extension of the Kafka Streams' metrics
> to
> > > > include
> > > > > >> RocksDB's internal statistics.
> > > > > >>
> > > > > >> Please have a look at it and let me know what you think. Since I
> > am
> > > > not a
> > > > > >> RocksDB expert, I am thankful for any additional pair of eyes
> that
> > > > > >> evaluates this KIP.
> > > > > >>
> > > > > >>
> > > > > >>
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams
> > > > > >>
> > > > > >> Best regards,
> > > > > >> Bruno
> > > > > >>
> > > > > >
> > > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
> *github:  <http://goog_969573159/>github.com/dongjinleekr
> <https://github.com/dongjinleekr>linkedin: kr.linkedin.com/in/dongjinleekr
> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck:
> speakerdeck.com/dongjin
> <https://speakerdeck.com/dongjin>*
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-471: Expose RocksDB Metrics in Kafka Streams

Reply via email to