I definitely agree with Guozhang's "meta" comment: if it's possible to allow users to pick and choose individual RocksDB metrics that would be ideal. One further question is whether these will be debug or info level metrics, or a separate level altogether? If there is a nontrivial overhead associated with attaching RocksDB metrics it would probably be good to be able to independently turn on/off Rocks metrics
On Tue, May 21, 2019 at 9:00 AM Guozhang Wang <wangg...@gmail.com> wrote: > Hello Bruno, > > Thanks for the KIP, I have a few minor comments and a meta one which are > relatively aligned with other folks: > > Minor: > > 1) Regarding the "rocksdb-state-id = [store ID]", to be consistent with > other state store metrics (see > https://cwiki.apache.org/confluence/display/KAFKA/KIP-444%3A+Augment+metrics+for+Kafka+Streams), > this tag should be either "rocksdb-window-state-id", > "rocksdb-session-state-id" or "rocksdb-state-id". For window / session > store backed by rocksDB, the tags should not be "rocksdb-state-id". > > 2) Also for window / session store, my take is that you plan to have > multiple rocksDB behind the scene to report to the same set of metrics, is > that right? My concern is that for such types of state stores, most of the > access would be on the first segment rocksDB instance, and hence coalescing > all of instances as a single set of metrics may dilute it. > > 3) I agree with @sop...@confluent.io <sop...@confluent.io> that we should > better have some documentation educating users what to do when see what > anomalies in metrics; though I think this is a long endeavoring effort that > may go beyond this KIP's scope, let's keep that as a separate umbrella JIRA > to accumulate knowledge over time. > > > Meta: > > 4) Instead of trying to enumerate all the ones that might be helpful, I'd > recommend that we expose a user-friendly API in StreamsMetrics to allow > users to add more metrics from RocksDB they'd like to have, while only > keeping a small set of most-meaningful ones that are ubiquitously useful > out-of-the-box. WDYT? > > > Guozhang > > > > On Tue, May 21, 2019 at 8:04 AM Dongjin Lee <dong...@apache.org> wrote: > >> Hi Bruno, >> >> I just read the KIP. I think this feature is great. As far as I know, most >> Kafka users monitor the host resources, JVM resources, and Kafka metrics >> only, not RocksDB for configuring the statistics feature is a little bit >> tiresome. Since RocksDB impacts the performance of Kafka Streams, I >> believe >> providing these metrics with other metrics in one place is much better. >> >> However, I am a little bit not assured for how much information should be >> provided to the users with the documentation - how the user can control >> the >> RocksDB may on the boundary of the scope. How do you think? >> >> +1. I entirely agree to Bill's comments; I also think `rocksdb-store-id` >> is >> better than `rocksdb-state-id` and metrics on total compactions and an >> average number of compactions is also needed. >> >> Regards, >> Dongjin >> >> On Tue, May 21, 2019 at 2:48 AM John Roesler <j...@confluent.io> wrote: >> >> > Hi Bruno, >> > >> > Looks really good overall. This is going to be an awesome addition. >> > >> > My only thought was that we have "bytes-flushed-(rate|total) and >> > flush-time-(avg|min|max)" metrics, and the description states that >> > these are specifically for Memtable flush operations. What do you >> > think about calling it "memtable-bytes-flushed... (etc)"? On one hand, >> > I could see this being redundant, since that's the only thing that >> > gets "flushed" inside of Rocks. But on the other, we have an >> > independent "flush" operation in streams, which might be confusing. >> > Plus, it might help people who are looking at the full "menu" of >> > hundreds of metrics. They can't read and remember every description >> > while trying to understand the full list of metrics, so going for >> > maximum self-documentation in the name seems nice. >> > >> > But that's a minor thought. Modulo the others' comments, this looks good >> > to me. >> > >> > Thanks, >> > -John >> > >> > On Mon, May 20, 2019 at 11:07 AM Bill Bejeck <bbej...@gmail.com> wrote: >> > > >> > > Hi Bruno, >> > > >> > > Thanks for the KIP, this will be a useful addition. >> > > >> > > Overall the KIP looks good to me, and I have two minor comments. >> > > >> > > 1. For the tags should, I'm wondering if rocksdb-state-id should be >> > > rocksdb-store-id >> > > instead? >> > > >> > > 2. With the compaction metrics, would it be possible to add total >> > > compactions and an average number of compactions? I've taken a look >> at >> > the >> > > available RocksDB metrics, and I'm not sure. But users can control >> how >> > > many L0 files it takes to trigger compaction so if it is possible; it >> may >> > > be useful. >> > > >> > > Thanks, >> > > Bill >> > > >> > > >> > > On Mon, May 20, 2019 at 9:15 AM Bruno Cadonna <br...@confluent.io> >> > wrote: >> > > >> > > > Hi Sophie, >> > > > >> > > > Thank you for your comments. >> > > > >> > > > It's a good idea to supplement the metrics with configuration option >> > > > to change the metrics. I also had some thoughts about it. However, I >> > > > think I need some experimentation to get this right. >> > > > >> > > > I added the block cache hit rates for index and filter blocks to the >> > > > KIP. As far as I understood, they should stay at zero, if users do >> not >> > > > configure RocksDB to include index and filter blocks into the block >> > > > cache. Did you also understand this similarly? I guess also in this >> > > > case some experimentation would be good to be sure. >> > > > >> > > > Best, >> > > > Bruno >> > > > >> > > > >> > > > On Sat, May 18, 2019 at 2:29 AM Sophie Blee-Goldman < >> > sop...@confluent.io> >> > > > wrote: >> > > > > >> > > > > Actually I wonder if it might be useful to users to be able to >> break >> > up >> > > > the >> > > > > cache hit stats by type? Some people may choose to store index and >> > filter >> > > > > blocks alongside data blocks, and it would probably be very >> helpful >> > for >> > > > > them to know who is making more effective use of the cache in >> order >> > to >> > > > tune >> > > > > how much of it is allocated to each. I'm not sure how common this >> > really >> > > > is >> > > > > but I think it would be invaluable to those who do. RocksDB >> > performance >> > > > can >> > > > > be quite opaque.. >> > > > > >> > > > > Cheers, >> > > > > Sophie >> > > > > >> > > > > On Fri, May 17, 2019 at 5:01 PM Sophie Blee-Goldman < >> > sop...@confluent.io >> > > > > >> > > > > wrote: >> > > > > >> > > > > > Hey Bruno! >> > > > > > >> > > > > > This all looks pretty good to me, but one suggestion I have is >> to >> > > > > > supplement each of the metrics with some info on how the user >> can >> > > > control >> > > > > > them. In other words, which options could/should they set in >> > > > > > RocksDBConfigSetter should they discover a particular >> bottleneck? >> > > > > > >> > > > > > I don't think this necessarily needs to go into the KIP, but I >> do >> > > > think it >> > > > > > should be included in the docs somewhere (happy to help build up >> > the >> > > > list >> > > > > > of associated options when the time comes) >> > > > > > >> > > > > > Thanks! >> > > > > > Sophie >> > > > > > >> > > > > > On Fri, May 17, 2019 at 2:54 PM Bruno Cadonna < >> br...@confluent.io> >> > > > wrote: >> > > > > > >> > > > > >> Hi all, >> > > > > >> >> > > > > >> this KIP describes the extension of the Kafka Streams' metrics >> to >> > > > include >> > > > > >> RocksDB's internal statistics. >> > > > > >> >> > > > > >> Please have a look at it and let me know what you think. Since >> I >> > am >> > > > not a >> > > > > >> RocksDB expert, I am thankful for any additional pair of eyes >> that >> > > > > >> evaluates this KIP. >> > > > > >> >> > > > > >> >> > > > > >> >> > > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-471:+Expose+RocksDB+Metrics+in+Kafka+Streams >> > > > > >> >> > > > > >> Best regards, >> > > > > >> Bruno >> > > > > >> >> > > > > > >> > > > >> > >> >> >> -- >> *Dongjin Lee* >> >> *A hitchhiker in the mathematical world.* >> *github: <http://goog_969573159/>github.com/dongjinleekr >> <https://github.com/dongjinleekr>linkedin: >> kr.linkedin.com/in/dongjinleekr >> <https://kr.linkedin.com/in/dongjinleekr>speakerdeck: >> speakerdeck.com/dongjin >> <https://speakerdeck.com/dongjin>* >> > > > -- > -- Guozhang >