Hi Apoorv, Thank you for your comments, I will address each. AM1. I can see the usefulness in also having an 'application-event-queue-age-max' to get an idea of outliers and how they may be affecting the average metric. I will add that.
AM2. I agree with you there, I think 'time' is a better descriptor here than 'age'. I will update those metric names as well. AM3. Similar to above comments, I will change the name of that metric to be more consistent. And I think a max metric would also be useful here, adding that. AM4. Yes, good catch there. Will update that as well. Thank you, Brenden On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal <apoorvmitta...@gmail.com> wrote: > Hi Brendan, > Thanks for the KIP. The metrics are always helpful. > > AM1: Is `application-event-queue-age-avg` enough or do we require ` > application-event-queue-age-max` as well to differentiate with outliers? > > AM2: The kafka producer defines metric `record-queue-time-avg` which > captures the time spent in the buffer. Do you think we should have a > similar name for `application-event-queue-age-avg` i.e. change to ` > application-event-queue-time-avg`? Moreover other than similar naming, > `time` anyways seems more suitable than `age`, though minor. The `time` > usage is also aligned with the description of this metric. > > AM3: Metric `application-event-processing-time` says "the average time, > that the consumer network.....". Shall we have the `-avg` suffix in the > metric as we have defined for other metrics? Also do we require the max > metric as well for the same? > > AM4: Is the telemetry name for `unsent-requests-queue-size` intended > as `org.apache.kafka.consumer.unsent.requests.size`, > or it should be corrected to ` > org.apache.kafka.consumer.unsent.requests.queue.size`? > > AM2: > Regards, > Apoorv Mittal > +44 7721681581 > > > On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield < > andrew_schofi...@live.com> > wrote: > > > Hi Brenden, > > Thanks for the updates. > > > > AS4. I see that you’ve added `.ms` to a bunch of the metrics reflecting > the > > fact that they’re measured in milliseconds. However, I observe that most > > metrics > > in Kafka that are measured in milliseconds, with some exceptions in Kafka > > Connect > > and MirrorMaker do not follow this convention. I would tend to err on the > > side of > > consistency with the existing metrics and not use `.ms`. However, that’s > > just my > > opinion, so I’d be interested to know what other reviewers of the KIP > > think. > > > > Thanks, > > Andrew > > > > > On 12 Jul 2024, at 20:11, Brenden Deluna <bdel...@confluent.io.INVALID > > > > wrote: > > > > > > Hey Lianet, > > > > > > Thank you for your suggestions and feedback! > > > > > > > > > LM1. This has now been addressed. > > > > > > > > > LM2. I think that would be a valuable addition to the current set of > > > metrics, I will get that added. > > > > > > > > > LM3. Again great idea, that would certainly be helpful. Will add that > as > > > well. > > > > > > > > > Let me know if you have any more suggestions! > > > > > > > > > Thanks, > > > > > > Brenden > > > > > > On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <bdel...@confluent.io> > > wrote: > > > > > >> Hi Lucas, > > >> > > >> Thank you for the feedback! I have addressed your comments: > > >> > > >> > > >> LB1. Good catch there, I will update the names as needed. > > >> > > >> > > >> LB2. Good catch again! I will update the name to be more consistent. > > >> > > >> > > >> LB3. Thank you for pointing this out, I realized that all metric > values > > >> will actually be set to 0. I will specifiy this and explain why they > > will > > >> be 0. > > >> > > >> > > >> Nit: This metric is referring to the queue of unsent requests in the > > >> NetworkClientDelegate. For the metric descriptions I am trying to not > > >> include too much of the implementation details, hence the reason that > > >> description is quite short. I cannot think of other ways to describe > the > > >> metric without going deeper into the implementation, but please do let > > me > > >> know if you have any ideas. > > >> > > >> > > >> Thank you, > > >> > > >> Brenden > > >> > > >> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com> wrote: > > >> > > >>> Hey Brenden, thanks for the KIP! Great to get more visibility into > the > > new > > >>> consumer. > > >>> > > >>> LM1. +1 on Lucas's suggestion for including the unit in the name, > seems > > >>> clearer and consistent (I do see several time metrics including ms) > > >>> > > >>> LM2. What about a new metric for application-event-queue-time-ms. It > > would > > >>> be a complement to the application-event-queue-size you're proposing, > > and > > >>> it will tell us how long the events sit in the queue, waiting to be > > >>> processed (from the time the API call adds the event to the queue, to > > the > > >>> time it's processed in the background thread). I find it would be > very > > >>> interesting. > > >>> > > >>> LM3. Thinking about the actual usage of > > >>> "time-between-network-thread-poll-xxx" metric, I imagine it would be > > >>> helpful to know more about what could be impacting it. As I see it, > the > > >>> network thread cadence could be mainly impacted by: 1- app event > > >>> processing > > >>> (generate requests), 2- network client poll (actual send/receive). > For > > 2, > > >>> the new consumer reuses the same component as the legacy one, but 1 > is > > >>> specific to the new consumer, so what about a metric > > >>> for application-event-processing-time-ms (we could consider avg I > would > > >>> say). It would be the time that the network thread takes to process > all > > >>> available events on each run. > > >>> > > >>> Cheers! > > >>> Lianet > > >>> > > >>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy > > >>> <lbruts...@confluent.io.invalid> wrote: > > >>> > > >>>> Hey Brenden, > > >>>> > > >>>> thanks for the KIP! These will be great to observe and debug the > > >>>> background thread of the new consumer. > > >>>> > > >>>> LB1. `time-between-network-thread-poll-max` → I see several similar > > >>>> metrics including the unit in the metric name (ms or us). We could > > >>>> consider this, although it's probably not strictly required. > However, > > >>>> at least the description should state the unit. Same for the `avg` > > >>>> version. > > >>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's > referring > > >>>> to the size of the request. How about `unsent-request-queue-size` or > > >>>> `unsent-request-count` or simply `unsent-requests`? > > >>>> LB3. "the proposed metrics below will be set to null or 0." → which > > >>>> one will be set to null and which ones will be set to 0, and why? > > >>>> > > >>>> nit: "The current number of unsent requests in the consumer > network" → > > >>>> Seems to be missing something? > > >>>> > > >>>> Cheers, > > >>>> Lucas > > >>>> > > >>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna > > >>>> <bdel...@confluent.io.invalid> wrote: > > >>>>> > > >>>>> Hi Andrew, > > >>>>> Thank you for the feedback and your question. > > >>>>> > > >>>>> AS1. Great idea, I will get that added. > > >>>>> > > >>>>> AS2. For unsent-events-age-max, age will be calculated once the > event > > >>> is > > >>>>> sent, so you are correct. > > >>>>> > > >>>>> AS3. I agree, I think that would be a helpful metric to add, thank > > >>> you! I > > >>>>> will get that added. > > >>>>> > > >>>>> Please let me know if you have any additional feedback, > suggestions, > > >>> or > > >>>>> questions. > > >>>>> > > >>>>> Thanks, > > >>>>> Brenden > > >>>>> > > >>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield < > > >>>> andrew_schofi...@live.com> > > >>>>> wrote: > > >>>>> > > >>>>>> Hi Brenden, > > >>>>>> Thanks for the KIP. It fills a gap in the metrics for the new > > >>> consumer > > >>>>>> nicely. > > >>>>>> > > >>>>>> AS1. If using the CLASSIC protocol, and thus the > > >>> LegacyKafkaConsumer, > > >>>>>> I would expect that the metrics do not exist at all. Maybe say > > >>>> something > > >>>>>> like > > >>>>>> “These metrics are for the new consumer implementation using the > > >>>>>> CONSUMER protocol”. > > >>>>>> > > >>>>>> AS2. For unsent-events-age-max, when is the age calculated? For > > >>>> example, > > >>>>>> is it calculated at the time that the unsent event is removed from > > >>> the > > >>>>>> list and sent, or does the metric reflect unsent events which are > > >>> still > > >>>>>> enqueued? I suspect the former, but thought I’d check. > > >>>>>> > > >>>>>> AS3. I think that unsent-events-age-avg would also be interesting > to > > >>>>>> get an idea of how long unsent events tend to sit around before > > >>>> sending. > > >>>>>> Of course, the question from AS2 would also apply to the average. > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Andrew > > >>>>>> > > >>>>>>> On 10 Jul 2024, at 17:44, Philip Nee <p...@confluent.io.INVALID> > > >>>> wrote: > > >>>>>>> > > >>>>>>> Hi all, > > >>>>>>> > > >>>>>>> This is the link to the KIP document. > > >>>>>>> > > >>>>>> > > >>>> > > >>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer > > >>>>>>> > > >>>>>>> Any comment is appreciated, > > >>>>>>> > > >>>>>>> > > >>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna > > >>>>>> <bdel...@confluent.io.invalid> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Hello everyone, > > >>>>>>>> > > >>>>>>>> I would like to start the discussion thread for KIP-1068. This > > >>> is a > > >>>>>>>> relatively small KIP, only proposing to add a couple of new > > >>> metrics. > > >>>>>>>> > > >>>>>>>> If you have any suggestions or feedback, let me know, it will be > > >>>> much > > >>>>>>>> appreciated. > > > > > > >