Hi, 3. Sorry I misunderstood something and got confused as the KIP mentions AsyncKafkaConsumer as a new consumer implementation. I now see that it's not an alternative to KafkaConsumer but just the "backend" used with the new protocol. So yes any new metrics added are public API and I agree AsyncKafkaConsumer should not be public.
Thanks, Mickael On Wed, Jul 24, 2024 at 12:27 PM Bruno Cadonna <cado...@apache.org> wrote: > > Hi Mickael, > > 1. I agree that the title is misleading. It should be something like > "New metrics for the AsyncKafkaConsumer". Maybe it should even be > "Metrics for the new consumer". > > 3. I am not sure I understand this comment. Exposed metrics are public > as far as I understand. So adding new metrics requires a KIP. We had > KIPs in the past that only introduced and/or removed new metrics. For > example, > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1001%3A+Add+CurrentControllerId+Metric > I do not think there are any plans to make the AsyncKafkaConsumer class > public. It is the implementation of the KafkaConsumer when the CONSUMER > protocol is used. But I might be wrong. At least, I am against making it > public. > > Best, > Bruno > > On 7/24/24 12:05 PM, Mickael Maison wrote: > > Hi, > > > > 1. The title is a bit misleading. It's proposing to add new metrics, > > JMX is just one of the mechanisms to export them. > > 2. +1 to not register the new metrics when using the classic consumer, > > instead of setting them to 0. Similarly I assume existing metrics that > > don't apply to the new consumers are not registered? > > 3. At the moment this KIP is not changing any public APIs. What's the > > plan to make AsyncKafkaConsumer public? > > > > Thanks, > > Mickael > > > > > > > > On Tue, Jul 23, 2024 at 6:03 PM Bruno Cadonna <cado...@apache.org> wrote: > >> > >> Hi Brenden, > >> > >> BC1. In his first e-mail Andrew wrote "I would expect that the metrics > >> do not exist at all". I agree with him. I think it would be better to > >> not add those metrics at all if the CLASSIC protocol is used rather than > >> the metrics exist and are all constant 0. This should be possible by not > >> adding the metrics to the metrics registry if the CONSUMER protocol is > >> not used. > >> > >> BC2. Is there a specific reason you do not propose > >> background-event-queue-time-max and background-event-queue-time-avg? If > >> folk think those are not useful we do not need to add them. However, if > >> those are not useful, why is background-event-queue-size useful. I was > >> just wondering about the asymmetry between background-event-queue and > >> application-event-queue. > >> > >> Best, > >> Bruno > >> > >> > >> > >> On 7/19/24 9:14 PM, Brenden Deluna wrote: > >>> Hi Apoorv, > >>> Thank you for your comments, I will address each. > >>> > >>> AM1. I can see the usefulness in also having an > >>> 'application-event-queue-age-max' to get an idea of outliers and how they > >>> may be affecting the average metric. I will add that. > >>> > >>> AM2. I agree with you there, I think 'time' is a better descriptor here > >>> than 'age'. I will update those metric names as well. > >>> > >>> AM3. Similar to above comments, I will change the name of that metric to > >>> be > >>> more consistent. And I think a max metric would also be useful here, > >>> adding > >>> that. > >>> > >>> AM4. Yes, good catch there. Will update that as well. > >>> > >>> Thank you, > >>> Brenden > >>> > >>> On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal <apoorvmitta...@gmail.com> > >>> wrote: > >>> > >>>> Hi Brendan, > >>>> Thanks for the KIP. The metrics are always helpful. > >>>> > >>>> AM1: Is `application-event-queue-age-avg` enough or do we require ` > >>>> application-event-queue-age-max` as well to differentiate with outliers? > >>>> > >>>> AM2: The kafka producer defines metric `record-queue-time-avg` which > >>>> captures the time spent in the buffer. Do you think we should have a > >>>> similar name for `application-event-queue-age-avg` i.e. change to ` > >>>> application-event-queue-time-avg`? Moreover other than similar naming, > >>>> `time` anyways seems more suitable than `age`, though minor. The `time` > >>>> usage is also aligned with the description of this metric. > >>>> > >>>> AM3: Metric `application-event-processing-time` says "the average time, > >>>> that the consumer network.....". Shall we have the `-avg` suffix in the > >>>> metric as we have defined for other metrics? Also do we require the max > >>>> metric as well for the same? > >>>> > >>>> AM4: Is the telemetry name for `unsent-requests-queue-size` intended > >>>> as `org.apache.kafka.consumer.unsent.requests.size`, > >>>> or it should be corrected to ` > >>>> org.apache.kafka.consumer.unsent.requests.queue.size`? > >>>> > >>>> AM2: > >>>> Regards, > >>>> Apoorv Mittal > >>>> +44 7721681581 > >>>> > >>>> > >>>> On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield < > >>>> andrew_schofi...@live.com> > >>>> wrote: > >>>> > >>>>> Hi Brenden, > >>>>> Thanks for the updates. > >>>>> > >>>>> AS4. I see that you’ve added `.ms` to a bunch of the metrics reflecting > >>>> the > >>>>> fact that they’re measured in milliseconds. However, I observe that most > >>>>> metrics > >>>>> in Kafka that are measured in milliseconds, with some exceptions in > >>>>> Kafka > >>>>> Connect > >>>>> and MirrorMaker do not follow this convention. I would tend to err on > >>>>> the > >>>>> side of > >>>>> consistency with the existing metrics and not use `.ms`. However, that’s > >>>>> just my > >>>>> opinion, so I’d be interested to know what other reviewers of the KIP > >>>>> think. > >>>>> > >>>>> Thanks, > >>>>> Andrew > >>>>> > >>>>>> On 12 Jul 2024, at 20:11, Brenden Deluna <bdel...@confluent.io.INVALID > >>>>> > >>>>> wrote: > >>>>>> > >>>>>> Hey Lianet, > >>>>>> > >>>>>> Thank you for your suggestions and feedback! > >>>>>> > >>>>>> > >>>>>> LM1. This has now been addressed. > >>>>>> > >>>>>> > >>>>>> LM2. I think that would be a valuable addition to the current set of > >>>>>> metrics, I will get that added. > >>>>>> > >>>>>> > >>>>>> LM3. Again great idea, that would certainly be helpful. Will add that > >>>> as > >>>>>> well. > >>>>>> > >>>>>> > >>>>>> Let me know if you have any more suggestions! > >>>>>> > >>>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Brenden > >>>>>> > >>>>>> On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <bdel...@confluent.io> > >>>>> wrote: > >>>>>> > >>>>>>> Hi Lucas, > >>>>>>> > >>>>>>> Thank you for the feedback! I have addressed your comments: > >>>>>>> > >>>>>>> > >>>>>>> LB1. Good catch there, I will update the names as needed. > >>>>>>> > >>>>>>> > >>>>>>> LB2. Good catch again! I will update the name to be more consistent. > >>>>>>> > >>>>>>> > >>>>>>> LB3. Thank you for pointing this out, I realized that all metric > >>>> values > >>>>>>> will actually be set to 0. I will specifiy this and explain why they > >>>>> will > >>>>>>> be 0. > >>>>>>> > >>>>>>> > >>>>>>> Nit: This metric is referring to the queue of unsent requests in the > >>>>>>> NetworkClientDelegate. For the metric descriptions I am trying to not > >>>>>>> include too much of the implementation details, hence the reason that > >>>>>>> description is quite short. I cannot think of other ways to describe > >>>> the > >>>>>>> metric without going deeper into the implementation, but please do let > >>>>> me > >>>>>>> know if you have any ideas. > >>>>>>> > >>>>>>> > >>>>>>> Thank you, > >>>>>>> > >>>>>>> Brenden > >>>>>>> > >>>>>>> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com> wrote: > >>>>>>> > >>>>>>>> Hey Brenden, thanks for the KIP! Great to get more visibility into > >>>> the > >>>>> new > >>>>>>>> consumer. > >>>>>>>> > >>>>>>>> LM1. +1 on Lucas's suggestion for including the unit in the name, > >>>> seems > >>>>>>>> clearer and consistent (I do see several time metrics including ms) > >>>>>>>> > >>>>>>>> LM2. What about a new metric for application-event-queue-time-ms. It > >>>>> would > >>>>>>>> be a complement to the application-event-queue-size you're proposing, > >>>>> and > >>>>>>>> it will tell us how long the events sit in the queue, waiting to be > >>>>>>>> processed (from the time the API call adds the event to the queue, to > >>>>> the > >>>>>>>> time it's processed in the background thread). I find it would be > >>>> very > >>>>>>>> interesting. > >>>>>>>> > >>>>>>>> LM3. Thinking about the actual usage of > >>>>>>>> "time-between-network-thread-poll-xxx" metric, I imagine it would be > >>>>>>>> helpful to know more about what could be impacting it. As I see it, > >>>> the > >>>>>>>> network thread cadence could be mainly impacted by: 1- app event > >>>>>>>> processing > >>>>>>>> (generate requests), 2- network client poll (actual send/receive). > >>>> For > >>>>> 2, > >>>>>>>> the new consumer reuses the same component as the legacy one, but 1 > >>>> is > >>>>>>>> specific to the new consumer, so what about a metric > >>>>>>>> for application-event-processing-time-ms (we could consider avg I > >>>> would > >>>>>>>> say). It would be the time that the network thread takes to process > >>>> all > >>>>>>>> available events on each run. > >>>>>>>> > >>>>>>>> Cheers! > >>>>>>>> Lianet > >>>>>>>> > >>>>>>>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy > >>>>>>>> <lbruts...@confluent.io.invalid> wrote: > >>>>>>>> > >>>>>>>>> Hey Brenden, > >>>>>>>>> > >>>>>>>>> thanks for the KIP! These will be great to observe and debug the > >>>>>>>>> background thread of the new consumer. > >>>>>>>>> > >>>>>>>>> LB1. `time-between-network-thread-poll-max` → I see several similar > >>>>>>>>> metrics including the unit in the metric name (ms or us). We could > >>>>>>>>> consider this, although it's probably not strictly required. > >>>> However, > >>>>>>>>> at least the description should state the unit. Same for the `avg` > >>>>>>>>> version. > >>>>>>>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's > >>>> referring > >>>>>>>>> to the size of the request. How about `unsent-request-queue-size` or > >>>>>>>>> `unsent-request-count` or simply `unsent-requests`? > >>>>>>>>> LB3. "the proposed metrics below will be set to null or 0." → which > >>>>>>>>> one will be set to null and which ones will be set to 0, and why? > >>>>>>>>> > >>>>>>>>> nit: "The current number of unsent requests in the consumer > >>>> network" → > >>>>>>>>> Seems to be missing something? > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Lucas > >>>>>>>>> > >>>>>>>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna > >>>>>>>>> <bdel...@confluent.io.invalid> wrote: > >>>>>>>>>> > >>>>>>>>>> Hi Andrew, > >>>>>>>>>> Thank you for the feedback and your question. > >>>>>>>>>> > >>>>>>>>>> AS1. Great idea, I will get that added. > >>>>>>>>>> > >>>>>>>>>> AS2. For unsent-events-age-max, age will be calculated once the > >>>> event > >>>>>>>> is > >>>>>>>>>> sent, so you are correct. > >>>>>>>>>> > >>>>>>>>>> AS3. I agree, I think that would be a helpful metric to add, thank > >>>>>>>> you! I > >>>>>>>>>> will get that added. > >>>>>>>>>> > >>>>>>>>>> Please let me know if you have any additional feedback, > >>>> suggestions, > >>>>>>>> or > >>>>>>>>>> questions. > >>>>>>>>>> > >>>>>>>>>> Thanks, > >>>>>>>>>> Brenden > >>>>>>>>>> > >>>>>>>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield < > >>>>>>>>> andrew_schofi...@live.com> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi Brenden, > >>>>>>>>>>> Thanks for the KIP. It fills a gap in the metrics for the new > >>>>>>>> consumer > >>>>>>>>>>> nicely. > >>>>>>>>>>> > >>>>>>>>>>> AS1. If using the CLASSIC protocol, and thus the > >>>>>>>> LegacyKafkaConsumer, > >>>>>>>>>>> I would expect that the metrics do not exist at all. Maybe say > >>>>>>>>> something > >>>>>>>>>>> like > >>>>>>>>>>> “These metrics are for the new consumer implementation using the > >>>>>>>>>>> CONSUMER protocol”. > >>>>>>>>>>> > >>>>>>>>>>> AS2. For unsent-events-age-max, when is the age calculated? For > >>>>>>>>> example, > >>>>>>>>>>> is it calculated at the time that the unsent event is removed from > >>>>>>>> the > >>>>>>>>>>> list and sent, or does the metric reflect unsent events which are > >>>>>>>> still > >>>>>>>>>>> enqueued? I suspect the former, but thought I’d check. > >>>>>>>>>>> > >>>>>>>>>>> AS3. I think that unsent-events-age-avg would also be interesting > >>>> to > >>>>>>>>>>> get an idea of how long unsent events tend to sit around before > >>>>>>>>> sending. > >>>>>>>>>>> Of course, the question from AS2 would also apply to the average. > >>>>>>>>>>> > >>>>>>>>>>> Thanks, > >>>>>>>>>>> Andrew > >>>>>>>>>>> > >>>>>>>>>>>> On 10 Jul 2024, at 17:44, Philip Nee <p...@confluent.io.INVALID> > >>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>> Hi all, > >>>>>>>>>>>> > >>>>>>>>>>>> This is the link to the KIP document. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>> > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer > >>>>>>>>>>>> > >>>>>>>>>>>> Any comment is appreciated, > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna > >>>>>>>>>>> <bdel...@confluent.io.invalid> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hello everyone, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I would like to start the discussion thread for KIP-1068. This > >>>>>>>> is a > >>>>>>>>>>>>> relatively small KIP, only proposing to add a couple of new > >>>>>>>> metrics. > >>>>>>>>>>>>> > >>>>>>>>>>>>> If you have any suggestions or feedback, let me know, it will be > >>>>>>>>> much > >>>>>>>>>>>>> appreciated. > >>>>> > >>>>> > >>>>> > >>>> > >>>