Hello everyone, I believe now this KIP is ready for a vote. I will be starting the vote here momentarily.
Thanks again, Brenden On Wed, Jul 24, 2024 at 11:01 AM Brenden Deluna <bdel...@confluent.io> wrote: > Hi Mickael, > Thank you for your feedback. > > 1. I can see that, I will get that changed. > 2. Done > > Thanks, > Brenden > > On Wed, Jul 24, 2024 at 5:06 AM Mickael Maison <mickael.mai...@gmail.com> > wrote: > >> Hi, >> >> 1. The title is a bit misleading. It's proposing to add new metrics, >> JMX is just one of the mechanisms to export them. >> 2. +1 to not register the new metrics when using the classic consumer, >> instead of setting them to 0. Similarly I assume existing metrics that >> don't apply to the new consumers are not registered? >> 3. At the moment this KIP is not changing any public APIs. What's the >> plan to make AsyncKafkaConsumer public? >> >> Thanks, >> Mickael >> >> >> >> On Tue, Jul 23, 2024 at 6:03 PM Bruno Cadonna <cado...@apache.org> wrote: >> > >> > Hi Brenden, >> > >> > BC1. In his first e-mail Andrew wrote "I would expect that the metrics >> > do not exist at all". I agree with him. I think it would be better to >> > not add those metrics at all if the CLASSIC protocol is used rather than >> > the metrics exist and are all constant 0. This should be possible by not >> > adding the metrics to the metrics registry if the CONSUMER protocol is >> > not used. >> > >> > BC2. Is there a specific reason you do not propose >> > background-event-queue-time-max and background-event-queue-time-avg? If >> > folk think those are not useful we do not need to add them. However, if >> > those are not useful, why is background-event-queue-size useful. I was >> > just wondering about the asymmetry between background-event-queue and >> > application-event-queue. >> > >> > Best, >> > Bruno >> > >> > >> > >> > On 7/19/24 9:14 PM, Brenden Deluna wrote: >> > > Hi Apoorv, >> > > Thank you for your comments, I will address each. >> > > >> > > AM1. I can see the usefulness in also having an >> > > 'application-event-queue-age-max' to get an idea of outliers and how >> they >> > > may be affecting the average metric. I will add that. >> > > >> > > AM2. I agree with you there, I think 'time' is a better descriptor >> here >> > > than 'age'. I will update those metric names as well. >> > > >> > > AM3. Similar to above comments, I will change the name of that metric >> to be >> > > more consistent. And I think a max metric would also be useful here, >> adding >> > > that. >> > > >> > > AM4. Yes, good catch there. Will update that as well. >> > > >> > > Thank you, >> > > Brenden >> > > >> > > On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal < >> apoorvmitta...@gmail.com> >> > > wrote: >> > > >> > >> Hi Brendan, >> > >> Thanks for the KIP. The metrics are always helpful. >> > >> >> > >> AM1: Is `application-event-queue-age-avg` enough or do we require ` >> > >> application-event-queue-age-max` as well to differentiate with >> outliers? >> > >> >> > >> AM2: The kafka producer defines metric `record-queue-time-avg` which >> > >> captures the time spent in the buffer. Do you think we should have a >> > >> similar name for `application-event-queue-age-avg` i.e. change to ` >> > >> application-event-queue-time-avg`? Moreover other than similar >> naming, >> > >> `time` anyways seems more suitable than `age`, though minor. The >> `time` >> > >> usage is also aligned with the description of this metric. >> > >> >> > >> AM3: Metric `application-event-processing-time` says "the average >> time, >> > >> that the consumer network.....". Shall we have the `-avg` suffix in >> the >> > >> metric as we have defined for other metrics? Also do we require the >> max >> > >> metric as well for the same? >> > >> >> > >> AM4: Is the telemetry name for `unsent-requests-queue-size` intended >> > >> as `org.apache.kafka.consumer.unsent.requests.size`, >> > >> or it should be corrected to ` >> > >> org.apache.kafka.consumer.unsent.requests.queue.size`? >> > >> >> > >> AM2: >> > >> Regards, >> > >> Apoorv Mittal >> > >> +44 7721681581 >> > >> >> > >> >> > >> On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield < >> > >> andrew_schofi...@live.com> >> > >> wrote: >> > >> >> > >>> Hi Brenden, >> > >>> Thanks for the updates. >> > >>> >> > >>> AS4. I see that you’ve added `.ms` to a bunch of the metrics >> reflecting >> > >> the >> > >>> fact that they’re measured in milliseconds. However, I observe that >> most >> > >>> metrics >> > >>> in Kafka that are measured in milliseconds, with some exceptions in >> Kafka >> > >>> Connect >> > >>> and MirrorMaker do not follow this convention. I would tend to err >> on the >> > >>> side of >> > >>> consistency with the existing metrics and not use `.ms`. However, >> that’s >> > >>> just my >> > >>> opinion, so I’d be interested to know what other reviewers of the >> KIP >> > >>> think. >> > >>> >> > >>> Thanks, >> > >>> Andrew >> > >>> >> > >>>> On 12 Jul 2024, at 20:11, Brenden Deluna >> <bdel...@confluent.io.INVALID >> > >>> >> > >>> wrote: >> > >>>> >> > >>>> Hey Lianet, >> > >>>> >> > >>>> Thank you for your suggestions and feedback! >> > >>>> >> > >>>> >> > >>>> LM1. This has now been addressed. >> > >>>> >> > >>>> >> > >>>> LM2. I think that would be a valuable addition to the current set >> of >> > >>>> metrics, I will get that added. >> > >>>> >> > >>>> >> > >>>> LM3. Again great idea, that would certainly be helpful. Will add >> that >> > >> as >> > >>>> well. >> > >>>> >> > >>>> >> > >>>> Let me know if you have any more suggestions! >> > >>>> >> > >>>> >> > >>>> Thanks, >> > >>>> >> > >>>> Brenden >> > >>>> >> > >>>> On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna < >> bdel...@confluent.io> >> > >>> wrote: >> > >>>> >> > >>>>> Hi Lucas, >> > >>>>> >> > >>>>> Thank you for the feedback! I have addressed your comments: >> > >>>>> >> > >>>>> >> > >>>>> LB1. Good catch there, I will update the names as needed. >> > >>>>> >> > >>>>> >> > >>>>> LB2. Good catch again! I will update the name to be more >> consistent. >> > >>>>> >> > >>>>> >> > >>>>> LB3. Thank you for pointing this out, I realized that all metric >> > >> values >> > >>>>> will actually be set to 0. I will specifiy this and explain why >> they >> > >>> will >> > >>>>> be 0. >> > >>>>> >> > >>>>> >> > >>>>> Nit: This metric is referring to the queue of unsent requests in >> the >> > >>>>> NetworkClientDelegate. For the metric descriptions I am trying to >> not >> > >>>>> include too much of the implementation details, hence the reason >> that >> > >>>>> description is quite short. I cannot think of other ways to >> describe >> > >> the >> > >>>>> metric without going deeper into the implementation, but please >> do let >> > >>> me >> > >>>>> know if you have any ideas. >> > >>>>> >> > >>>>> >> > >>>>> Thank you, >> > >>>>> >> > >>>>> Brenden >> > >>>>> >> > >>>>> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com> >> wrote: >> > >>>>> >> > >>>>>> Hey Brenden, thanks for the KIP! Great to get more visibility >> into >> > >> the >> > >>> new >> > >>>>>> consumer. >> > >>>>>> >> > >>>>>> LM1. +1 on Lucas's suggestion for including the unit in the name, >> > >> seems >> > >>>>>> clearer and consistent (I do see several time metrics including >> ms) >> > >>>>>> >> > >>>>>> LM2. What about a new metric for >> application-event-queue-time-ms. It >> > >>> would >> > >>>>>> be a complement to the application-event-queue-size you're >> proposing, >> > >>> and >> > >>>>>> it will tell us how long the events sit in the queue, waiting to >> be >> > >>>>>> processed (from the time the API call adds the event to the >> queue, to >> > >>> the >> > >>>>>> time it's processed in the background thread). I find it would be >> > >> very >> > >>>>>> interesting. >> > >>>>>> >> > >>>>>> LM3. Thinking about the actual usage of >> > >>>>>> "time-between-network-thread-poll-xxx" metric, I imagine it >> would be >> > >>>>>> helpful to know more about what could be impacting it. As I see >> it, >> > >> the >> > >>>>>> network thread cadence could be mainly impacted by: 1- app event >> > >>>>>> processing >> > >>>>>> (generate requests), 2- network client poll (actual >> send/receive). >> > >> For >> > >>> 2, >> > >>>>>> the new consumer reuses the same component as the legacy one, >> but 1 >> > >> is >> > >>>>>> specific to the new consumer, so what about a metric >> > >>>>>> for application-event-processing-time-ms (we could consider avg I >> > >> would >> > >>>>>> say). It would be the time that the network thread takes to >> process >> > >> all >> > >>>>>> available events on each run. >> > >>>>>> >> > >>>>>> Cheers! >> > >>>>>> Lianet >> > >>>>>> >> > >>>>>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy >> > >>>>>> <lbruts...@confluent.io.invalid> wrote: >> > >>>>>> >> > >>>>>>> Hey Brenden, >> > >>>>>>> >> > >>>>>>> thanks for the KIP! These will be great to observe and debug the >> > >>>>>>> background thread of the new consumer. >> > >>>>>>> >> > >>>>>>> LB1. `time-between-network-thread-poll-max` → I see several >> similar >> > >>>>>>> metrics including the unit in the metric name (ms or us). We >> could >> > >>>>>>> consider this, although it's probably not strictly required. >> > >> However, >> > >>>>>>> at least the description should state the unit. Same for the >> `avg` >> > >>>>>>> version. >> > >>>>>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's >> > >> referring >> > >>>>>>> to the size of the request. How about >> `unsent-request-queue-size` or >> > >>>>>>> `unsent-request-count` or simply `unsent-requests`? >> > >>>>>>> LB3. "the proposed metrics below will be set to null or 0." → >> which >> > >>>>>>> one will be set to null and which ones will be set to 0, and >> why? >> > >>>>>>> >> > >>>>>>> nit: "The current number of unsent requests in the consumer >> > >> network" → >> > >>>>>>> Seems to be missing something? >> > >>>>>>> >> > >>>>>>> Cheers, >> > >>>>>>> Lucas >> > >>>>>>> >> > >>>>>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna >> > >>>>>>> <bdel...@confluent.io.invalid> wrote: >> > >>>>>>>> >> > >>>>>>>> Hi Andrew, >> > >>>>>>>> Thank you for the feedback and your question. >> > >>>>>>>> >> > >>>>>>>> AS1. Great idea, I will get that added. >> > >>>>>>>> >> > >>>>>>>> AS2. For unsent-events-age-max, age will be calculated once the >> > >> event >> > >>>>>> is >> > >>>>>>>> sent, so you are correct. >> > >>>>>>>> >> > >>>>>>>> AS3. I agree, I think that would be a helpful metric to add, >> thank >> > >>>>>> you! I >> > >>>>>>>> will get that added. >> > >>>>>>>> >> > >>>>>>>> Please let me know if you have any additional feedback, >> > >> suggestions, >> > >>>>>> or >> > >>>>>>>> questions. >> > >>>>>>>> >> > >>>>>>>> Thanks, >> > >>>>>>>> Brenden >> > >>>>>>>> >> > >>>>>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield < >> > >>>>>>> andrew_schofi...@live.com> >> > >>>>>>>> wrote: >> > >>>>>>>> >> > >>>>>>>>> Hi Brenden, >> > >>>>>>>>> Thanks for the KIP. It fills a gap in the metrics for the new >> > >>>>>> consumer >> > >>>>>>>>> nicely. >> > >>>>>>>>> >> > >>>>>>>>> AS1. If using the CLASSIC protocol, and thus the >> > >>>>>> LegacyKafkaConsumer, >> > >>>>>>>>> I would expect that the metrics do not exist at all. Maybe say >> > >>>>>>> something >> > >>>>>>>>> like >> > >>>>>>>>> “These metrics are for the new consumer implementation using >> the >> > >>>>>>>>> CONSUMER protocol”. >> > >>>>>>>>> >> > >>>>>>>>> AS2. For unsent-events-age-max, when is the age calculated? >> For >> > >>>>>>> example, >> > >>>>>>>>> is it calculated at the time that the unsent event is removed >> from >> > >>>>>> the >> > >>>>>>>>> list and sent, or does the metric reflect unsent events which >> are >> > >>>>>> still >> > >>>>>>>>> enqueued? I suspect the former, but thought I’d check. >> > >>>>>>>>> >> > >>>>>>>>> AS3. I think that unsent-events-age-avg would also be >> interesting >> > >> to >> > >>>>>>>>> get an idea of how long unsent events tend to sit around >> before >> > >>>>>>> sending. >> > >>>>>>>>> Of course, the question from AS2 would also apply to the >> average. >> > >>>>>>>>> >> > >>>>>>>>> Thanks, >> > >>>>>>>>> Andrew >> > >>>>>>>>> >> > >>>>>>>>>> On 10 Jul 2024, at 17:44, Philip Nee >> <p...@confluent.io.INVALID> >> > >>>>>>> wrote: >> > >>>>>>>>>> >> > >>>>>>>>>> Hi all, >> > >>>>>>>>>> >> > >>>>>>>>>> This is the link to the KIP document. >> > >>>>>>>>>> >> > >>>>>>>>> >> > >>>>>>> >> > >>>>>> >> > >>> >> > >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer >> > >>>>>>>>>> >> > >>>>>>>>>> Any comment is appreciated, >> > >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna >> > >>>>>>>>> <bdel...@confluent.io.invalid> >> > >>>>>>>>>> wrote: >> > >>>>>>>>>> >> > >>>>>>>>>>> Hello everyone, >> > >>>>>>>>>>> >> > >>>>>>>>>>> I would like to start the discussion thread for KIP-1068. >> This >> > >>>>>> is a >> > >>>>>>>>>>> relatively small KIP, only proposing to add a couple of new >> > >>>>>> metrics. >> > >>>>>>>>>>> >> > >>>>>>>>>>> If you have any suggestions or feedback, let me know, it >> will be >> > >>>>>>> much >> > >>>>>>>>>>> appreciated. >> > >>> >> > >>> >> > >>> >> > >> >> > > >> >