Hello everyone,

I believe now this KIP is ready for a vote. I will be starting the vote
here momentarily.

Thanks again,
Brenden

On Wed, Jul 24, 2024 at 11:01 AM Brenden Deluna <bdel...@confluent.io>
wrote:

> Hi Mickael,
> Thank you for your feedback.
>
> 1. I can see that, I will get that changed.
> 2. Done
>
> Thanks,
> Brenden
>
> On Wed, Jul 24, 2024 at 5:06 AM Mickael Maison <mickael.mai...@gmail.com>
> wrote:
>
>> Hi,
>>
>> 1. The title is a bit misleading. It's proposing to add new metrics,
>> JMX is just one of the mechanisms to export them.
>> 2. +1 to not register the new metrics when using the classic consumer,
>> instead of setting them to 0. Similarly I assume existing metrics that
>> don't apply to the new consumers are not registered?
>> 3. At the moment this KIP is not changing any public APIs. What's the
>> plan to make AsyncKafkaConsumer public?
>>
>> Thanks,
>> Mickael
>>
>>
>>
>> On Tue, Jul 23, 2024 at 6:03 PM Bruno Cadonna <cado...@apache.org> wrote:
>> >
>> > Hi Brenden,
>> >
>> > BC1. In his first e-mail Andrew wrote "I would expect that the metrics
>> > do not exist at all". I agree with him. I think it would be better to
>> > not add those metrics at all if the CLASSIC protocol is used rather than
>> > the metrics exist and are all constant 0. This should be possible by not
>> > adding the metrics to the metrics registry if the CONSUMER protocol is
>> > not used.
>> >
>> > BC2. Is there a specific reason you do not propose
>> > background-event-queue-time-max and background-event-queue-time-avg? If
>> > folk think those are not useful we do not need to add them. However, if
>> > those are not useful, why is background-event-queue-size useful. I was
>> > just wondering about the asymmetry between background-event-queue and
>> > application-event-queue.
>> >
>> > Best,
>> > Bruno
>> >
>> >
>> >
>> > On 7/19/24 9:14 PM, Brenden Deluna wrote:
>> > > Hi Apoorv,
>> > > Thank you for your comments, I will address each.
>> > >
>> > > AM1. I can see the usefulness in also having an
>> > > 'application-event-queue-age-max' to get an idea of outliers and how
>> they
>> > > may be affecting the average metric. I will add that.
>> > >
>> > > AM2. I agree with you there, I think 'time' is a better descriptor
>> here
>> > > than 'age'. I will update those metric names as well.
>> > >
>> > > AM3. Similar to above comments, I will change the name of that metric
>> to be
>> > > more consistent. And I think a max metric would also be useful here,
>> adding
>> > > that.
>> > >
>> > > AM4. Yes, good catch there. Will update that as well.
>> > >
>> > > Thank you,
>> > > Brenden
>> > >
>> > > On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal <
>> apoorvmitta...@gmail.com>
>> > > wrote:
>> > >
>> > >> Hi Brendan,
>> > >> Thanks for the KIP. The metrics are always helpful.
>> > >>
>> > >> AM1: Is `application-event-queue-age-avg` enough or do we require `
>> > >> application-event-queue-age-max` as well to differentiate with
>> outliers?
>> > >>
>> > >> AM2: The kafka producer defines metric `record-queue-time-avg` which
>> > >> captures the time spent in the buffer. Do you think we should have a
>> > >> similar name for `application-event-queue-age-avg` i.e. change to `
>> > >> application-event-queue-time-avg`? Moreover other than similar
>> naming,
>> > >> `time` anyways seems more suitable than `age`, though minor. The
>> `time`
>> > >> usage is also aligned with the description of this metric.
>> > >>
>> > >> AM3: Metric `application-event-processing-time` says "the average
>> time,
>> > >> that the consumer network.....". Shall we have the `-avg` suffix in
>> the
>> > >> metric as we have defined for other metrics? Also do we require the
>> max
>> > >> metric as well for the same?
>> > >>
>> > >> AM4: Is the telemetry name for `unsent-requests-queue-size` intended
>> > >> as `org.apache.kafka.consumer.unsent.requests.size`,
>> > >> or it should be corrected to `
>> > >> org.apache.kafka.consumer.unsent.requests.queue.size`?
>> > >>
>> > >> AM2:
>> > >> Regards,
>> > >> Apoorv Mittal
>> > >> +44 7721681581
>> > >>
>> > >>
>> > >> On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield <
>> > >> andrew_schofi...@live.com>
>> > >> wrote:
>> > >>
>> > >>> Hi Brenden,
>> > >>> Thanks for the updates.
>> > >>>
>> > >>> AS4. I see that you’ve added `.ms` to a bunch of the metrics
>> reflecting
>> > >> the
>> > >>> fact that they’re measured in milliseconds. However, I observe that
>> most
>> > >>> metrics
>> > >>> in Kafka that are measured in milliseconds, with some exceptions in
>> Kafka
>> > >>> Connect
>> > >>> and MirrorMaker do not follow this convention. I would tend to err
>> on the
>> > >>> side of
>> > >>> consistency with the existing metrics and not use `.ms`. However,
>> that’s
>> > >>> just my
>> > >>> opinion, so I’d be interested to know what other reviewers of the
>> KIP
>> > >>> think.
>> > >>>
>> > >>> Thanks,
>> > >>> Andrew
>> > >>>
>> > >>>> On 12 Jul 2024, at 20:11, Brenden Deluna
>> <bdel...@confluent.io.INVALID
>> > >>>
>> > >>> wrote:
>> > >>>>
>> > >>>> Hey Lianet,
>> > >>>>
>> > >>>> Thank you for your suggestions and feedback!
>> > >>>>
>> > >>>>
>> > >>>> LM1. This has now been addressed.
>> > >>>>
>> > >>>>
>> > >>>> LM2. I think that would be a valuable addition to the current set
>> of
>> > >>>> metrics, I will get that added.
>> > >>>>
>> > >>>>
>> > >>>> LM3. Again great idea, that would certainly be helpful. Will add
>> that
>> > >> as
>> > >>>> well.
>> > >>>>
>> > >>>>
>> > >>>> Let me know if you have any more suggestions!
>> > >>>>
>> > >>>>
>> > >>>> Thanks,
>> > >>>>
>> > >>>> Brenden
>> > >>>>
>> > >>>> On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <
>> bdel...@confluent.io>
>> > >>> wrote:
>> > >>>>
>> > >>>>> Hi Lucas,
>> > >>>>>
>> > >>>>> Thank you for the feedback! I have addressed your comments:
>> > >>>>>
>> > >>>>>
>> > >>>>> LB1. Good catch there, I will update the names as needed.
>> > >>>>>
>> > >>>>>
>> > >>>>> LB2. Good catch again! I will update the name to be more
>> consistent.
>> > >>>>>
>> > >>>>>
>> > >>>>> LB3. Thank you for pointing this out, I realized that all metric
>> > >> values
>> > >>>>> will actually be set to 0. I will specifiy this and explain why
>> they
>> > >>> will
>> > >>>>> be 0.
>> > >>>>>
>> > >>>>>
>> > >>>>> Nit: This metric is referring to the queue of unsent requests in
>> the
>> > >>>>> NetworkClientDelegate. For the metric descriptions I am trying to
>> not
>> > >>>>> include too much of the implementation details, hence the reason
>> that
>> > >>>>> description is quite short. I cannot think of other ways to
>> describe
>> > >> the
>> > >>>>> metric without going deeper into the implementation, but please
>> do let
>> > >>> me
>> > >>>>> know if you have any ideas.
>> > >>>>>
>> > >>>>>
>> > >>>>> Thank you,
>> > >>>>>
>> > >>>>> Brenden
>> > >>>>>
>> > >>>>> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com>
>> wrote:
>> > >>>>>
>> > >>>>>> Hey Brenden, thanks for the KIP! Great to get more visibility
>> into
>> > >> the
>> > >>> new
>> > >>>>>> consumer.
>> > >>>>>>
>> > >>>>>> LM1. +1 on Lucas's suggestion for including the unit in the name,
>> > >> seems
>> > >>>>>> clearer and consistent (I do see several time metrics including
>> ms)
>> > >>>>>>
>> > >>>>>> LM2. What about a new metric for
>> application-event-queue-time-ms. It
>> > >>> would
>> > >>>>>> be a complement to the application-event-queue-size you're
>> proposing,
>> > >>> and
>> > >>>>>> it will tell us how long the events sit in the queue, waiting to
>> be
>> > >>>>>> processed (from the time the API call adds the event to the
>> queue, to
>> > >>> the
>> > >>>>>> time it's processed in the background thread). I find it would be
>> > >> very
>> > >>>>>> interesting.
>> > >>>>>>
>> > >>>>>> LM3. Thinking about the actual usage of
>> > >>>>>> "time-between-network-thread-poll-xxx" metric, I imagine it
>> would be
>> > >>>>>> helpful to know more about what could be impacting it. As I see
>> it,
>> > >> the
>> > >>>>>> network thread cadence could be mainly impacted by: 1- app event
>> > >>>>>> processing
>> > >>>>>> (generate requests), 2- network client poll (actual
>> send/receive).
>> > >> For
>> > >>> 2,
>> > >>>>>> the new consumer reuses the same component as the legacy one,
>> but 1
>> > >> is
>> > >>>>>> specific to the new consumer, so what about a metric
>> > >>>>>> for application-event-processing-time-ms (we could consider avg I
>> > >> would
>> > >>>>>> say). It would be the time that the network thread takes to
>> process
>> > >> all
>> > >>>>>> available events on each run.
>> > >>>>>>
>> > >>>>>> Cheers!
>> > >>>>>> Lianet
>> > >>>>>>
>> > >>>>>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy
>> > >>>>>> <lbruts...@confluent.io.invalid> wrote:
>> > >>>>>>
>> > >>>>>>> Hey Brenden,
>> > >>>>>>>
>> > >>>>>>> thanks for the KIP! These will be great to observe and debug the
>> > >>>>>>> background thread of the new consumer.
>> > >>>>>>>
>> > >>>>>>> LB1. `time-between-network-thread-poll-max` → I see several
>> similar
>> > >>>>>>> metrics including the unit in the metric name (ms or us). We
>> could
>> > >>>>>>> consider this, although it's probably not strictly required.
>> > >> However,
>> > >>>>>>> at least the description should state the unit. Same for the
>> `avg`
>> > >>>>>>> version.
>> > >>>>>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's
>> > >> referring
>> > >>>>>>> to the size of the request. How about
>> `unsent-request-queue-size` or
>> > >>>>>>> `unsent-request-count` or simply `unsent-requests`?
>> > >>>>>>> LB3. "the proposed metrics below will be set to null or 0." →
>> which
>> > >>>>>>> one will be set to null and which ones will be set to 0, and
>> why?
>> > >>>>>>>
>> > >>>>>>> nit: "The current number of unsent requests in the consumer
>> > >> network" →
>> > >>>>>>> Seems to be missing something?
>> > >>>>>>>
>> > >>>>>>> Cheers,
>> > >>>>>>> Lucas
>> > >>>>>>>
>> > >>>>>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna
>> > >>>>>>> <bdel...@confluent.io.invalid> wrote:
>> > >>>>>>>>
>> > >>>>>>>> Hi Andrew,
>> > >>>>>>>> Thank you for the feedback and your question.
>> > >>>>>>>>
>> > >>>>>>>> AS1. Great idea, I will get that added.
>> > >>>>>>>>
>> > >>>>>>>> AS2. For unsent-events-age-max, age will be calculated once the
>> > >> event
>> > >>>>>> is
>> > >>>>>>>> sent, so you are correct.
>> > >>>>>>>>
>> > >>>>>>>> AS3. I agree, I think that would be a helpful metric to add,
>> thank
>> > >>>>>> you! I
>> > >>>>>>>> will get that added.
>> > >>>>>>>>
>> > >>>>>>>> Please let me know if you have any additional feedback,
>> > >> suggestions,
>> > >>>>>> or
>> > >>>>>>>> questions.
>> > >>>>>>>>
>> > >>>>>>>> Thanks,
>> > >>>>>>>> Brenden
>> > >>>>>>>>
>> > >>>>>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield <
>> > >>>>>>> andrew_schofi...@live.com>
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Hi Brenden,
>> > >>>>>>>>> Thanks for the KIP. It fills a gap in the metrics for the new
>> > >>>>>> consumer
>> > >>>>>>>>> nicely.
>> > >>>>>>>>>
>> > >>>>>>>>> AS1. If using the CLASSIC protocol, and thus the
>> > >>>>>> LegacyKafkaConsumer,
>> > >>>>>>>>> I would expect that the metrics do not exist at all. Maybe say
>> > >>>>>>> something
>> > >>>>>>>>> like
>> > >>>>>>>>> “These metrics are for the new consumer implementation using
>> the
>> > >>>>>>>>> CONSUMER protocol”.
>> > >>>>>>>>>
>> > >>>>>>>>> AS2. For unsent-events-age-max, when is the age calculated?
>> For
>> > >>>>>>> example,
>> > >>>>>>>>> is it calculated at the time that the unsent event is removed
>> from
>> > >>>>>> the
>> > >>>>>>>>> list and sent, or does the metric reflect unsent events which
>> are
>> > >>>>>> still
>> > >>>>>>>>> enqueued? I suspect the former, but thought I’d check.
>> > >>>>>>>>>
>> > >>>>>>>>> AS3. I think that unsent-events-age-avg would also be
>> interesting
>> > >> to
>> > >>>>>>>>> get an idea of how long unsent events tend to sit around
>> before
>> > >>>>>>> sending.
>> > >>>>>>>>> Of course, the question from AS2 would also apply to the
>> average.
>> > >>>>>>>>>
>> > >>>>>>>>> Thanks,
>> > >>>>>>>>> Andrew
>> > >>>>>>>>>
>> > >>>>>>>>>> On 10 Jul 2024, at 17:44, Philip Nee
>> <p...@confluent.io.INVALID>
>> > >>>>>>> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>> Hi all,
>> > >>>>>>>>>>
>> > >>>>>>>>>> This is the link to the KIP document.
>> > >>>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>
>> > >>>>>>
>> > >>>
>> > >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer
>> > >>>>>>>>>>
>> > >>>>>>>>>> Any comment is appreciated,
>> > >>>>>>>>>>
>> > >>>>>>>>>>
>> > >>>>>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna
>> > >>>>>>>>> <bdel...@confluent.io.invalid>
>> > >>>>>>>>>> wrote:
>> > >>>>>>>>>>
>> > >>>>>>>>>>> Hello everyone,
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> I would like to start the discussion thread for KIP-1068.
>> This
>> > >>>>>> is a
>> > >>>>>>>>>>> relatively small KIP, only proposing to add a couple of new
>> > >>>>>> metrics.
>> > >>>>>>>>>>>
>> > >>>>>>>>>>> If you have any suggestions or feedback, let me know, it
>> will be
>> > >>>>>>> much
>> > >>>>>>>>>>> appreciated.
>> > >>>
>> > >>>
>> > >>>
>> > >>
>> > >
>>
>

Reply via email to