Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

Brenden Deluna Fri, 19 Jul 2024 12:14:42 -0700

Hi Apoorv,
Thank you for your comments, I will address each.

AM1. I can see the usefulness in also having an
'application-event-queue-age-max' to get an idea of outliers and how they
may be affecting the average metric. I will add that.


AM2. I agree with you there, I think 'time' is a better descriptor here
than 'age'. I will update those metric names as well.

AM3. Similar to above comments, I will change the name of that metric to be
more consistent. And I think a max metric would also be useful here, adding
that.

AM4. Yes, good catch there. Will update that as well.

Thank you,
Brenden

On Fri, Jul 19, 2024 at 8:14 AM Apoorv Mittal <apoorvmitta...@gmail.com>
wrote:

> Hi Brendan,
> Thanks for the KIP. The metrics are always helpful.
>
> AM1: Is `application-event-queue-age-avg` enough or do we require `
> application-event-queue-age-max` as well to differentiate with outliers?
>
> AM2: The kafka producer defines metric `record-queue-time-avg` which
> captures the time spent in the buffer. Do you think we should have a
> similar name for `application-event-queue-age-avg` i.e. change to `
> application-event-queue-time-avg`? Moreover other than similar naming,
> `time` anyways seems more suitable than `age`, though minor. The `time`
> usage is also aligned with the description of this metric.
>
> AM3: Metric `application-event-processing-time` says "the average time,
> that the consumer network.....". Shall we have the `-avg` suffix in the
> metric as we have defined for other metrics? Also do we require the max
> metric as well for the same?
>
> AM4: Is the telemetry name for `unsent-requests-queue-size` intended
> as `org.apache.kafka.consumer.unsent.requests.size`,
> or it should be corrected to `
> org.apache.kafka.consumer.unsent.requests.queue.size`?
>
> AM2:
> Regards,
> Apoorv Mittal
> +44 7721681581
>
>
> On Mon, Jul 15, 2024 at 2:45 PM Andrew Schofield <
> andrew_schofi...@live.com>
> wrote:
>
> > Hi Brenden,
> > Thanks for the updates.
> >
> > AS4. I see that you’ve added `.ms` to a bunch of the metrics reflecting
> the
> > fact that they’re measured in milliseconds. However, I observe that most
> > metrics
> > in Kafka that are measured in milliseconds, with some exceptions in Kafka
> > Connect
> > and MirrorMaker do not follow this convention. I would tend to err on the
> > side of
> > consistency with the existing metrics and not use `.ms`. However, that’s
> > just my
> > opinion, so I’d be interested to know what other reviewers of the KIP
> > think.
> >
> > Thanks,
> > Andrew
> >
> > > On 12 Jul 2024, at 20:11, Brenden Deluna <bdel...@confluent.io.INVALID
> >
> > wrote:
> > >
> > > Hey Lianet,
> > >
> > > Thank you for your suggestions and feedback!
> > >
> > >
> > > LM1. This has now been addressed.
> > >
> > >
> > > LM2. I think that would be a valuable addition to the current set of
> > > metrics, I will get that added.
> > >
> > >
> > > LM3. Again great idea, that would certainly be helpful. Will add that
> as
> > > well.
> > >
> > >
> > > Let me know if you have any more suggestions!
> > >
> > >
> > > Thanks,
> > >
> > > Brenden
> > >
> > > On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <bdel...@confluent.io>
> > wrote:
> > >
> > >> Hi Lucas,
> > >>
> > >> Thank you for the feedback! I have addressed your comments:
> > >>
> > >>
> > >> LB1. Good catch there, I will update the names as needed.
> > >>
> > >>
> > >> LB2. Good catch again! I will update the name to be more consistent.
> > >>
> > >>
> > >> LB3. Thank you for pointing this out, I realized that all metric
> values
> > >> will actually be set to 0. I will specifiy this and explain why they
> > will
> > >> be 0.
> > >>
> > >>
> > >> Nit: This metric is referring to the queue of unsent requests in the
> > >> NetworkClientDelegate. For the metric descriptions I am trying to not
> > >> include too much of the implementation details, hence the reason that
> > >> description is quite short. I cannot think of other ways to describe
> the
> > >> metric without going deeper into the implementation, but please do let
> > me
> > >> know if you have any ideas.
> > >>
> > >>
> > >> Thank you,
> > >>
> > >> Brenden
> > >>
> > >> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <liane...@gmail.com> wrote:
> > >>
> > >>> Hey Brenden, thanks for the KIP! Great to get more visibility into
> the
> > new
> > >>> consumer.
> > >>>
> > >>> LM1. +1 on Lucas's suggestion for including the unit in the name,
> seems
> > >>> clearer and consistent (I do see several time metrics including ms)
> > >>>
> > >>> LM2. What about a new metric for application-event-queue-time-ms. It
> > would
> > >>> be a complement to the application-event-queue-size you're proposing,
> > and
> > >>> it will tell us how long the events sit in the queue, waiting to be
> > >>> processed (from the time the API call adds the event to the queue, to
> > the
> > >>> time it's processed in the background thread). I find it would be
> very
> > >>> interesting.
> > >>>
> > >>> LM3. Thinking about the actual usage of
> > >>> "time-between-network-thread-poll-xxx" metric, I imagine it would be
> > >>> helpful to know more about what could be impacting it. As I see it,
> the
> > >>> network thread cadence could be mainly impacted by: 1- app event
> > >>> processing
> > >>> (generate requests), 2- network client poll (actual send/receive).
> For
> > 2,
> > >>> the new consumer reuses the same component as the legacy one, but 1
> is
> > >>> specific to the new consumer, so what about a metric
> > >>> for application-event-processing-time-ms (we could consider avg I
> would
> > >>> say). It would be the time that the network thread takes to process
> all
> > >>> available events on each run.
> > >>>
> > >>> Cheers!
> > >>> Lianet
> > >>>
> > >>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy
> > >>> <lbruts...@confluent.io.invalid> wrote:
> > >>>
> > >>>> Hey Brenden,
> > >>>>
> > >>>> thanks for the KIP! These will be great to observe and debug the
> > >>>> background thread of the new consumer.
> > >>>>
> > >>>> LB1. `time-between-network-thread-poll-max` → I see several similar
> > >>>> metrics including the unit in the metric name (ms or us). We could
> > >>>> consider this, although it's probably not strictly required.
> However,
> > >>>> at least the description should state the unit. Same for the `avg`
> > >>>> version.
> > >>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's
> referring
> > >>>> to the size of the request. How about `unsent-request-queue-size` or
> > >>>> `unsent-request-count` or simply `unsent-requests`?
> > >>>> LB3. "the proposed metrics below will be set to null or 0." → which
> > >>>> one will be set to null and which ones will be set to 0, and why?
> > >>>>
> > >>>> nit: "The current number of unsent requests in the consumer
> network" →
> > >>>> Seems to be missing something?
> > >>>>
> > >>>> Cheers,
> > >>>> Lucas
> > >>>>
> > >>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna
> > >>>> <bdel...@confluent.io.invalid> wrote:
> > >>>>>
> > >>>>> Hi Andrew,
> > >>>>> Thank you for the feedback and your question.
> > >>>>>
> > >>>>> AS1. Great idea, I will get that added.
> > >>>>>
> > >>>>> AS2. For unsent-events-age-max, age will be calculated once the
> event
> > >>> is
> > >>>>> sent, so you are correct.
> > >>>>>
> > >>>>> AS3. I agree, I think that would be a helpful metric to add, thank
> > >>> you! I
> > >>>>> will get that added.
> > >>>>>
> > >>>>> Please let me know if you have any additional feedback,
> suggestions,
> > >>> or
> > >>>>> questions.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Brenden
> > >>>>>
> > >>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield <
> > >>>> andrew_schofi...@live.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi Brenden,
> > >>>>>> Thanks for the KIP. It fills a gap in the metrics for the new
> > >>> consumer
> > >>>>>> nicely.
> > >>>>>>
> > >>>>>> AS1. If using the CLASSIC protocol, and thus the
> > >>> LegacyKafkaConsumer,
> > >>>>>> I would expect that the metrics do not exist at all. Maybe say
> > >>>> something
> > >>>>>> like
> > >>>>>> “These metrics are for the new consumer implementation using the
> > >>>>>> CONSUMER protocol”.
> > >>>>>>
> > >>>>>> AS2. For unsent-events-age-max, when is the age calculated? For
> > >>>> example,
> > >>>>>> is it calculated at the time that the unsent event is removed from
> > >>> the
> > >>>>>> list and sent, or does the metric reflect unsent events which are
> > >>> still
> > >>>>>> enqueued? I suspect the former, but thought I’d check.
> > >>>>>>
> > >>>>>> AS3. I think that unsent-events-age-avg would also be interesting
> to
> > >>>>>> get an idea of how long unsent events tend to sit around before
> > >>>> sending.
> > >>>>>> Of course, the question from AS2 would also apply to the average.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Andrew
> > >>>>>>
> > >>>>>>> On 10 Jul 2024, at 17:44, Philip Nee <p...@confluent.io.INVALID>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> This is the link to the KIP document.
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer
> > >>>>>>>
> > >>>>>>> Any comment is appreciated,
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna
> > >>>>>> <bdel...@confluent.io.invalid>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Hello everyone,
> > >>>>>>>>
> > >>>>>>>> I would like to start the discussion thread for KIP-1068. This
> > >>> is a
> > >>>>>>>> relatively small KIP, only proposing to add a couple of new
> > >>> metrics.
> > >>>>>>>>
> > >>>>>>>> If you have any suggestions or feedback, let me know, it will be
> > >>>> much
> > >>>>>>>> appreciated.
> >
> >
> >
>

Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

Reply via email to