Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

Andrew Schofield Mon, 15 Jul 2024 06:45:01 -0700

Hi Brenden,
Thanks for the updates.

AS4. I see that you’ve added `.ms` to a bunch of the metrics reflecting the
fact that they’re measured in milliseconds. However, I observe that most metrics
in Kafka that are measured in milliseconds, with some exceptions in Kafka 
Connect
and MirrorMaker do not follow this convention. I would tend to err on the side 
of
consistency with the existing metrics and not use `.ms`. However, that’s just my
opinion, so I’d be interested to know what other reviewers of the KIP think.


Thanks,
Andrew

> On 12 Jul 2024, at 20:11, Brenden Deluna <[email protected]> wrote:
>
> Hey Lianet,
>
> Thank you for your suggestions and feedback!
>
>
> LM1. This has now been addressed.
>
>
> LM2. I think that would be a valuable addition to the current set of
> metrics, I will get that added.
>
>
> LM3. Again great idea, that would certainly be helpful. Will add that as
> well.
>
>
> Let me know if you have any more suggestions!
>
>
> Thanks,
>
> Brenden
>
> On Fri, Jul 12, 2024 at 2:11 PM Brenden Deluna <[email protected]> wrote:
>
>> Hi Lucas,
>>
>> Thank you for the feedback! I have addressed your comments:
>>
>>
>> LB1. Good catch there, I will update the names as needed.
>>
>>
>> LB2. Good catch again! I will update the name to be more consistent.
>>
>>
>> LB3. Thank you for pointing this out, I realized that all metric values
>> will actually be set to 0. I will specifiy this and explain why they will
>> be 0.
>>
>>
>> Nit: This metric is referring to the queue of unsent requests in the
>> NetworkClientDelegate. For the metric descriptions I am trying to not
>> include too much of the implementation details, hence the reason that
>> description is quite short. I cannot think of other ways to describe the
>> metric without going deeper into the implementation, but please do let me
>> know if you have any ideas.
>>
>>
>> Thank you,
>>
>> Brenden
>>
>> On Fri, Jul 12, 2024 at 1:27 PM Lianet M. <[email protected]> wrote:
>>
>>> Hey Brenden, thanks for the KIP! Great to get more visibility into the new
>>> consumer.
>>>
>>> LM1. +1 on Lucas's suggestion for including the unit in the name, seems
>>> clearer and consistent (I do see several time metrics including ms)
>>>
>>> LM2. What about a new metric for application-event-queue-time-ms. It would
>>> be a complement to the application-event-queue-size you're proposing, and
>>> it will tell us how long the events sit in the queue, waiting to be
>>> processed (from the time the API call adds the event to the queue, to the
>>> time it's processed in the background thread). I find it would be very
>>> interesting.
>>>
>>> LM3. Thinking about the actual usage of
>>> "time-between-network-thread-poll-xxx" metric, I imagine it would be
>>> helpful to know more about what could be impacting it. As I see it, the
>>> network thread cadence could be mainly impacted by: 1- app event
>>> processing
>>> (generate requests), 2- network client poll (actual send/receive). For 2,
>>> the new consumer reuses the same component as the legacy one, but 1 is
>>> specific to the new consumer, so what about a metric
>>> for application-event-processing-time-ms (we could consider avg I would
>>> say). It would be the time that the network thread takes to process all
>>> available events on each run.
>>>
>>> Cheers!
>>> Lianet
>>>
>>> On Fri, Jul 12, 2024 at 1:57 PM Lucas Brutschy
>>> <[email protected]> wrote:
>>>
>>>> Hey Brenden,
>>>>
>>>> thanks for the KIP! These will be great to observe and debug the
>>>> background thread of the new consumer.
>>>>
>>>> LB1. `time-between-network-thread-poll-max` → I see several similar
>>>> metrics including the unit in the metric name (ms or us). We could
>>>> consider this, although it's probably not strictly required. However,
>>>> at least the description should state the unit. Same for the `avg`
>>>> version.
>>>> LB2. `unsent-requests-size` → Naming sounds a bit like it's referring
>>>> to the size of the request. How about `unsent-request-queue-size` or
>>>> `unsent-request-count` or simply `unsent-requests`?
>>>> LB3. "the proposed metrics below will be set to null or 0." → which
>>>> one will be set to null and which ones will be set to 0, and why?
>>>>
>>>> nit: "The current number of unsent requests in the consumer network" →
>>>> Seems to be missing something?
>>>>
>>>> Cheers,
>>>> Lucas
>>>>
>>>> On Fri, Jul 12, 2024 at 7:28 PM Brenden Deluna
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Andrew,
>>>>> Thank you for the feedback and your question.
>>>>>
>>>>> AS1. Great idea, I will get that added.
>>>>>
>>>>> AS2. For unsent-events-age-max, age will be calculated once the event
>>> is
>>>>> sent, so you are correct.
>>>>>
>>>>> AS3. I agree, I think that would be a helpful metric to add, thank
>>> you! I
>>>>> will get that added.
>>>>>
>>>>> Please let me know if you have any additional feedback, suggestions,
>>> or
>>>>> questions.
>>>>>
>>>>> Thanks,
>>>>> Brenden
>>>>>
>>>>> On Fri, Jul 12, 2024 at 11:45 AM Andrew Schofield <
>>>> [email protected]>
>>>>> wrote:
>>>>>
>>>>>> Hi Brenden,
>>>>>> Thanks for the KIP. It fills a gap in the metrics for the new
>>> consumer
>>>>>> nicely.
>>>>>>
>>>>>> AS1. If using the CLASSIC protocol, and thus the
>>> LegacyKafkaConsumer,
>>>>>> I would expect that the metrics do not exist at all. Maybe say
>>>> something
>>>>>> like
>>>>>> “These metrics are for the new consumer implementation using the
>>>>>> CONSUMER protocol”.
>>>>>>
>>>>>> AS2. For unsent-events-age-max, when is the age calculated? For
>>>> example,
>>>>>> is it calculated at the time that the unsent event is removed from
>>> the
>>>>>> list and sent, or does the metric reflect unsent events which are
>>> still
>>>>>> enqueued? I suspect the former, but thought I’d check.
>>>>>>
>>>>>> AS3. I think that unsent-events-age-avg would also be interesting to
>>>>>> get an idea of how long unsent events tend to sit around before
>>>> sending.
>>>>>> Of course, the question from AS2 would also apply to the average.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
>>>>>>
>>>>>>> On 10 Jul 2024, at 17:44, Philip Nee <[email protected]>
>>>> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This is the link to the KIP document.
>>>>>>>
>>>>>>
>>>>
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1068%3A+New+JMX+metrics+for+the+new+KafkaConsumer
>>>>>>>
>>>>>>> Any comment is appreciated,
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 9, 2024 at 10:14 AM Brenden Deluna
>>>>>> <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello everyone,
>>>>>>>>
>>>>>>>> I would like to start the discussion thread for KIP-1068. This
>>> is a
>>>>>>>> relatively small KIP, only proposing to add a couple of new
>>> metrics.
>>>>>>>>
>>>>>>>> If you have any suggestions or feedback, let me know, it will be
>>>> much
>>>>>>>> appreciated.

Re: [DISCUSS] KIP-1068: KIP-1068: New JMX Metrics for AsyncKafkaConsumer

Reply via email to