This looks great! +1 (binding)

Sophie

On Wed, Oct 11, 2023 at 1:46 PM Matthias J. Sax <mj...@apache.org> wrote:
> +1 (binding)
>
> On 9/13/23 5:48 PM, Jason Gustafson wrote:
> > Hey Andrew,
> >
> > +1 on the KIP. For many users of Kafka, it may not be fully understood how much of a challenge client monitoring is. With tens of clients in a cluster, it is already difficult to coordinate metrics collection. When there are thousands of clients, and when the cluster operator has no control over them, it is essentially impossible. For the fat clients that we have, the lack of useful telemetry is a huge operational gap. Consistency between clients has also been a major challenge. I think the effort toward standardization in this KIP will have some positive impact even in deployments which have effective client-side monitoring. Overall, I think this proposal will provide a lot of value across the board.
> >
> > Best,
> > Jason
> >
> > On Wed, Sep 13, 2023 at 9:50 AM Philip Nee <philip...@gmail.com> wrote:
> >
> >> Hey Andrew -
> >>
> >> Thank you for taking the time to reply to my questions. I'm just adding some notes to this discussion.
> >>
> >> 1. Epoch: It can be helpful to know the delta between the client-side epoch and the actual leader epoch. It helps to understand why commits sometimes fail or the client is not making progress.
> >> 2. Client connection: If the client selects the "wrong" connection to push out the data, I assume the request would time out, which should lead to disconnecting from the node and reselecting another node, as you mentioned, via the least-loaded node.
> >>
> >> Cheers,
> >> P
> >>
> >> On Tue, Sep 12, 2023 at 10:40 AM Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
> >>
> >>> Hi Philip,
> >>> Thanks for your vote and interest in the KIP.
> >>>
> >>> KIP-714 does not introduce any new client metrics, and that’s intentional.
> >>> It does tell how all of the client metrics can have their names transformed into equivalent “telemetry metric names”, and then potentially used in metrics subscriptions.
> >>>
> >>> I am interested in the idea of the client’s leader epoch in this context, but I don’t have an immediate plan for how best to do this, and it would take another KIP to enhance existing metrics or introduce some new ones. Those would then naturally be applicable to the metrics push introduced in KIP-714.
> >>>
> >>> In a similar vein, there are no existing client metrics specifically for auto-commit. We could add them to Kafka, but I really think this is just an example of asynchronous commit in which the application has decided not to specify when the commit should begin.
> >>>
> >>> It is possible to increase the cadence of pushing by modifying the interval.ms configuration property of the CLIENT_METRICS resource.
> >>>
> >>> There is an “assigned-partitions” metric for each consumer, but not one for active partitions. We could add one, again as a follow-on KIP.
> >>>
> >>> I take your point about holding on to a connection in a channel which might experience congestion. Do you have a suggestion for how to improve on this? For example, the client does have the concept of a least-loaded node. Maybe this is something we should investigate in the implementation and decide on the best approach. In general, I think sticking with the same node for consecutive pushes is best, but if you choose the “wrong” node to start with, it’s not ideal.
> >>>
> >>> Thanks,
> >>> Andrew
> >>>
> >>>> On 8 Sep 2023, at 19:29, Philip Nee <philip...@gmail.com> wrote:
> >>>>
> >>>> Hey Andrew -
> >>>>
> >>>> +1 but I don't have a binding vote!
> >>>>
> >>>> It took me a while to go through the KIP.
> >>>> Here are some of my notes from the reading:
> >>>>
> >>>> *Metrics*
> >>>> - Should we care about the client's leader epoch? There is a case where the user recreates the topic, but the consumer thinks it is still the same topic and therefore attempts to start from an offset that doesn't exist. KIP-848 addresses this issue, but I can still see some potential benefits from knowing the client's epoch information.
> >>>> - I assume poll idle is similar to poll interval; I needed to read the description a few times.
> >>>> - I don't have a clear use case in mind for the commit latency, but I do think people sometimes lack clarity about how much progress was tracked by the auto-commit. Would tracking auto-commit-related metrics be useful? I was thinking of the last offset committed or the actual cadence in ms.
> >>>> - Are there cases when we need to increase the cadence of telemetry data push, i.e. a variable interval?
> >>>> - Thanks for implementing the randomized initial metric push; I think it is really important.
> >>>> - Is there a potential use case for tracking the number of active partitions? The consumer can pause partitions via the API, during revocation, or during offset reset for the stream.
> >>>>
> >>>> *Connections*
> >>>> - The KIP states that it will keep the same connection until the connection is disconnected. I wonder if that could cause congestion if it is already a busy channel, which would lead to a connection timeout and subsequent disconnection.
> >>>>
> >>>> Thanks,
> >>>> P
> >>>>
> >>>> On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
> >>>>
> >>>>> Bumping the voting thread for KIP-714.
> >>>>>
> >>>>> So far, we have:
> >>>>> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne)
> >>>>>
> >>>>> Thanks,
> >>>>> Andrew
> >>>>>
> >>>>>> On 4 Aug 2023, at 09:45, Andrew Schofield <andrew_schofi...@live.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>> After almost 2 1/2 years in the making, I would like to call a vote for KIP-714 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability).
> >>>>>>
> >>>>>> This KIP aims to improve monitoring and troubleshooting of client performance by enabling clients to push metrics to brokers.
> >>>>>>
> >>>>>> I’d like to thank everyone who participated in the discussion, especially the librdkafka team, since one of the aims of the KIP is to enable any client to participate, not just the Apache Kafka project’s Java clients.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Andrew
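The push-node trade-off Philip and Andrew discuss in the thread (stick with the same node for consecutive pushes; if that node's request times out or it disconnects, reselect via the least-loaded node) can be sketched as below. This is a hypothetical illustration, not the Kafka client's actual implementation: `PushNodeSelector`, `in_flight`, `node_for_push`, and `on_push_failed` are invented names, and using the in-flight request count as the "load" signal is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PushNodeSelector:
    """Hypothetical policy for choosing the broker that receives metrics pushes.

    Sticky by default: consecutive pushes reuse the same node. Only after a
    push to that node fails (timeout or disconnect) is a new node chosen,
    namely the one with the fewest in-flight requests (assumed load signal).
    """

    # Assumed load signal: in-flight request count per connected node id.
    in_flight: Dict[int, int] = field(default_factory=dict)
    current: Optional[int] = None

    def least_loaded(self) -> Optional[int]:
        """Return the connected node with the lowest load, or None if none."""
        if not self.in_flight:
            return None
        # Ties are broken by the lower node id to keep the choice deterministic.
        return min(self.in_flight, key=lambda node: (self.in_flight[node], node))

    def node_for_push(self) -> Optional[int]:
        """Reuse the current node while it is connected; otherwise reselect."""
        if self.current is None or self.current not in self.in_flight:
            self.current = self.least_loaded()
        return self.current

    def on_push_failed(self, node: int) -> None:
        """Treat a timed-out push as a disconnect and drop the sticky choice."""
        self.in_flight.pop(node, None)
        if self.current == node:
            self.current = None


# Usage: node 2 starts out least loaded and stays selected even when it gets
# busy; only a failed push moves the client to the next least-loaded node.
selector = PushNodeSelector(in_flight={1: 5, 2: 0, 3: 2})
first = selector.node_for_push()   # node 2
selector.in_flight[2] = 9          # node 2 becomes busy
second = selector.node_for_push()  # still node 2 (sticky)
selector.on_push_failed(2)         # timeout/disconnect on node 2
third = selector.node_for_push()   # node 3 (load 2 beats node 1's load 5)
```

This sketch only reacts to failure, which mirrors the concern raised in the thread: a busy node keeps receiving pushes until its connection actually times out. A proactive variant would compare the current node's load against a threshold before each push, at the cost of giving up stickiness more often.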