Re: [VOTE] KIP-714: Client metrics and observability
I’d like to summarise the minor changes we made to KIP-714 as we completed the code in Kafka 3.7.0.

* Introduced “*” to differentiate “all metrics subscribed” from “no metrics subscribed”.
* Corrected the ACL operation for AlterConfigs to ALTER_CONFIGS on CLUSTER.
* Removed the “block” option from `kafka-client-metrics.sh` because the idea needs additional work before it is workable.
* Corrected CRC32 to CRC32C.
* Added missing exceptions in the admin client interfaces.
* Replaced some incorrect uses of “client metrics subscriptions” with “client metrics configuration resources”. The subscriptions are derived from the configuration resources, but they are not the same thing.

Thanks,
Andrew

> On 16 Oct 2023, at 09:18, Andrew Schofield wrote:
>
> The vote for KIP-714 has now concluded and the KIP is APPROVED.
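To illustrate the “*” change in the summary above, here is a minimal Python sketch of how a subscription list might select metrics. The function name `is_subscribed`, the prefix-matching rule, and the metric names are assumptions made for illustration; none of them is taken from the Kafka code.

```python
def is_subscribed(metric_name: str, subscribed: list[str]) -> bool:
    """Return True if metric_name is selected by the subscription list.

    Assumed semantics (hypothetical, for illustration only):
      - an empty list means no metrics are subscribed
      - a list containing "*" means all metrics are subscribed
      - any other non-empty entry is treated as a metric name prefix
    """
    if not subscribed:
        return False
    if "*" in subscribed:
        return True
    # Empty strings are ignored so that they cannot silently match everything.
    return any(p and metric_name.startswith(p) for p in subscribed)


# Illustrative metric name, not an authoritative one.
name = "org.apache.kafka.producer.record.send.total"
print(is_subscribed(name, []))                              # no subscription
print(is_subscribed(name, ["*"]))                           # everything
print(is_subscribed(name, ["org.apache.kafka.producer."]))  # prefix match
```

The point of the sketch is that an explicit “*” entry makes "everything" and "nothing" two distinct, unambiguous states.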
Re: [VOTE] KIP-714: Client metrics and observability
The vote for KIP-714 has now concluded and the KIP is APPROVED.

The votes are:
Binding:
+4 (Jason, Matthias, Sophie, Jun)
Non-binding:
+3 (Milind, Kirk, Philip)
-1 (Ryanne)

This KIP aims to improve monitoring and troubleshooting of client performance by enabling clients to push metrics to brokers. The lack of consistent telemetry across clients is an operational gap, and many cluster operators do not have control over the clients. Often, asking the client owner to change the configuration or even application code in order to troubleshoot problems is not workable. This is why the KIP enables the broker to request metrics from clients, giving a consistent, cross-platform mechanism.

The feature is enabled by configuring a metrics plugin on the brokers which implements the ClientTelemetry interface. In the absence of a plugin with this interface, the brokers do not even support the new RPCs in this KIP and the clients will not attempt or be able to push metrics. So, a vanilla Apache Kafka broker will not collect metrics.

I would like to make available an open-source implementation of the ClientTelemetry interface that works with an open-source monitoring solution.

The KIP does put support for OTLP serialisation into the client, so there are new dependencies in the Java client, which are bundled and relocated (shaded). OTLP also opens up future use cases involving OpenTelemetry, which is emerging as the de facto standard for telemetry and observability in general.

Thanks to everyone who has contributed to KIP-714 since Magnus Edenhill kicked it all off in February 2021.

Andrew

> On 14 Oct 2023, at 01:52, Jun Rao wrote:
>
> Hi, Andrew,
>
> Thanks for the KIP. +1 from me too.
>
> Jun
Re: [VOTE] KIP-714: Client metrics and observability
Hi, Andrew,

Thanks for the KIP. +1 from me too.

Jun

On Wed, Oct 11, 2023 at 4:00 PM Sophie Blee-Goldman wrote:

> This looks great! +1 (binding)
>
> Sophie
Re: [VOTE] KIP-714: Client metrics and observability
This looks great! +1 (binding)

Sophie

On Wed, Oct 11, 2023 at 1:46 PM Matthias J. Sax wrote:

> +1 (binding)
Re: [VOTE] KIP-714: Client metrics and observability
+1 (binding)

On 9/13/23 5:48 PM, Jason Gustafson wrote:

> Hey Andrew,
>
> +1 on the KIP.
Re: [VOTE] KIP-714: Client metrics and observability
Hey Andrew,

+1 on the KIP. For many users of Kafka, it may not be fully understood how much of a challenge client monitoring is. With tens of clients in a cluster, it is already difficult to coordinate metrics collection. When there are thousands of clients, and when the cluster operator has no control over them, it is essentially impossible. For the fat clients that we have, the lack of useful telemetry is a huge operational gap. Consistency between clients has also been a major challenge. I think the effort toward standardization in this KIP will have some positive impact even in deployments which have effective client-side monitoring. Overall, I think this proposal will provide a lot of value across the board.

Best,
Jason

On Wed, Sep 13, 2023 at 9:50 AM Philip Nee wrote:

> Hey Andrew -
>
> Thank you for taking the time to reply to my questions. I'm just adding
> some notes to this discussion.
Re: [VOTE] KIP-714: Client metrics and observability
Hey Andrew - Thank you for taking the time to reply to my questions. I'm just adding some notes to this discussion. 1. epoch: It can be helpful to know the delta of the client side and the actual leader epoch. It is helpful to understand why sometimes commit fails/client not making progress. 2. Client connection: If the client selects the "wrong" connection to push out the data, I assume the request would timeout; which should lead to disconnecting from the node and reselecting another node as you mentioned, via the least loaded node. Cheers, P On Tue, Sep 12, 2023 at 10:40 AM Andrew Schofield < andrew_schofield_j...@outlook.com> wrote: > Hi Philip, > Thanks for your vote and interest in the KIP. > > KIP-714 does not introduce any new client metrics, and that’s intentional. > It does > tell how that all of the client metrics can have their names transformed > into > equivalent "telemetry metric names”, and then potentially used in metrics > subscriptions. > > I am interested in the idea of client’s leader epoch in this context, but > I don’t have > an immediate plan for how best to do this, and it would take another KIP > to enhance > existing metrics or introduce some new ones. Those would then naturally be > applicable to the metrics push introduced in KIP-714. > > In a similar vein, there are no existing client metrics specifically for > auto-commit. > We could add them to Kafka, but I really think this is just an example of > asynchronous > commit in which the application has decided not to specify when the commit > should > begin. > > It is possible to increase the cadence of pushing by modifying the > interval.ms > configuration property of the CLIENT_METRICS resource. > > There is an “assigned-partitions” metric for each consumer, but not one for > active partitions. We could add one, again as a follow-on KIP. > > I take your point about holding on to a connection in a channel which might > experience congestion. 
Do you have a suggestion for how to improve on this? > For example, the client does have the concept of a least-loaded node. Maybe > this is something we should investigate in the implementation and decide > on the > best approach. In general, I think sticking with the same node for > consecutive > pushes is best, but if you choose the “wrong” node to start with, it’s not > ideal. > > Thanks, > Andrew > > > On 8 Sep 2023, at 19:29, Philip Nee wrote: > > > > Hey Andrew - > > > > +1 but I don't have a binding vote! > > > > It took me a while to go through the KIP. Here are some of my notes > during > > the reading: > > > > *Metrics* > > - Should we care about the client's leader epoch? There is a case where > the > > user recreates the topic, but the consumer thinks it is still the same > > topic and therefore, attempts to start from an offset that doesn't exist. > > KIP-848 addresses this issue, but I can still see some potential benefits > > from knowing the client's epoch information. > > - I assume poll idle is similar to poll interval: I needed to read the > > description a few times. > > - I don't have a clear use case in mind for the commit latency, but I do > > think sometimes people lack clarity about how much progress was tracked > by > > the auto-commit. Would tracking auto-commit-related metrics be useful? I > > was thinking: the last offset committed or the actual cadence in ms. > > - Are there cases when we need to increase the cadence of telemetry data > > push? i.e. variable interval. > > - Thanks for implementing the randomized initial metric push; I think it > is > > really important. > > - Is there a potential use case for tracking the number of active > > partitions? The consumer can pause partitions via API, during revocation, > > or during offset reset for the stream. > > > > *Connections*: > > - The KIP stated that it will keep the same connection until the > connection > > is disconnected. 
I wonder if that could potentially cause congestion if > it > > is already a busy channel, which leads to connection timeout and > > subsequently disconnection. > > > > Thanks, > > P > > > > On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield < > > andrew_schofield_j...@outlook.com> wrote: > > > >> Bumping the voting thread for KIP-714. > >> > >> So far, we have: > >> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne) > >> > >> Thanks, > >> Andrew > >> > >>> On 4 Aug 2023, at 09:45, Andrew Schofield > >> wrote: > >>> > >>> Hi, > >>> After almost 2 1/2 years in the making, I would like to call a vote for > >> KIP-714 ( > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability > >> ). > >>> > >>> This KIP aims to improve monitoring and troubleshooting of client > >> performance by enabling clients to push metrics to brokers. > >>> > >>> I’d like to thank everyone that participated in the discussion, > >> especially the librdkafka team since one of the aims of the KIP is to > >> enable any client to participate, not just the Apache
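As an illustration of the connection behaviour discussed in this message (stick with one node for consecutive pushes, and reselect the least-loaded node only after a timeout forces a disconnect), here is a minimal sketch. It is not the Kafka client's implementation: the `Node` and `TelemetryPusher` names, and the use of an in-flight request count as the measure of load, are assumptions made purely for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of a broker connection (not a Kafka client class)."""
    node_id: int
    in_flight: int          # assumed load measure: outstanding requests
    connected: bool = True


class TelemetryPusher:
    """Sticks to one node for pushes; reselects only after a disconnect."""

    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = nodes
        self.current: Optional[Node] = None

    def _least_loaded(self) -> Optional[Node]:
        # Consider only nodes that are still connected.
        candidates = [n for n in self.nodes if n.connected]
        if not candidates:
            return None
        return min(candidates, key=lambda n: n.in_flight)

    def node_for_push(self) -> Optional[Node]:
        # Keep the current node while it stays connected, even if it is
        # no longer the least loaded one.
        if self.current is None or not self.current.connected:
            self.current = self._least_loaded()
        return self.current

    def on_request_timeout(self) -> None:
        # A timed-out push disconnects the node, so the next push reselects.
        if self.current is not None:
            self.current.connected = False
            self.current = None
```

In this sketch a "wrong" initial choice is only corrected when a push times out, which is the trade-off raised in the discussion above.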
Re: [VOTE] KIP-714: Client metrics and observability
Hi Philip, Thanks for your vote and interest in the KIP. KIP-714 does not introduce any new client metrics, and that’s intentional. It does describe how all of the client metrics can have their names transformed into equivalent “telemetry metric names”, which can then be used in metrics subscriptions. I am interested in the idea of the client’s leader epoch in this context, but I don’t have an immediate plan for how best to do this, and it would take another KIP to enhance existing metrics or introduce some new ones. Those would then naturally be applicable to the metrics push introduced in KIP-714. In a similar vein, there are no existing client metrics specifically for auto-commit. We could add them to Kafka, but I really think this is just an example of asynchronous commit in which the application has decided not to specify when the commit should begin. It is possible to increase the cadence of pushing by modifying the interval.ms configuration property of the CLIENT_METRICS resource. There is an “assigned-partitions” metric for each consumer, but not one for active partitions. We could add one, again as a follow-on KIP. I take your point about holding on to a connection in a channel which might experience congestion. Do you have a suggestion for how to improve on this? For example, the client does have the concept of a least-loaded node. Maybe this is something we should investigate in the implementation and decide on the best approach. In general, I think sticking with the same node for consecutive pushes is best, but if you choose the “wrong” node to start with, it’s not ideal. Thanks, Andrew > On 8 Sep 2023, at 19:29, Philip Nee wrote: > > Hey Andrew - > > +1 but I don't have a binding vote! > > It took me a while to go through the KIP. Here are some of my notes during > the reading: > > *Metrics* > - Should we care about the client's leader epoch?
There is a case where the > user recreates the topic, but the consumer thinks it is still the same > topic and therefore, attempts to start from an offset that doesn't exist. > KIP-848 addresses this issue, but I can still see some potential benefits > from knowing the client's epoch information. > - I assume poll idle is similar to poll interval: I needed to read the > description a few times. > - I don't have a clear use case in mind for the commit latency, but I do > think sometimes people lack clarity about how much progress was tracked by > the auto-commit. Would tracking auto-commit-related metrics be useful? I > was thinking: the last offset committed or the actual cadence in ms. > - Are there cases when we need to increase the cadence of telemetry data > push? i.e. variable interval. > - Thanks for implementing the randomized initial metric push; I think it is > really important. > - Is there a potential use case for tracking the number of active > partitions? The consumer can pause partitions via API, during revocation, > or during offset reset for the stream. > > *Connections*: > - The KIP stated that it will keep the same connection until the connection > is disconnected. I wonder if that could potentially cause congestion if it > is already a busy channel, which leads to connection timeout and > subsequently disconnection. > > Thanks, > P > > On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield < > andrew_schofield_j...@outlook.com> wrote: > >> Bumping the voting thread for KIP-714. >> >> So far, we have: >> Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne) >> >> Thanks, >> Andrew >> >>> On 4 Aug 2023, at 09:45, Andrew Schofield >> wrote: >>> >>> Hi, >>> After almost 2 1/2 years in the making, I would like to call a vote for >> KIP-714 ( >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability >> ). 
>>> >>> This KIP aims to improve monitoring and troubleshooting of client >> performance by enabling clients to push metrics to brokers. >>> >>> I’d like to thank everyone that participated in the discussion, >> especially the librdkafka team since one of the aims of the KIP is to >> enable any client to participate, not just the Apache Kafka project’s Java >> clients. >>> >>> Thanks, >>> Andrew
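To make the idea of transforming existing client metric names into telemetry metric names more concrete, here is a small sketch. The exact rule is an assumption on my part rather than something stated in this thread: it prefixes `org.apache.kafka`, drops a trailing `-metrics` from the metric group, and replaces hyphens with dots. The helper names `telemetry_metric_name` and `is_subscribed` are hypothetical, and the prefix matching (with "*" meaning all metrics) is likewise only an illustration of how a subscription might select metrics.

```python
from typing import Iterable

PREFIX = "org.apache.kafka"  # assumed common prefix for telemetry names


def telemetry_metric_name(group: str, name: str) -> str:
    """Derive a telemetry metric name from a metric group and name.

    Assumed rule: strip a trailing "-metrics" from the group, then
    replace hyphens with dots in both the group and the name.
    """
    suffix = "-metrics"
    if group.endswith(suffix):
        group = group[: -len(suffix)]
    return ".".join([PREFIX, group.replace("-", "."), name.replace("-", ".")])


def is_subscribed(telemetry_name: str, prefixes: Iterable[str]) -> bool:
    """Return True if the name matches any subscribed prefix.

    "*" is treated as "all metrics"; an empty list means nothing matches.
    """
    prefixes = list(prefixes)
    if "*" in prefixes:
        return True
    return any(telemetry_name.startswith(p) for p in prefixes)


name = telemetry_metric_name("producer-metrics", "connection-creation-rate")
print(name)
print(is_subscribed(name, ["org.apache.kafka.producer."]))
```

Under these assumptions, a subscription that lists the prefix `org.apache.kafka.producer.` would pick up every producer metric, while an empty list would pick up none.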
Re: [VOTE] KIP-714: Client metrics and observability
Hey Andrew - +1 but I don't have a binding vote! It took me a while to go through the KIP. Here are some of my notes during the reading: *Metrics* - Should we care about the client's leader epoch? There is a case where the user recreates the topic, but the consumer thinks it is still the same topic and therefore, attempts to start from an offset that doesn't exist. KIP-848 addresses this issue, but I can still see some potential benefits from knowing the client's epoch information. - I assume poll idle is similar to poll interval: I needed to read the description a few times. - I don't have a clear use case in mind for the commit latency, but I do think sometimes people lack clarity about how much progress was tracked by the auto-commit. Would tracking auto-commit-related metrics be useful? I was thinking: the last offset committed or the actual cadence in ms. - Are there cases when we need to increase the cadence of telemetry data push? i.e. variable interval. - Thanks for implementing the randomized initial metric push; I think it is really important. - Is there a potential use case for tracking the number of active partitions? The consumer can pause partitions via API, during revocation, or during offset reset for the stream. *Connections*: - The KIP stated that it will keep the same connection until the connection is disconnected. I wonder if that could potentially cause congestion if it is already a busy channel, which leads to connection timeout and subsequently disconnection. Thanks, P On Fri, Sep 8, 2023 at 4:15 AM Andrew Schofield < andrew_schofield_j...@outlook.com> wrote: > Bumping the voting thread for KIP-714. 
> > So far, we have: > Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne) > > Thanks, > Andrew > > > On 4 Aug 2023, at 09:45, Andrew Schofield > wrote: > > > > Hi, > > After almost 2 1/2 years in the making, I would like to call a vote for > KIP-714 ( > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability > ). > > > > This KIP aims to improve monitoring and troubleshooting of client > performance by enabling clients to push metrics to brokers. > > > > I’d like to thank everyone that participated in the discussion, > especially the librdkafka team since one of the aims of the KIP is to > enable any client to participate, not just the Apache Kafka project’s Java > clients. > > > > Thanks, > > Andrew > > >
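The randomized initial metric push mentioned in this message can be sketched as follows. The point of the jitter is to stop many clients that start together from all pushing at the same instant. The 0.5x to 1.5x range and the function name `first_push_delay_ms` are assumptions chosen for illustration; the thread itself only says that the initial push is randomized.

```python
import random
from typing import Optional


def first_push_delay_ms(push_interval_ms: int,
                        rng: Optional[random.Random] = None) -> int:
    """Pick a jittered delay before the first telemetry push.

    Assumption: the delay is drawn uniformly between half and one and a
    half times the configured push interval. Later pushes would use the
    plain interval.
    """
    rng = rng or random.Random()
    return int(push_interval_ms * rng.uniform(0.5, 1.5))


# Simulate a fleet of clients that all start at the same moment.
rng = random.Random(42)
delays = [first_push_delay_ms(5000, rng) for _ in range(1000)]
print(min(delays), max(delays))
```

With a 5000 ms interval, every simulated delay falls between 2500 ms and 7500 ms, so the first pushes are spread across a window rather than arriving together.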
Re: [VOTE] KIP-714: Client metrics and observability
Bumping the voting thread for KIP-714. So far, we have: Non-binding +2 (Milind and Kirk), non-binding -1 (Ryanne) Thanks, Andrew > On 4 Aug 2023, at 09:45, Andrew Schofield wrote: > > Hi, > After almost 2 1/2 years in the making, I would like to call a vote for > KIP-714 > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability). > > This KIP aims to improve monitoring and troubleshooting of client performance > by enabling clients to push metrics to brokers. > > I’d like to thank everyone that participated in the discussion, especially > the librdkafka team since one of the aims of the KIP is to enable any client > to participate, not just the Apache Kafka project’s Java clients. > > Thanks, > Andrew
Re: [VOTE] KIP-714: Client metrics and observability
-1, non-binding, for reasons previously stated. Ryanne On Fri, Aug 4, 2023, 3:46 AM Andrew Schofield < andrew_schofield_j...@outlook.com> wrote: > Hi, > After almost 2 1/2 years in the making, I would like to call a vote for > KIP-714 ( > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability > ). > > This KIP aims to improve monitoring and troubleshooting of client > performance by enabling clients to push metrics to brokers. > > I’d like to thank everyone that participated in the discussion, especially > the librdkafka team since one of the aims of the KIP is to enable any > client to participate, not just the Apache Kafka project’s Java clients. > > Thanks, > Andrew
Re: [VOTE] KIP-714: Client metrics and observability
Hi Andrew, +1 (non-binding) This is a huge step in enabling end-to-end observability for users and will hopefully even help us get a better idea of where we can improve the client behavior. And +100 re: librdkafka team involvement. Thanks! > On Aug 8, 2023, at 4:00 AM, Milind Luthra > wrote: > > Hi Andrew, thanks for working on the KIP. > > +1 (non binding) > > Thanks, > Milind > > On Fri, Aug 4, 2023 at 2:16 PM Andrew Schofield < > andrew_schofield_j...@outlook.com> wrote: > >> Hi, >> After almost 2 1/2 years in the making, I would like to call a vote for >> KIP-714 ( >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability >> ). >> >> This KIP aims to improve monitoring and troubleshooting of client >> performance by enabling clients to push metrics to brokers. >> >> I’d like to thank everyone that participated in the discussion, especially >> the librdkafka team since one of the aims of the KIP is to enable any >> client to participate, not just the Apache Kafka project’s Java clients. >> >> Thanks, >> Andrew
Re: [VOTE] KIP-714: Client metrics and observability
Hi Andrew, thanks for working on the KIP. +1 (non binding) Thanks, Milind On Fri, Aug 4, 2023 at 2:16 PM Andrew Schofield < andrew_schofield_j...@outlook.com> wrote: > Hi, > After almost 2 1/2 years in the making, I would like to call a vote for > KIP-714 ( > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability > ). > > This KIP aims to improve monitoring and troubleshooting of client > performance by enabling clients to push metrics to brokers. > > I’d like to thank everyone that participated in the discussion, especially > the librdkafka team since one of the aims of the KIP is to enable any > client to participate, not just the Apache Kafka project’s Java clients. > > Thanks, > Andrew
[VOTE] KIP-714: Client metrics and observability
Hi, After almost 2 1/2 years in the making, I would like to call a vote for KIP-714 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability). This KIP aims to improve monitoring and troubleshooting of client performance by enabling clients to push metrics to brokers. I’d like to thank everyone that participated in the discussion, especially the librdkafka team since one of the aims of the KIP is to enable any client to participate, not just the Apache Kafka project’s Java clients. Thanks, Andrew
Re: [VOTE] KIP-714: Client Metrics and Observability
+1 Thanks Magnus! On Tue, May 17, 2022 at 5:43 AM Magnus Edenhill wrote: > Hey all, > > It's that time of year again where we re-restart this vote thread after > some additional > discussions on the disco thread and minor adjustments&clarifications to the > KIP. > > We're currently at +5 (non-binding) and -1 (non-binding) votes. > > Please cast your votes, people. > > > Thanks, > Magnus > > > Den tors 3 mars 2022 kl 15:39 skrev Julien Chanaud < > chanaud.jul...@gmail.com > >: > > > +1 > > As a member of a team which operates several Kafka clusters, I am > > unequipped when it comes to troubleshooting issues with project teams > > that did not understand the importance of configuring client-side > > monitoring. > > Kafka represents a fraction of their work and they don't have enough > > experience, time or interest in trying to understand the meaning behind > > every metric. > > > > I stand 100% behind what Colin stated back in June in the Discuss thread > : > > > > > Magnus and I explained a few times the reasons why it does matter. > Within > > > most organizations, there are usually several teams using clients, > which > > > are separate from the team which maintains the Kafka cluster. The Kafka > > > team has the Kafka experts, which makes it the best place to centralize > > > collecting and analyzing Kafka metrics. > > > > > > Thanks for this KIP. > > > > Le mer. 26 janv. 2022 à 16:01, rifer...@riferrei.com < > > rifer...@riferrei.com> > > a écrit : > > > > > +1 > > > > > > I think this KIP solves a problem that has been around for some time > with > > > Kafka deployments, which is the ability to assess the current state of > a > > > Kafka architecture but looking at the whole picture. I also share other > > > folks' concerns regarding adding runtime dependencies to the clients; > > this > > > may be problematic for large deployments. Still, I think it is worth > > > refactoring. > > > > > > IMHO, it is a fair trade-off. 
> > > > > > — Ricardo > > > > > > > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill > > wrote: > > > > > > > > Hi all, > > > > > > > > it's been a while and there's been some more discussions of the KIP > > which > > > > have been > > > > addressed on the KIP page. > > > > > > > > I think it's a good time to revive this vote thread and get things > > > moving. > > > > > > > > We're currently at +3 (non-binding) and -1 (non-binding) votes. > > > > > > > > Regards, > > > > Magnus > > > > > > > > > > > > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers : > > > > > > > >> +1 > > > >> > > > >> Thank you for the KIP! > > > >> > > > >> Our organization runs kafka at large scale in a multi-tenant > > > configuration. > > > >> We actually have many other enterprises connecting up to our system > to > > > >> retrieve stream data. These feeds vary greatly in volume and > velocity. > > > The > > > >> peak rates are a multiplicative factor of the nominal. There is > > extreme > > > >> skew in our datasets in a number of ways. > > > >> > > > >> We don't have time to work with every new internal/external client > to > > > tune > > > >> their feeds. They need to be able to take one of the many kafka > > clients > > > and > > > >> go off to the races. > > > >> > > > >> Being able to retrieve client metrics would be invaluable here as > it's > > > hard > > > >> and time consuming to communicate out of the enterprise walls. > > > >> > > > >> This KIP is important to us to expand the use of our datasets > > internally > > > >> and outside the borders of the enterprise. Our clients like the > > > performance > > > >> and data safeties related to the kafka connection. The observability > > has > > > >> been a problem... 
> > > >> > > > >> Jonathan Rivers > > > >> jrivers...@gmail.com > > > >> > > > >> > > > >> > > > >> > > > >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan < > ryannedo...@gmail.com> > > > >> wrote: > > > >> > > > >>> -1 > > > >>> > > > >>> Ryanne > > > >>> > > > >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill > > > >> wrote: > > > >>> > > > Hi all, > > > > > > I'd like to start a vote on KIP-714. > > > https://cwiki.apache.org/confluence/x/2xRRCg > > > > > > Discussion thread: > > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > > > > > Thanks, > > > Magnus > > > > > > >>> > > > >> > > > > > > > > >
Re: [VOTE] KIP-714: Client Metrics and Observability
Hey all, It's that time of year again where we re-restart this vote thread after some additional discussions on the disco thread and minor adjustments&clarifications to the KIP. We're currently at +5 (non-binding) and -1 (non-binding) votes. Please cast your votes, people. Thanks, Magnus Den tors 3 mars 2022 kl 15:39 skrev Julien Chanaud : > +1 > As a member of a team which operates several Kafka clusters, I am > unequipped when it comes to troubleshooting issues with project teams > that did not understand the importance of configuring client-side > monitoring. > Kafka represents a fraction of their work and they don't have enough > experience, time or interest in trying to understand the meaning behind > every metric. > > I stand 100% behind what Colin stated back in June in the Discuss thread : > > > Magnus and I explained a few times the reasons why it does matter. Within > > most organizations, there are usually several teams using clients, which > > are separate from the team which maintains the Kafka cluster. The Kafka > > team has the Kafka experts, which makes it the best place to centralize > > collecting and analyzing Kafka metrics. > > > Thanks for this KIP. > > Le mer. 26 janv. 2022 à 16:01, rifer...@riferrei.com < > rifer...@riferrei.com> > a écrit : > > > +1 > > > > I think this KIP solves a problem that has been around for some time with > > Kafka deployments, which is the ability to assess the current state of a > > Kafka architecture but looking at the whole picture. I also share other > > folks' concerns regarding adding runtime dependencies to the clients; > this > > may be problematic for large deployments. Still, I think it is worth > > refactoring. > > > > IMHO, it is a fair trade-off. > > > > — Ricardo > > > > > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill > wrote: > > > > > > Hi all, > > > > > > it's been a while and there's been some more discussions of the KIP > which > > > have been > > > addressed on the KIP page. 
> > > > > > I think it's a good time to revive this vote thread and get things > > moving. > > > > > > We're currently at +3 (non-binding) and -1 (non-binding) votes. > > > > > > Regards, > > > Magnus > > > > > > > > > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers : > > > > > >> +1 > > >> > > >> Thank you for the KIP! > > >> > > >> Our organization runs kafka at large scale in a multi-tenant > > configuration. > > >> We actually have many other enterprises connecting up to our system to > > >> retrieve stream data. These feeds vary greatly in volume and velocity. > > The > > >> peak rates are a multiplicative factor of the nominal. There is > extreme > > >> skew in our datasets in a number of ways. > > >> > > >> We don't have time to work with every new internal/external client to > > tune > > >> their feeds. They need to be able to take one of the many kafka > clients > > and > > >> go off to the races. > > >> > > >> Being able to retrieve client metrics would be invaluable here as it's > > hard > > >> and time consuming to communicate out of the enterprise walls. > > >> > > >> This KIP is important to us to expand the use of our datasets > internally > > >> and outside the borders of the enterprise. Our clients like the > > performance > > >> and data safeties related to the kafka connection. The observability > has > > >> been a problem... > > >> > > >> Jonathan Rivers > > >> jrivers...@gmail.com > > >> > > >> > > >> > > >> > > >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan > > >> wrote: > > >> > > >>> -1 > > >>> > > >>> Ryanne > > >>> > > >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill > > >> wrote: > > >>> > > Hi all, > > > > I'd like to start a vote on KIP-714. > > https://cwiki.apache.org/confluence/x/2xRRCg > > > > Discussion thread: > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > > > Thanks, > > Magnus > > > > >>> > > >> > > > > >
Re: [VOTE] KIP-714: Client Metrics and Observability
+1 As a member of a team which operates several Kafka clusters, I am unequipped when it comes to troubleshooting issues with project teams that did not understand the importance of configuring client-side monitoring. Kafka represents a fraction of their work and they don't have enough experience, time or interest in trying to understand the meaning behind every metric. I stand 100% behind what Colin stated back in June in the Discuss thread : > Magnus and I explained a few times the reasons why it does matter. Within > most organizations, there are usually several teams using clients, which > are separate from the team which maintains the Kafka cluster. The Kafka > team has the Kafka experts, which makes it the best place to centralize > collecting and analyzing Kafka metrics. Thanks for this KIP. Le mer. 26 janv. 2022 à 16:01, rifer...@riferrei.com a écrit : > +1 > > I think this KIP solves a problem that has been around for some time with > Kafka deployments, which is the ability to assess the current state of a > Kafka architecture but looking at the whole picture. I also share other > folks' concerns regarding adding runtime dependencies to the clients; this > may be problematic for large deployments. Still, I think it is worth > refactoring. > > IMHO, it is a fair trade-off. > > — Ricardo > > > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill wrote: > > > > Hi all, > > > > it's been a while and there's been some more discussions of the KIP which > > have been > > addressed on the KIP page. > > > > I think it's a good time to revive this vote thread and get things > moving. > > > > We're currently at +3 (non-binding) and -1 (non-binding) votes. > > > > Regards, > > Magnus > > > > > > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers : > > > >> +1 > >> > >> Thank you for the KIP! > >> > >> Our organization runs kafka at large scale in a multi-tenant > configuration. > >> We actually have many other enterprises connecting up to our system to > >> retrieve stream data. 
These feeds vary greatly in volume and velocity. > The > >> peak rates are a multiplicative factor of the nominal. There is extreme > >> skew in our datasets in a number of ways. > >> > >> We don't have time to work with every new internal/external client to > tune > >> their feeds. They need to be able to take one of the many kafka clients > and > >> go off to the races. > >> > >> Being able to retrieve client metrics would be invaluable here as it's > hard > >> and time consuming to communicate out of the enterprise walls. > >> > >> This KIP is important to us to expand the use of our datasets internally > >> and outside the borders of the enterprise. Our clients like the > performance > >> and data safeties related to the kafka connection. The observability has > >> been a problem... > >> > >> Jonathan Rivers > >> jrivers...@gmail.com > >> > >> > >> > >> > >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan > >> wrote: > >> > >>> -1 > >>> > >>> Ryanne > >>> > >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill > >> wrote: > >>> > Hi all, > > I'd like to start a vote on KIP-714. > https://cwiki.apache.org/confluence/x/2xRRCg > > Discussion thread: > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > Thanks, > Magnus > > >>> > >> > >
Re: [VOTE] KIP-714: Client Metrics and Observability
+1 I think this KIP solves a problem that has been around for some time with Kafka deployments, namely the lack of an ability to assess the current state of a Kafka architecture by looking at the whole picture. I also share other folks' concerns regarding adding runtime dependencies to the clients; this may be problematic for large deployments. Still, I think it is worth refactoring. IMHO, it is a fair trade-off. — Ricardo > On Jan 26, 2022, at 9:34 AM, Magnus Edenhill wrote: > > Hi all, > > it's been a while and there's been some more discussions of the KIP which > have been > addressed on the KIP page. > > I think it's a good time to revive this vote thread and get things moving. > > We're currently at +3 (non-binding) and -1 (non-binding) votes. > > Regards, > Magnus > > > Den mån 1 nov. 2021 kl 21:19 skrev J Rivers : > >> +1 >> >> Thank you for the KIP! >> >> Our organization runs kafka at large scale in a multi-tenant configuration. >> We actually have many other enterprises connecting up to our system to >> retrieve stream data. These feeds vary greatly in volume and velocity. The >> peak rates are a multiplicative factor of the nominal. There is extreme >> skew in our datasets in a number of ways. >> >> We don't have time to work with every new internal/external client to tune >> their feeds. They need to be able to take one of the many kafka clients and >> go off to the races. >> >> Being able to retrieve client metrics would be invaluable here as it's hard >> and time consuming to communicate out of the enterprise walls. >> >> This KIP is important to us to expand the use of our datasets internally >> and outside the borders of the enterprise. Our clients like the performance >> and data safeties related to the kafka connection. The observability has >> been a problem...
>> >> Jonathan Rivers >> jrivers...@gmail.com >> >> >> >> >> On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan >> wrote: >> >>> -1 >>> >>> Ryanne >>> >>> On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill >> wrote: >>> Hi all, I'd like to start a vote on KIP-714. https://cwiki.apache.org/confluence/x/2xRRCg Discussion thread: https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html Thanks, Magnus >>> >>
Re: [VOTE] KIP-714: Client Metrics and Observability
Hi all, it's been a while and there's been some more discussions of the KIP which have been addressed on the KIP page. I think it's a good time to revive this vote thread and get things moving. We're currently at +3 (non-binding) and -1 (non-binding) votes. Regards, Magnus Den mån 1 nov. 2021 kl 21:19 skrev J Rivers : > +1 > > Thank you for the KIP! > > Our organization runs kafka at large scale in a multi-tenant configuration. > We actually have many other enterprises connecting up to our system to > retrieve stream data. These feeds vary greatly in volume and velocity. The > peak rates are a multiplicative factor of the nominal. There is extreme > skew in our datasets in a number of ways. > > We don't have time to work with every new internal/external client to tune > their feeds. They need to be able to take one of the many kafka clients and > go off to the races. > > Being able to retrieve client metrics would be invaluable here as it's hard > and time consuming to communicate out of the enterprise walls. > > This KIP is important to us to expand the use of our datasets internally > and outside the borders of the enterprise. Our clients like the performance > and data safeties related to the kafka connection. The observability has > been a problem... > > Jonathan Rivers > jrivers...@gmail.com > > > > > On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan > wrote: > > > -1 > > > > Ryanne > > > > On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill > wrote: > > > > > Hi all, > > > > > > I'd like to start a vote on KIP-714. > > > https://cwiki.apache.org/confluence/x/2xRRCg > > > > > > Discussion thread: > > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > > > > > Thanks, > > > Magnus > > > > > >
RE: Re: [VOTE] KIP-714: Client Metrics and Observability
+1 We also have a lot of clients using our central Kafka cluster, and it would be great to have client metrics so we can provide end-to-end monitoring. Igor Buzatović Porsche Digital On 2021/11/01 20:19:20 J Rivers wrote: > +1 > > Thank you for the KIP! > > Our organization runs kafka at large scale in a multi-tenant configuration. > We actually have many other enterprises connecting up to our system to > retrieve stream data. These feeds vary greatly in volume and velocity. The > peak rates are a multiplicative factor of the nominal. There is extreme > skew in our datasets in a number of ways. > > We don't have time to work with every new internal/external client to tune > their feeds. They need to be able to take one of the many kafka clients and > go off to the races. > > Being able to retrieve client metrics would be invaluable here as it's hard > and time consuming to communicate out of the enterprise walls. > > This KIP is important to us to expand the use of our datasets internally > and outside the borders of the enterprise. Our clients like the performance > and data safeties related to the kafka connection. The observability has > been a problem... > > Jonathan Rivers > jrivers...@gmail.com > > > > > On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan wrote: > > > -1 > > > > Ryanne > > > > On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill wrote: > > > > > Hi all, > > > > > > I'd like to start a vote on KIP-714. > > > https://cwiki.apache.org/confluence/x/2xRRCg > > > > > > Discussion thread: > > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > > > > > Thanks, > > > Magnus > > > > > >
Re: [VOTE] KIP-714: Client Metrics and Observability
+1 Thank you for the KIP! Our organization runs kafka at large scale in a multi-tenant configuration. We actually have many other enterprises connecting up to our system to retrieve stream data. These feeds vary greatly in volume and velocity. The peak rates are a multiplicative factor of the nominal. There is extreme skew in our datasets in a number of ways. We don't have time to work with every new internal/external client to tune their feeds. They need to be able to take one of the many kafka clients and go off to the races. Being able to retrieve client metrics would be invaluable here as it's hard and time consuming to communicate out of the enterprise walls. This KIP is important to us to expand the use of our datasets internally and outside the borders of the enterprise. Our clients like the performance and data safeties related to the kafka connection. The observability has been a problem... Jonathan Rivers jrivers...@gmail.com On Mon, Oct 18, 2021 at 11:56 PM Ryanne Dolan wrote: > -1 > > Ryanne > > On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill wrote: > > > Hi all, > > > > I'd like to start a vote on KIP-714. > > https://cwiki.apache.org/confluence/x/2xRRCg > > > > Discussion thread: > > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > > > Thanks, > > Magnus > > >
Re: [VOTE] KIP-714: Client Metrics and Observability
-1 Ryanne On Mon, Oct 18, 2021, 4:30 AM Magnus Edenhill wrote: > Hi all, > > I'd like to start a vote on KIP-714. > https://cwiki.apache.org/confluence/x/2xRRCg > > Discussion thread: > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > Thanks, > Magnus >
Re: [VOTE] KIP-714: Client Metrics and Observability
Hi Magnus, Thanks for the KIP. +1 (non-binding) Cheers, Anna On Mon, Oct 18, 2021, 5:30 AM Magnus Edenhill wrote: > Hi all, > > I'd like to start a vote on KIP-714. > https://cwiki.apache.org/confluence/x/2xRRCg > > Discussion thread: > https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html > > Thanks, > Magnus >
[VOTE] KIP-714: Client Metrics and Observability
Hi all, I'd like to start a vote on KIP-714. https://cwiki.apache.org/confluence/x/2xRRCg Discussion thread: https://www.mail-archive.com/dev@kafka.apache.org/msg119000.html Thanks, Magnus