Re: [DISCUSS] KIP-714: Client metrics and observability

Colin McCabe Fri, 18 Jun 2021 10:37:28 -0700

On Thu, Jun 17, 2021, at 12:13, Ryanne Dolan wrote:
> Colin,
> 
> > lack of support for collecting client metrics
> 
> ...but kafka is not a metrics collector. There are lots of things kafka
> doesn't support. Should it also collect clients' logs for the same reasons?
> What other side channels should it proxy through brokers?
>


Hi Ryanne,

Kafka already is a metrics collector. 

Take a look at KIP-511: "Collect and Expose Client's Name and Version in the 
Brokers," which aggregates metrics from various clients and re-exposes them as 
broker metrics. Or KIP-607: "Add Metrics to Kafka Streams to Report Properties 
of RocksDB," which aggregates metrics from the local RocksDB instances and 
re-exposes them. Or KIP-608: "Expose Kafka Metrics in Authorizer." Or lots of 
other KIPs.

This is the direction we've been moving in for a while. It's motivated by our 
experiences in the field with users, who find it cumbersome to set up dedicated 
infra to monitor individual Kafka clients. Magnus, especially, has a huge 
amount of experience here.

>
> > He mentioned the fact that configuring client metrics usually involves
> > setting up a separate metrics collection infrastructure.
> 
> This is not changed with the KIP. It's just a matter of who owns that
> infra, which I don't think should matter to Apache Kafka.
> 

Magnus and I have explained a few times why it does matter. Within most 
organizations, several teams use Kafka clients, and those teams are separate 
from the team that maintains the Kafka cluster. The Kafka team has the Kafka 
experts, which makes it the best place to centralize collecting and analyzing 
Kafka metrics.

In a sense the whole concept of cloud computing is "just a matter of who owns 
infra." It is quite important to users.

> We already have MetricsReporter. I still don't see specific motivation
> beyond the "opt-out" part?
> 
> I think we need exceptional motivation for such a proposal.
> 

As I said earlier, if you are happy with the current metrics setup, then you 
can continue using it -- nothing in this KIP means you have to change what 
you're doing.

best,
Colin


> On Thu, Jun 17, 2021, 1:43 PM Colin McCabe <cmcc...@apache.org> wrote:
> 
> > Hi Ryanne,
> >
> > These are not "arguments for observability in general" but descriptions of
> > specific issues that come up due to Kafka's lack of support for collecting
> > client metrics. He mentioned the fact that configuring client metrics
> > usually involves setting up a separate metrics collection infrastructure.
> > Even if this is easy and straightforward to do (which is not the case for
> > most organizations), it still requires reconfiguring and restarting the
> > application, which is disruptive. Correlating client metrics with server
> > metrics is also often hard. These issues are all mitigated by centralizing
> > metrics collection on the broker.
> >
> > best,
> > Colin
> >
> >
> > On Wed, Jun 16, 2021, at 19:03, Ryanne Dolan wrote:
> > > Magnus, I think these are arguments for observability in general, but not
> > > why kafka should sit between a client and a metrics collector.
> > >
> > > Ryanne
> > >
> > > On Wed, Jun 16, 2021, 10:27 AM Magnus Edenhill <mag...@edenhill.se> wrote:
> > >
> > > > Hi Ryanne,
> > > >
> > > > this proposal stems from a need to improve troubleshooting Kafka issues.
> > > >
> > > > As it currently stands, when an application team is experiencing Kafka
> > > > service degradation, or the Kafka operator is seeing misbehaving clients,
> > > > there are plenty of steps that need to be taken before any client-side
> > > > metrics can be observed, if at all:
> > > >  - Is the application even collecting client metrics? If not, it needs to
> > > >    be reconfigured or implemented, and restarted; a restart may have
> > > >    business impact, and may also (temporarily?) remedy the problem without
> > > >    giving any further insight into what was wrong.
> > > >  - Are the desired metrics collected? Where are they stored? For how long?
> > > >    Is there enough correlating information to map them to cluster-side
> > > >    metrics and events? Does the application on-call know how to find the
> > > >    collected metrics?
> > > >  - Export and send these metrics to whoever knows how to interpret them.
> > > >    In what format? Are all relevant metadata fields provided?
> > > >
> > > > The KIP aims to solve all these obstacles by giving the Kafka operator the
> > > > tools to collect this information.
> > > >
> > > > Regards,
> > > > Magnus
> > > >
> > > >
> > > > On Tue, Jun 15, 2021 at 02:37, Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > > >
> > > > > Magnus, I think such a substantial change requires more motivation than
> > > > > is currently provided. As I read it, the motivation boils down to this:
> > > > > you want your clients to phone home unless they opt out. As stated in
> > > > > the KIP, "there are plenty of existing solutions [...] to send metrics
> > > > > [...] to a collector", so the opt-out appears to be the only motivation.
> > > > > Am I missing something?
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Wed, Jun 2, 2021 at 7:46 AM Magnus Edenhill <mag...@edenhill.se> wrote:
> > > > >
> > > > > > Hey all,
> > > > > >
> > > > > > I'm proposing KIP-714 to add remote Client metrics and observability.
> > > > > > This functionality will allow centralized monitoring and
> > > > > > troubleshooting of clients and their internals.
> > > > > >
> > > > > > Please see
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
> > > > > >
> > > > > > Looking forward to your feedback!
> > > > > >
> > > > > > Regards,
> > > > > > Magnus
> > > > > >
> > > > >
> > > >
> > >
> >
>
