Re: [DISCUSS] KIP-714: Client metrics and observability

Tom Bentley Tue, 19 Oct 2021 04:22:43 -0700

Hi Magnus,

I reviewed the KIP since you called the vote (sorry for not reviewing when
you announced your intention to call the vote). I have a few questions on
some of the details.

1. There's no Javadoc on ClientTelemetryPayload.data(), so I don't know
whether the payload is exposed through this method as compressed or not.
Later on you say "Decompression of the payloads will be handled by the
broker metrics plugin, the broker should expose a suitable decompression
API to the metrics plugin for this purpose.", which suggests it's the
compressed data in the buffer, but then we don't know which codec was used,
nor the API via which the plugin should decompress it if required for
forwarding to the ultimate metrics store. Should the ClientTelemetryPayload
expose a method to get the compression and a decompressor?
2. The client-side API is expressed as StringOrError
ClientInstance::ClientInstanceId(int timeout_ms). I understand that you're
thinking about the librdkafka implementation, but it would be good to show
the API as it would appear on the Apache Kafka clients.
3. "PushTelemetryRequest|Response - protocol request used by the client to
send metrics to any broker it is connected to." To be clear, this means
that the client can choose any of the connected brokers and push to just
one of them? What should a supporting client do if it gets an error when
pushing metrics to a broker, retry sending to the same broker or try
pushing to another broker, or drop the metrics? Should supporting clients
send successive requests to a single broker, or round robin, or is that up
to the client author? I'm guessing the behaviour should be sticky to
support the rate limiting features, but I think it would be good for client
authors if this section were explicit on the recommended behaviour.
4. "Mapping the client instance id to an actual application instance
running on a (virtual) machine can be done by inspecting the metrics
resource labels, such as the client source address and source port, or
security principal, all of which are added by the receiving broker. This
will allow the operator together with the user to identify the actual
application instance." Is this really always true? The source IP and port
might be a loadbalancer/proxy in some setups. The principal, as already
mentioned in the KIP, might be shared between multiple applications. So at
worst the organization running the clients might have to consult the logs
of a set of client applications, right?
5. "Tests indicate that a compression ratio up to 10x is possible for the
standard metrics." Client authors might appreciate your mentioning which
compression codec got these results.
6. "Should the client send a push request prior to expiry of the previously
calculated PushIntervalMs the broker will discard the metrics and return a
PushTelemetryResponse with the ErrorCode set to RateLimited." Is this
RATE_LIMITED a new error code? It's not mentioned in the "New Error Codes"
section.
7. In the section "Standard client resource labels" application_id is
described as Kafka Streams only, but the section of "Client Identification"
talks about "application instance id as an optional future nice-to-have
that may be included as a metrics label if it has been set by the user", so
I'm confused whether non-Kafka Streams clients should set an application_id
or not.

Kind regards,

Tom

On Thu, Oct 7, 2021 at 5:26 PM Magnus Edenhill <mag...@edenhill.se> wrote:

> Hi all,
>
> I've updated the KIP following our recent discussions on the mailing list:
>  - split the protocol in two, one for getting the metrics subscriptions,
> and one for pushing the metrics.
>  - simplifications: initially only one supported metrics format, no
> client.id in the instance id, etc.
>  - made CLIENT_METRICS subscription configuration entries more structured
>    and allowing better client matching selectors (not only on the instance
> id, but also the other
>    client resource labels, such as client_software_name, etc.).
>
> Unless there are further comments I'll call the vote in a day or two.
>
> Regards,
> Magnus
>
>
>
> Den mån 4 okt. 2021 kl 20:57 skrev Magnus Edenhill <mag...@edenhill.se>:
>
> > Hi Gwen,
> >
> > I'm finishing up the KIP based on the last couple of discussion points in
> > this thread
> > and will call the Vote later this week.
> >
> > Best,
> > Magnus
> >
> > Den lör 2 okt. 2021 kl 02:01 skrev Gwen Shapira
> <g...@confluent.io.invalid
> > >:
> >
> >> Hey,
> >>
> >> I noticed that there was no discussion for the last 10 days, but I
> >> couldn't
> >> find the vote thread. Is there one that I'm missing?
> >>
> >> Gwen
> >>
> >> On Wed, Sep 22, 2021 at 4:58 AM Magnus Edenhill <mag...@edenhill.se>
> >> wrote:
> >>
> >> > Den tis 21 sep. 2021 kl 06:58 skrev Colin McCabe <cmcc...@apache.org
> >:
> >> >
> >> > > On Mon, Sep 20, 2021, at 17:35, Feng Min wrote:
> >> > > > Thanks Magnus & Colin for the discussion.
> >> > > >
> >> > > > Based on KIP-714's stateless design, Client can pretty much use
> any
> >> > > > connection to any broker to send metrics. We are not associating
> >> > > connection
> >> > > > with client metric state. Is my understanding correct? If yes,
> how
> >> > about
> >> > > > the following two scenarios
> >> > > >
> >> > > > 1) One Client (Client-ID) registers two different client instance
> id
> >> > via
> >> > > > separate registration. Is it permitted? If OK, how to distinguish
> >> them
> >> > > from
> >> > > > the case 2 below.
> >> > > >
> >> > >
> >> > > Hi Feng,
> >> > >
> >> > > My understanding, which Magnus can clarify I guess, is that you
> could
> >> > have
> >> > > something like two Producer instances running with the same
> client.id
> >> > > (perhaps because they're using the same config file, for example).
> >> They
> >> > > could even be in the same process. But they would get separate
> UUIDs.
> >> > >
> >> > > I believe Magnus used the term client to mean "Producer or
> Consumer".
> >> So
> >> > > if you have both a Producer and a Consumer in your application I
> would
> >> > > expect you'd get separate UUIDs for both. Again Magnus can chime in
> >> > here, I
> >> > > guess.
> >> > >
> >> >
> >> > That's correct.
> >> >
> >> >
> >> > >
> >> > > > 2) How about the client restarting? What's the expectation? Should
> >> the
> >> > > > server expect the client to carry a persisted client instance id
> or
> >> > > should
> >> > > > the client be treated as a new instance?
> >> > >
> >> > > The KIP doesn't describe any mechanism for persistence, so I would
> >> assume
> >> > > that when you restart the client you get a new UUID. I agree that it
> >> > would
> >> > > be good to spell this out.
> >> > >
> >> > >
> >> > Right, it will not be persisted since a client instance can't be
> >> restarted.
> >> >
> >> > Will update the KIP to make this clearer.
> >> >
> >> > /Magnus
> >> >
> >>
> >>
> >> --
> >> Gwen Shapira
> >> Engineering Manager | Confluent
> >> 650.450.2760 | @gwenshap
> >> Follow us: Twitter | blog
> >>
> >
>

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to