Re: [DISCUSS] KIP-714: Client metrics and observability

Mickael Maison Wed, 10 Nov 2021 06:21:28 -0800

Hi Magnus,

I see you've addressed some of the points I raised above but some (4,
5) have not been addressed yet.


I'm really uneasy with this being enabled by default on the client
side. When collecting data, I think the best practice is to ensure
users are explicitly enabling it. You mentioned brokers already have
some(most?) of the information contained in metrics, if so then why
are we collecting it again? Surely there must be some new information
in the client metrics.

Moreover this is a brand new feature so it's even harder to justify
enabling it and forcing onto all our users. If disabled by default,
it's relatively easy to enable in a new release if we decide to, but
once enabled by default it's much harder to disable. Also this feature
will apply to all future metrics we will add.

Overall I think it's an interesting feature but I'd prefer to be
slightly defensive and see how it works in practice before enabling it
everywhere.

Thanks,
Mickael

On Mon, Nov 8, 2021 at 7:55 AM Magnus Edenhill <[email protected]> wrote:
>
> Thanks David for pointing this out,
> I've updated the KIP to include client_id as a matching selector.
>
> Regards,
> Magnus
>
> Den tors 4 nov. 2021 kl 18:01 skrev David Mao <[email protected]>:
>
> > Hey Magnus,
> >
> > I noticed that the KIP outlines the initial selectors supported as:
> >
> >    - client_instance_id - CLIENT_INSTANCE_ID UUID string representation.
> >    - client_software_name  - client software implementation name.
> >    - client_software_version  - client software implementation version.
> >
> > In the given reactive monitoring workflow, we mention that the application
> > user does not know their client's client instance ID, but it's outlined
> > that the operator can add a metrics subscription selecting for clientId. I
> > don't see clientId as one of the supported selectors.
> > I can see how this would have made sense in a previous iteration given that
> > the previous client instance ID proposal was to construct the client
> > instance ID using clientId as a prefix. Now that the client instance ID is
> > a UUID, would we want to add clientId as a supported selector?
> > Let me know what you think.
> >
> > David
> >
> > On Tue, Oct 19, 2021 at 12:33 PM Magnus Edenhill <[email protected]>
> > wrote:
> >
> > > Hi Mickael!
> > >
> > > Den tis 19 okt. 2021 kl 19:30 skrev Mickael Maison <
> > > [email protected]
> > > >:
> > >
> > > > Hi Magnus,
> > > >
> > > > Thanks for the proposal.
> > > >
> > > > 1. Looking at the protocol section, isn't "ClientInstanceId" expected
> > > > to be a field in GetTelemetrySubscriptionsResponseV0? Otherwise, how
> > > > does a client retrieve this value?
> > > >
> > >
> > > Good catch, it got removed by mistake in one of the edits.
> > >
> > >
> > > >
> > > > 2. In the client API section, you mention a new method
> > > > "clientInstanceId()". Can you clarify which interfaces are affected?
> > > > Is it only Consumer and Producer?
> > > >
> > >
> > > And Admin. Will update the KIP.
> > >
> > >
> > >
> > > > 3. I'm a bit concerned this is enabled by default. Even if the data
> > > > collected is supposed to be not sensitive, I think this can be
> > > > problematic in some environments. Also users don't seem to have the
> > > > choice to only expose some metrics. Knowing how much data transit
> > > > through some applications can be considered critical.
> > > >
> > >
> > > The broker already knows how much data transits through the client
> > though,
> > > right?
> > > Care has been taken not to expose information in the standard metrics
> > that
> > > might
> > > reveal sensitive information.
> > >
> > > Do you have an example of how the proposed metrics could leak sensitive
> > > information?
> > > As for limiting the what metrics to export; I guess that could make sense
> > > in some
> > > very sensitive use-cases, but those users might disable metrics
> > altogether
> > > for now.
> > > Could these concerns be addressed by a later KIP?
> > >
> > >
> > >
> > > >
> > > > 4. As a user, how do you know if your application is actively sending
> > > > metrics? Are there new metrics exposing what's going on, like how much
> > > > data is being sent?
> > > >
> > >
> > > That's a good question.
> > > Since the proposed metrics interface is not aimed at, or directly
> > available
> > > to, the application
> > > I guess there's little point of adding it here, but instead adding
> > > something to the
> > > existing JMX metrics?
> > > Do you have any suggestions?
> > >
> > >
> > >
> > > > 5. If all metrics are enabled on a regular Consumer or Producer, do
> > > > you have an idea how much throughput this would use?
> > > >
> > >
> > > It depends on the number of partition/topics/etc the client is producing
> > > to/consuming from.
> > > I'll add some sizes to the KIP for some typical use-cases.
> > >
> > >
> > > Thanks,
> > > Magnus
> > >
> > >
> > > > Thanks
> > > >
> > > > On Tue, Oct 19, 2021 at 5:06 PM Magnus Edenhill <[email protected]>
> > > > wrote:
> > > > >
> > > > > Den tis 19 okt. 2021 kl 13:22 skrev Tom Bentley <[email protected]
> > >:
> > > > >
> > > > > > Hi Magnus,
> > > > > >
> > > > > > I reviewed the KIP since you called the vote (sorry for not
> > reviewing
> > > > when
> > > > > > you announced your intention to call the vote). I have a few
> > > questions
> > > > on
> > > > > > some of the details.
> > > > > >
> > > > > > 1. There's no Javadoc on ClientTelemetryPayload.data(), so I don't
> > > know
> > > > > > whether the payload is exposed through this method as compressed or
> > > > not.
> > > > > > Later on you say "Decompression of the payloads will be handled by
> > > the
> > > > > > broker metrics plugin, the broker should expose a suitable
> > > > decompression
> > > > > > API to the metrics plugin for this purpose.", which suggests it's
> > the
> > > > > > compressed data in the buffer, but then we don't know which codec
> > was
> > > > used,
> > > > > > nor the API via which the plugin should decompress it if required
> > for
> > > > > > forwarding to the ultimate metrics store. Should the
> > > > ClientTelemetryPayload
> > > > > > expose a method to get the compression and a decompressor?
> > > > > >
> > > > >
> > > > > Good point, updated.
> > > > >
> > > > >
> > > > >
> > > > > > 2. The client-side API is expressed as StringOrError
> > > > > > ClientInstance::ClientInstanceId(int timeout_ms). I understand that
> > > > you're
> > > > > > thinking about the librdkafka implementation, but it would be good
> > to
> > > > show
> > > > > > the API as it would appear on the Apache Kafka clients.
> > > > > >
> > > > >
> > > > > This was meant as pseudo-code, but I changed it to Java.
> > > > >
> > > > >
> > > > > > 3. "PushTelemetryRequest|Response - protocol request used by the
> > > > client to
> > > > > > send metrics to any broker it is connected to." To be clear, this
> > > means
> > > > > > that the client can choose any of the connected brokers and push to
> > > > just
> > > > > > one of them? What should a supporting client do if it gets an error
> > > > when
> > > > > > pushing metrics to a broker, retry sending to the same broker or
> > try
> > > > > > pushing to another broker, or drop the metrics? Should supporting
> > > > clients
> > > > > > send successive requests to a single broker, or round robin, or is
> > > > that up
> > > > > > to the client author? I'm guessing the behaviour should be sticky
> > to
> > > > > > support the rate limiting features, but I think it would be good
> > for
> > > > client
> > > > > > authors if this section were explicit on the recommended behaviour.
> > > > > >
> > > > >
> > > > > You are right, I've updated the KIP to make this clearer.
> > > > >
> > > > >
> > > > > > 4. "Mapping the client instance id to an actual application
> > instance
> > > > > > running on a (virtual) machine can be done by inspecting the
> > metrics
> > > > > > resource labels, such as the client source address and source port,
> > > or
> > > > > > security principal, all of which are added by the receiving broker.
> > > > This
> > > > > > will allow the operator together with the user to identify the
> > actual
> > > > > > application instance." Is this really always true? The source IP
> > and
> > > > port
> > > > > > might be a loadbalancer/proxy in some setups. The principal, as
> > > already
> > > > > > mentioned in the KIP, might be shared between multiple
> > applications.
> > > > So at
> > > > > > worst the organization running the clients might have to consult
> > the
> > > > logs
> > > > > > of a set of client applications, right?
> > > > > >
> > > > >
> > > > > Yes, that's correct. There's no guaranteed mapping from
> > > > client_instance_id
> > > > > to
> > > > > an actual instance, that's why the KIP recommends client
> > > implementations
> > > > to
> > > > > log the client instance id
> > > > > upon retrieval, and also provide an API for the application to
> > retrieve
> > > > the
> > > > > instance id programmatically
> > > > > if it has a better way of exposing it.
> > > > >
> > > > >
> > > > > 5. "Tests indicate that a compression ratio up to 10x is possible for
> > > the
> > > > > > standard metrics." Client authors might appreciate your mentioning
> > > > which
> > > > > > compression codec got these results.
> > > > > >
> > > > >
> > > > > Good point. Updated.
> > > > >
> > > > >
> > > > > > 6. "Should the client send a push request prior to expiry of the
> > > > previously
> > > > > > calculated PushIntervalMs the broker will discard the metrics and
> > > > return a
> > > > > > PushTelemetryResponse with the ErrorCode set to RateLimited." Is
> > this
> > > > > > RATE_LIMITED a new error code? It's not mentioned in the "New Error
> > > > Codes"
> > > > > > section.
> > > > > >
> > > > >
> > > > > That's a leftover, it should be using the standard ThrottleTime
> > > > mechanism.
> > > > > Fixed.
> > > > >
> > > > >
> > > > > > 7. In the section "Standard client resource labels" application_id
> > is
> > > > > > described as Kafka Streams only, but the section of "Client
> > > > Identification"
> > > > > > talks about "application instance id as an optional future
> > > nice-to-have
> > > > > > that may be included as a metrics label if it has been set by the
> > > > user", so
> > > > > > I'm confused whether non-Kafka Streams clients should set an
> > > > application_id
> > > > > > or not.
> > > > > >
> > > > >
> > > > > I'll clarify this in the KIP, but basically we would need to add an `
> > > > > application.id` config
> > > > > property for non-streams clients for this purpose, and that's outside
> > > the
> > > > > scope of this KIP since we want to make it zero-conf:ish on the
> > client
> > > > side.
> > > > >
> > > > >
> > > > > >
> > > > > > Kind regards,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > >
> > > > > Thanks for the review,
> > > > > Magnus
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > On Thu, Oct 7, 2021 at 5:26 PM Magnus Edenhill <[email protected]
> > >
> > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I've updated the KIP following our recent discussions on the
> > > mailing
> > > > > > list:
> > > > > > >  - split the protocol in two, one for getting the metrics
> > > > subscriptions,
> > > > > > > and one for pushing the metrics.
> > > > > > >  - simplifications: initially only one supported metrics format,
> > no
> > > > > > > client.id in the instance id, etc.
> > > > > > >  - made CLIENT_METRICS subscription configuration entries more
> > > > structured
> > > > > > >    and allowing better client matching selectors (not only on the
> > > > > > instance
> > > > > > > id, but also the other
> > > > > > >    client resource labels, such as client_software_name, etc.).
> > > > > > >
> > > > > > > Unless there are further comments I'll call the vote in a day or
> > > two.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Magnus
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Den mån 4 okt. 2021 kl 20:57 skrev Magnus Edenhill <
> > > > [email protected]>:
> > > > > > >
> > > > > > > > Hi Gwen,
> > > > > > > >
> > > > > > > > I'm finishing up the KIP based on the last couple of discussion
> > > > points
> > > > > > in
> > > > > > > > this thread
> > > > > > > > and will call the Vote later this week.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Magnus
> > > > > > > >
> > > > > > > > Den lör 2 okt. 2021 kl 02:01 skrev Gwen Shapira
> > > > > > > <[email protected]
> > > > > > > > >:
> > > > > > > >
> > > > > > > >> Hey,
> > > > > > > >>
> > > > > > > >> I noticed that there was no discussion for the last 10 days,
> > > but I
> > > > > > > >> couldn't
> > > > > > > >> find the vote thread. Is there one that I'm missing?
> > > > > > > >>
> > > > > > > >> Gwen
> > > > > > > >>
> > > > > > > >> On Wed, Sep 22, 2021 at 4:58 AM Magnus Edenhill <
> > > > [email protected]>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Den tis 21 sep. 2021 kl 06:58 skrev Colin McCabe <
> > > > > > [email protected]
> > > > > > > >:
> > > > > > > >> >
> > > > > > > >> > > On Mon, Sep 20, 2021, at 17:35, Feng Min wrote:
> > > > > > > >> > > > Thanks Magnus & Colin for the discussion.
> > > > > > > >> > > >
> > > > > > > >> > > > Based on KIP-714's stateless design, Client can pretty
> > > much
> > > > use
> > > > > > > any
> > > > > > > >> > > > connection to any broker to send metrics. We are not
> > > > associating
> > > > > > > >> > > connection
> > > > > > > >> > > > with client metric state. Is my understanding correct?
> > If
> > > > yes,
> > > > > > > how
> > > > > > > >> > about
> > > > > > > >> > > > the following two scenarios
> > > > > > > >> > > >
> > > > > > > >> > > > 1) One Client (Client-ID) registers two different client
> > > > > > instance
> > > > > > > id
> > > > > > > >> > via
> > > > > > > >> > > > separate registration. Is it permitted? If OK, how to
> > > > > > distinguish
> > > > > > > >> them
> > > > > > > >> > > from
> > > > > > > >> > > > the case 2 below.
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> > > Hi Feng,
> > > > > > > >> > >
> > > > > > > >> > > My understanding, which Magnus can clarify I guess, is
> > that
> > > > you
> > > > > > > could
> > > > > > > >> > have
> > > > > > > >> > > something like two Producer instances running with the
> > same
> > > > > > > client.id
> > > > > > > >> > > (perhaps because they're using the same config file, for
> > > > example).
> > > > > > > >> They
> > > > > > > >> > > could even be in the same process. But they would get
> > > separate
> > > > > > > UUIDs.
> > > > > > > >> > >
> > > > > > > >> > > I believe Magnus used the term client to mean "Producer or
> > > > > > > Consumer".
> > > > > > > >> So
> > > > > > > >> > > if you have both a Producer and a Consumer in your
> > > > application I
> > > > > > > would
> > > > > > > >> > > expect you'd get separate UUIDs for both. Again Magnus can
> > > > chime
> > > > > > in
> > > > > > > >> > here, I
> > > > > > > >> > > guess.
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >> > That's correct.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > >
> > > > > > > >> > > > 2) How about the client restarting? What's the
> > > expectation?
> > > > > > Should
> > > > > > > >> the
> > > > > > > >> > > > server expect the client to carry a persisted client
> > > > instance id
> > > > > > > or
> > > > > > > >> > > should
> > > > > > > >> > > > the client be treated as a new instance?
> > > > > > > >> > >
> > > > > > > >> > > The KIP doesn't describe any mechanism for persistence,
> > so I
> > > > would
> > > > > > > >> assume
> > > > > > > >> > > that when you restart the client you get a new UUID. I
> > agree
> > > > that
> > > > > > it
> > > > > > > >> > would
> > > > > > > >> > > be good to spell this out.
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > Right, it will not be persisted since a client instance
> > can't
> > > be
> > > > > > > >> restarted.
> > > > > > > >> >
> > > > > > > >> > Will update the KIP to make this clearer.
> > > > > > > >> >
> > > > > > > >> > /Magnus
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Gwen Shapira
> > > > > > > >> Engineering Manager | Confluent
> > > > > > > >> 650.450.2760 | @gwenshap
> > > > > > > >> Follow us: Twitter | blog
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to