Re: [DISCUSS] KIP-714: Client metrics and observability

Feng Min Thu, 22 Jul 2021 10:38:14 -0700

On Wed, Jul 21, 2021 at 6:17 PM Colin McCabe <cmcc...@apache.org> wrote:


> On Tue, Jun 29, 2021, at 07:22, Magnus Edenhill wrote:
> > Den tors 17 juni 2021 kl 00:52 skrev Colin McCabe <cmcc...@apache.org>:
> > > A few critiques:
> > >
> > > - As I wrote above, I think this could benefit a lot by being split
> into
> > > several RPCs. A registration RPC, a report RPC, and an unregister RPC
> seem
> > > like logical choices.
> > >
> >
> > Responded to this in your previous mail, but in short I think a single
> > request is sufficient and keeps the implementation complexity / state
> down.
> >
>
> Hi Magnus,
>
> I still suspect that trying to do everything with a single RPC is more
> complex than using multiple RPCs.
>
> Can you go into more detail about how the client learns what metrics it
> should send? This was the purpose of the "registration" step in my scheme
> above.
>
> It seems quite awkward to combine an RPC for reporting metrics with and
> RPC for finding out what metrics are configured to be reported. For
> example, how would you build a tool to check what metrics are configured to
> be reported? Does the tool have to report fake metrics, just because
> there's no other way to get back that information? Seems wrong. (It would
> be a bit like combining createTopics and listTopics for "simplicity")
>

 +1 on separate RPC on metric discovery and metric report. I actually think
it makes complexity/state down compared with single RPC.


>
> > > - I don't think the client should be able to choose its own UUID. This
> > > adds complexity and introduces a chance that clients will choose an ID
> that
> > > is not unique. We already have an ID that the client itself supplies
> > > (clientID) so there is no need to introduce another such ID.
> > >
> >
> > The CLIENT_INSTANCE_ID (which is a combination of the client.id and a
> UUID)
> > is actually generated by the receiving broker on first contact.
> > The need for a new unique semi-random id is outlined in the KIP, but in
> > short; the client.id is not unique, and we need something unique that
> still
> > is prefix-matchable to the client.id so that we can add subscriptions
> > either using prefix-matching of just the client.id (which may match one
> or
> > more client instances), and exact matching which will match a one
> specific
> > client instance.
>
> Hmm... the client id is already sent in every RPC as part of the header.
> It's not necessary to send it again as part of one of the other RPC fields,
> right?
>
> More generally, why does the client instance ID need to be
> prefix-matchable? That seems like an implementation detail of the metrics
> collection system used on the broker side. Maybe someone wants to group by
> things other than client IDs -- perhaps client versions, for instance. By
> the same argument, we should put the client version string in the client
> instance ID, since someone might want to group by that. Or maybe we should
> include the hostname, and the IP, and, and, and.... You see the issue here.
> I think we shouldn't get involved in this kind of decision -- if we just
> pass a UUID, the broker-side software can group it or prefix it however it
> wants internally.
>
> > > - In general the schema seems to have a bad case of string-itis. UUID,
> > > content type, and requested metrics are all strings. Since these
> messages
> > > will be sent very frequently, it's quite costly to use strings for all
> > > these things. We have a type for UUID, which uses 16 bytes -- let's use
> > > that type for client instance ID, rather than a string which will be
> much
> > > larger. Also, since we already send clientID in the message header,
> there
> > > is no need to include it again in the instance ID.
> > >
> >
> > As explained above we need the client.id in the CLIENT_INSTANCE_ID. And
> I
> > don't think the overhead of this one string per request is going to be
> much
> > of an issue,
> > typical metric push intervals are probably in the >60s range.
> > If this becomes a problem we could use a per-connection identifier that
> the
> > broker translates to the client instance id before pushing metrics
> upwards
> > in the system.
> >
>
> This is actually an interesting design question -- why not use a
> per-TCP-connection identifier, rather than a per-client-instance
> identifier? If we are grouping by other things anyway (clientID, principal,
> etc.) on the server side, do we need to maintain a per-process identifier
> rather than a per-connection one?
>
> >
> > > - I think it would also be nice to have an enum or something for
> > > AcceptedContentTypes, RequestedMetrics, etc. We know that new
> additions to
> > > these categories will require KIPs, so it should be straightforward
> for the
> > > project to just have an enum that allows us to communicate these as
> ints.
> > >
> >
> > I'm thinking this might be overly constraining. The broker doesn't parse
> or
> > handle the received metrics data itself but just pushes it to the metrics
> > plugin, using an enum would require a KIP and broker upgrade if the
> metrics plugin
> > supports a newer version of OTLP.
> > It is probably better if we don't strictly control the metric format
> itself.
> >
>
> Unfortunately, we have to strictly control the metrics format, because
> otherwise clients can't implement it. I agree that we don't need to specify
> how the broker-side code works, since that is pluggable. It's also
> reasonable for the clients to have pluggable extensions as well, but this
> KIP won't be of much use if we don't at least define a basic set of metrics
> that most clients can understand how to send. The open source clients will
> not implement anything more than what is specified in the KIP (or at least
> the AK one won't...)
>
> >
> >
> > > - Can you talk about whether you are adding any new library
> dependencies
> > > to the Kafka client? It seems like you'd want to add opencensus /
> > > opentelemetry, if we are using that format here.
> > >
> >
> > Yeah, as we get closer to concensus more implementation specific details
> > will be added to the KIP.
> >
>
> I'm not sure if OpenCensus adds any value to this KIP, to be honest. Their
> primary focus was never on the format of the data being sent (in fact, the
> last time they checked, they left the format up to each OpenCensus
> implementation). That may have changed, but I think it still has limited
> usefulness to us, since we have our own format which we have to use anyway.
>
> >
> > >
> > > - Standard client resource labels: can we send these only in the
> > > registration RPC?
> > >
> >
> > These labels are part of the serialized OTLP data, which means it would
> > need to be unpacked and repacked (including compression) by the broker
> (or
> > metrics plugin), which I believe is more costly than sending them for
> each request.
> >
>
> Hmm, that data is about 10 fields, most of which are strings. It certainly
> adds a lot of overhead to resend it each time.
>
> I don't follow the comment about unpacking and repacking -- since the
> client registered with the broker it already knows all this information, so
> there's nothing to unpack or repack, except from memory. If it's more
> convenient to serialize it once rather than multiple times, that is an
> implementation detail of the broker side plugin, which we are not
> specifying here anyway.
>
> best,
> Colin
>
> > Thanks,
> > Magnus
> >
> > >
> > >
> >
>


-- 
Best,
Feng

Re: [DISCUSS] KIP-714: Client metrics and observability

Reply via email to