Thanks Sophie and Jun.

`clientInstanceIds()` is fine with me -- was not sure about the double plural myself.

Sorry if my comment was confusing. I was trying to say that adding an overload to `KafkaStreams` that does not take a timeout parameter does not make sense, because there is no `default.api.timeout.ms` config for Kafka Streams, so users always need to pass in a timeout. (Same for the producer.)

For the implementation, I think KS would always call `client.clientInstanceId(timeout)` and never rely on `default.api.timeout.ms`, so we stay in control -- if a timeout is passed by the user, it always overrides `default.api.timeout.ms` on the consumer/admin, and thus we should follow the same semantics in Kafka Streams and pass the timeout explicitly when calling `client.clientInstanceId()`.
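
To sketch what I mean (the bookkeeping is illustrative only, not proposing any new API; assuming the `String` return type currently in the KIP):

    // inside `KafkaStreams#clientInstanceIds(Duration timeout)` we would track the
    // remaining time ourselves and pass it down explicitly, so the clients'
    // `default.api.timeout.ms` never kicks in
    final long deadlineMs = System.currentTimeMillis() + timeout.toMillis();

    final Duration remaining =
        Duration.ofMillis(Math.max(0L, deadlineMs - System.currentTimeMillis()));
    final String adminInstanceId = adminClient.clientInstanceId(remaining);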

The proposed API also makes sense to me. I was just wondering if we want to extend it for client users -- for KS we won't need/use the timeout-less overloads.



130) My intent was to throw a TimeoutException if we cannot get all instance ids, because that's the standard contract for timeouts. It would also be hard for a user to tell whether a full or partial result was returned (unless we add a method like `boolean isPartialResult()` to make it easier).

If there are concerns/objections, I am also ok with returning a partial result -- it would require a change to the newly added `ClientInstanceIds` return type: for `adminInstanceId` we only return a `String` right now, and we might need to change this to `Optional<String>` to be able to return a partial result.
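
For illustration, it could look like this (just a sketch, using the renamed `ClientInstanceIds` type):

    public class ClientInstanceIds {
      // empty if the admin client's instance id could not be retrieved
      // before the timeout expired
      Optional<String> adminInstanceId();

      // already Optional because there might be no global thread; with a
      // partial result, empty could also mean "not retrieved in time"
      Optional<String> globalConsumerInstanceId();

      // clients whose instance id was not retrieved in time would simply
      // be missing from these maps
      Map<String, String> mainConsumerInstanceIds();
      Map<String, String> restoreConsumerInstanceIds();
      Map<String, String> producerInstanceIds();
    }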


131) Of course we could, but I am not sure what we would gain. In the end, implementation details would leak anyway, because if we change the number of consumers we use, we would return different keys in the `Map`. At the moment, the proposal implies that the same key might be used for the "main" and "restore" consumer of the same thread -- but we can make the keys unique by adding a `-restore` suffix to the restore-consumer key if we merge both maps (see the sketch below). Curious to hear what others think; I am very open to doing it differently than currently proposed.
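
To make the `-restore` suffix idea concrete (keys follow the `[Stream|StateUpdater]Thread-<threadIdx>` pattern from the proposal; values are the assigned instance ids):

    // single map covering main and restore consumers
    Map<String, String> consumerInstanceIds();

    //   "StreamThread-1"          -> instance id of thread 1's main consumer
    //   "StreamThread-1-restore"  -> instance id of thread 1's restore consumer
    //   "StreamThread-2"          -> ...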


-Matthias


On 10/12/23 8:39 AM, Jun Rao wrote:
Hi, Matthias,

Thanks for the reply.

130. What would be the semantics? If the timeout has expired and only some
of the client instance ids have been retrieved, does the call return the
partial result or throw an exception?

131. Could we group all consumer instances in a single method since we are
returning the key for each instance already? This probably also avoids
exposing implementation details that could change over time.

Thanks,

Jun

On Thu, Oct 12, 2023 at 12:00 AM Sophie Blee-Goldman <sop...@responsive.dev>
wrote:

Regarding the naming, I personally think `clientInstanceId` makes sense for
the plain clients
  -- especially if we might later introduce the notion of an
`applicationInstanceId`.

I'm not a huge fan of `clientsInstanceIds` for the Kafka Streams API,
though -- can we use `clientInstanceIds` instead? (The difference being
the placement of the plural 's'.)
I would similarly rename the class to just ClientInstanceIds

we can also not have a timeout-less overload,  because `KafkaStreams` does
not have a `default.api.timeout.ms` config either

With respect to the timeout for the Kafka Streams API, I'm a bit confused
by the double (or triple) negative of Matthias' comment here, but I was
thinking about this
earlier and this was my take: with the current proposal, we would allow
users to pass
in an absolute timeout as a parameter that would apply to the method as a
whole.
Meanwhile within the method we would issue separate calls to each of the
clients using
the default or user-configured value of their  `default.api.timeout.ms` as
the timeout
parameter.

So the API as proposed makes sense to me.


On Wed, Oct 11, 2023 at 6:48 PM Matthias J. Sax <mj...@apache.org> wrote:

I can answer 130 and 131.

130) We cannot guarantee that all clients are already initialized due to
race conditions. We plan to not allow calling
`KafkaStreams#clientsInstanceIds()` when the state is not RUNNING (or
REBALANCING) though -- I guess this slipped off the KIP and should be
added? But because StreamThreads can be added dynamically (and producers
might be created dynamically at runtime; cf. below), we still cannot
guarantee that all clients are already initialized when the method is
called. Of course, we assume that all clients are most likely initialized
on the happy path, and blocking calls to `client.clientInstanceId()`
should be rare.

To address the worst case, we won't do a naive implementation and just
loop over all clients, but fan out the call to the different
StreamThreads (and GlobalStreamThread if it exists), and use Futures to
gather the results.
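
A rough sketch of what I have in mind (implementation detail only; `clientInstanceIdFutures()` is a made-up helper each thread would expose):

    final long deadlineMs = System.currentTimeMillis() + timeout.toMillis();

    // each thread hands back futures for its own clients' instance ids
    final Map<String, Future<String>> futures = new HashMap<>();
    for (final StreamThread thread : threads) {
        futures.putAll(thread.clientInstanceIdFutures());
    }

    // gather the results within the remaining time; a TimeoutException from
    // get() would bubble up to the caller (cf. 130 above)
    final Map<String, String> instanceIds = new HashMap<>();
    for (final Map.Entry<String, Future<String>> entry : futures.entrySet()) {
        final long remainingMs = Math.max(0L, deadlineMs - System.currentTimeMillis());
        instanceIds.put(entry.getKey(), entry.getValue().get(remainingMs, TimeUnit.MILLISECONDS));
    }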

Currently, each `StreamThread` has 3 clients (if ALOS or EOSv2 is used), so
we might do 3 blocking calls in the worst case (for EOSv1 we get a
producer per task, and we might end up doing more blocking calls if the
producers are not initialized yet). Note that EOSv1 is already
deprecated, and we are also working on thread refactoring that will
reduce the number of clients per StreamThread to 2 -- and we have more
refactoring planned to reduce the number of clients even further.

Inside `KafkaStreams#clientsInstanceIds()` we might only do a single
blocking call for the admin client (i.e., `admin.clientInstanceId()`).

I agree that we need to do some clever timeout management, but it seems
to be more of an implementation detail?

Do you have any particular concerns, or does the proposed implementation
as sketched above address your question?


131) If the Topology does not have a global state store, there won't be
a GlobalStreamThread and thus no global consumer. Thus, we return an Optional.



Three related questions for Andrew:

(1) Why is the method called `clientInstanceId()` and not just plain
`instanceId()`?

(2) Why do we return a `String` and not a UUID type? The added
protocol request/response classes use UUIDs.

(3) Would it make sense to have an overloaded `clientInstanceId()`
method that does not take any parameter but uses the `default.api.timeout.ms`
config (this config does not exist on the producer though, so we could
only have it for consumer and admin at this point)? We could of course
also add overloads like this later if users request them (and/or add
`default.api.timeout.ms` to the producer, too).

Btw: For KafkaStreams, I think `clientsInstanceIds` still makes sense as
a method name though, as `KafkaStreams` itself does not have an
`instanceId` -- we can also not have a timeout-less overload, because
`KafkaStreams` does not have a `default.api.timeout.ms` config either
(and I don't think it makes sense to add one).



-Matthias

On 10/11/23 5:07 PM, Jun Rao wrote:
Hi, Andrew,

Thanks for the updated KIP. Just a few more minor comments.

130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait for
all
consumer/producer/adminClient instances to be initialized? Are all
those
instances created during KafkaStreams initialization?

131. Why does globalConsumerInstanceId() return Optional<String> while
other consumer instances don't return Optional?

132. ClientMetricsSubscriptionRequestCount: Do we need this since we
have a
set of generic metrics
(kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*) that
report Request rate for every request type?

Thanks,

Jun

On Wed, Oct 11, 2023 at 1:47 PM Matthias J. Sax <mj...@apache.org>
wrote:

Thanks!

On 10/10/23 11:31 PM, Andrew Schofield wrote:
Matthias,
Yes, I think that’s a sensible way forward and the interface you
propose
looks good. I’ll update the KIP accordingly.

Thanks,
Andrew

On 10 Oct 2023, at 23:01, Matthias J. Sax <mj...@apache.org> wrote:

Andrew,

yes, I would like to get this change into KIP-714 right away. Seems to
be
important, as we don't know if/when a follow-up KIP for Kafka Streams
would
land.

I was also thinking (and discussed with a few others) how to expose
it,
and we would propose the following:

We add a new method to `KafkaStreams` class:

      public ClientsInstanceIds clientsInstanceIds(Duration timeout);

The returned object is like below:

    public class ClientsInstanceIds {
      // we only have a single admin client per KS instance
      String adminInstanceId();

      // we only have a single global consumer per KS instance (if any)
      // Optional<> because we might not have global-thread
      Optional<String> globalConsumerInstanceId();

      // return a <threadKey -> ClientInstanceId> mapping
      // for the underlying (restore-)consumers/producers
      Map<String, String> mainConsumerInstanceIds();
      Map<String, String> restoreConsumerInstanceIds();
      Map<String, String> producerInstanceIds();
    }

For the `threadKey`, we would use some pattern like this:

    [Stream|StateUpdater]Thread-<threadIdx>


Would this work from your POV?



-Matthias


On 10/9/23 2:15 AM, Andrew Schofield wrote:
Hi Matthias,
Good point. Makes sense to me.
Is this something that can also be included in the proposed Kafka
Streams follow-on KIP, or would you prefer that I add it to KIP-714?
I have a slight preference for the former to put all of the KS
enhancements into a separate KIP.
Thanks,
Andrew
On 7 Oct 2023, at 02:12, Matthias J. Sax <mj...@apache.org>
wrote:

Thanks Andrew. SGTM.

One point you did not address is the idea to add a method to
`KafkaStreams` similar to the proposed `clientInstanceId()` that will
be
added to consumer/producer/admin clients.

Without addressing this, Kafka Streams users won't have a way to
get
the assigned `instanceId` of the internally created clients, and thus
it
would be very difficult for them to know which metrics the broker
receives belong to a Kafka Streams app. It seems they would only find
the
`instanceIds` in the log4j output if they enable client logging?

Of course, because there are multiple clients inside Kafka Streams,
the return type cannot be a single "String", but must be some
complex
data structure -- we could either add a new class, or return a
Map<String,String> using a client key that maps to the `instanceId`.

For example we could use the following key:


  [Global]StreamThread[-<threadIndex>][-restore][consumer|producer]

(Of course, only the valid combinations.)

Or maybe even better, we might want to return a `Future`, because
collecting all the `instanceIds` might be a blocking call on each client?
I already have a few ideas how it could be implemented, but I don't think
it must be discussed on the KIP, as it's an implementation detail.

Thoughts?


-Matthias

On 10/6/23 4:21 AM, Andrew Schofield wrote:
Hi Matthias,
Thanks for your comments. I agree that a follow-up KIP for Kafka
Streams makes sense. This KIP currently has made a bit
of an effort to embrace KS, but it’s not enough by a long way.
I have removed `application.id`. This
should be done properly in the follow-up KIP. I don’t believe there’s
a
downside to
removing it from this KIP.
I have reworded the statement about temporality. In practice, the
implementation of this KIP that’s going on while the voting
progresses happens to use delta temporality, but that’s an
implementation detail. Supporting clients must support both
temporalities.
I thought about exposing the client instance ID as a metric, but
non-numeric metrics are not usual practice and tools
do not universally support them. I don’t think the KIP is
improved
by adding one now.
I have also added constants for the various Config classes for
ENABLE_METRICS_PUSH_CONFIG, including to
StreamsConfig. It’s best to be explicit about this.
Thanks,
Andrew
On 2 Oct 2023, at 23:47, Matthias J. Sax <mj...@apache.org>
wrote:

Hi,

I did not pay attention to this KIP in the past; seems it was
on-hold for a while.

Overall it sounds very useful, and I think we should extend this
with a follow up KIP for Kafka Streams. What is unclear to me at this
point
is the statement:

Kafka Streams applications have an application.id configured
and
this identifier should be included as the application_id metrics
label.

The `application.id` is currently only used as the (main)
consumer's `group.id` (and is part of an auto-generated `client.id`
if
the user does not set one).

This comment relates to:

The following labels should be added by the client as
appropriate
before metrics are pushed.

Given that Kafka Streams uses the consumer/producer/admin client
as
"black boxes", a client does at this point not know that it's part of
a
Kafka Streams application, and thus, it won't be able to attach any
such
label to the metrics it sends. (Also producer and admin don't even
know
the
value of `application.id` -- only the (main) consumer, indirectly
via `
group.id`, but also restore and global consumer don't know it,
because
they don't have `group.id` set).

While I am totally in favor of the proposal, I am wondering how we
intend to implement it in a clean way? Or would it be ok to have some
internal client APIs that KS can use to "register" itself with the
client?



While clients must support both temporalities, the broker will
initially only send
GetTelemetrySubscriptionsResponse.DeltaTemporality=True

Not sure if I can follow. How do we make the decision about DELTA or
CUMULATIVE metrics? Should the broker-side plugin not decide what
metrics it wants to receive in which form? So what does "initially" mean --
the broker won't ship with a default plugin implementation?



The following method is added to the Producer, Consumer, and
Admin
client interfaces:

Should we add anything to Kafka Streams to expose the underlying
clients' assigned client-instance-ids programmatically? I am also
wondering
if clients should report their assigned client-instance-ids as metrics
themselves (in this case, Kafka Streams won't need to do anything,
because
we
already expose all client metrics).

If we add anything programmatic, we need to make it simple,
given
that Kafka Streams has many clients per `StreamThread` and may have
multiple threads.



enable.metrics.push
It might be worth adding this to `StreamsConfig`, too? If set via
StreamsConfig, we would forward it to all clients automatically.
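
For example, what I have in mind (just a sketch; the forwarding would happen inside Kafka Streams, and `topology` is defined elsewhere):

    // setting the config once at the Streams level...
    final Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put("enable.metrics.push", "false");  // proposed client config from the KIP

    // ...would be forwarded to all internally created consumers/producers/admin clients
    final KafkaStreams streams = new KafkaStreams(topology, props);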




-Matthias


On 9/29/23 5:45 PM, David Jacot wrote:
Hi Andrew,
Thanks for driving this one. I haven't read all the KIP yet
but I
already
have an initial question. In the Threading section, it is
written
"KafkaConsumer: the "background" thread (based on the consumer
threading
refactor which is underway)". If I understand this correctly,
it
means
that KIP-714 won't work if the "old consumer" is used. Am I
correct?
Cheers,
David
On Fri, Sep 22, 2023 at 12:18 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:
Hi Philip,
No, I do not think it should actively search for a broker that
supports
the new
RPCs. In general, either all of the brokers or none of the
brokers will
support it.
In the window, where the cluster is being upgraded or client
telemetry is
being
enabled, there might be a mixed situation. I wouldn’t put too
much effort
into
this mixed scenario. As the client finds brokers which support
the new
RPCs,
it can begin to follow the KIP-714 mechanism.

Thanks,
Andrew

On 22 Sep 2023, at 20:01, Philip Nee <philip...@gmail.com>
wrote:

Hi Andrew -

Question on top of your answers: Do you think the client
should
actively
search for a broker that supports this RPC? As previously
mentioned, the
broker uses the leastLoadedNode to find its first connection
(am
I correct?), and what if that broker doesn't support the
metric
push?

P

On Fri, Sep 22, 2023 at 10:20 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

Hi Kirk,
Thanks for your question. You are correct that the presence
or
absence
of
the new RPCs in the
ApiVersionsResponse tells the client whether to request the
telemetry
subscriptions and push
metrics.

This is of course tricky in practice. It would be
conceivable,
as a
cluster is upgraded to AK 3.7
or as a client metrics receiver plugin is deployed across
the
cluster,
that a client connects to some
brokers that support the new RPCs and some that do not.

Here’s my suggestion:
* If a client is not connected to any brokers that support the new
RPCs, it cannot push metrics.
* If a client is only connected to brokers that support the
new
RPCs, it
will use the new RPCs in
accordance with the KIP.
* If a client is connected to some brokers that support the
new
RPCs and
some that do not, it will
use the new RPCs with the supporting subset of brokers in
accordance
with
the KIP.

Comments?

Thanks,
Andrew

On 22 Sep 2023, at 16:01, Kirk True <k...@kirktrue.pro>
wrote:

Hi Andrew/Jun,

I want to make sure I understand question/comment #119… In
the
case
where a cluster without a metrics client receiver is later
reconfigured
and
restarted to include a metrics client receiver, do we want
the
client to
thereafter begin pushing metrics to the cluster? From
Andrew’s
response
to
question #119, it sounds like we’re using the
presence/absence
of the
relevant RPCs in ApiVersionsResponse as the
to-push-or-not-to-push
indicator. Do I have that correct?

Thanks,
Kirk

On Sep 21, 2023, at 7:42 AM, Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

Hi Jun,
Thanks for your comments. I’ve updated the KIP to clarify
where
necessary.

110. Yes, agree. The motivation section mentions this.

111. The replacement of ‘-‘ with ‘.’ for metric names and
the
replacement of
‘-‘ with ‘_’ for attribute keys is following the OTLP
guidelines. I
think it’s a bit
of a debatable point. OTLP makes a distinction between a
namespace
and a
multi-word component. If it was “client.id” then “client”
would be a
namespace with
an attribute key “id”. But “client_id” is just a key. So,
it
was
intentional, but debatable.

112. Thanks. The link target moved. Fixed.

113. Thanks. Fixed.

114.1. If a standard metric makes sense for a client, it
should use
the
exact same
name. If a standard metric doesn’t make sense for a
client,
then it
can
omit that metric.

For a required metric, the situation is stronger. All
clients
must
implement these
metrics with these names in order to implement the KIP.
But
the
required metrics
are essentially the number of connections and the request
latency,
which do not
reference the underlying implementation of the client
(which
producer.record.queue.time.max
of course does).

I suppose someone might build a producer-only client that
didn’t have
consumer metrics.
In this case, the consumer metrics would conceptually have
the value 0
and would not
need to be sent to the broker.

114.2. If a client does not implement some metrics, they
will
not be
available for
analysis and troubleshooting. It just makes the ability to
combine
metrics from lots
different clients less complete.

115. I think it was probably a mistake to be so specific
about
threading in this KIP.
When the consumer threading refactor is complete, of
course,
it would
do the appropriate
equivalent. I’ve added a clarification and massively
simplified this
section.

116. I removed “client.terminating”.

117. Yes. Horrid. Fixed.

118. The Terminating flag just indicates that this is the
final
PushTelemetryRequest
from this client. Any subsequent request will be
rejected. I
think
this
flag should remain.

119. Good catch. This was actually contradicting another
part
of the
KIP. The current behaviour
is indeed preserved. If the broker doesn’t have a client
metrics
receiver plugin, the new RPCs
in this KIP are “turned off” and not reported in
ApiVersionsResponse.
The client will not
attempt to push metrics.

120. The error handling table lists the error codes for
PushTelemetryResponse. I’ve added one
but it looked good to me. GetTelemetrySubscriptions
doesn’t
have any
error codes, since the
situation in which the client telemetry is not supported
is
handled by
the RPCs not being offered
by the broker.

121. Again, I think it’s probably a mistake to be specific
about
threading. Removed.

122. Good catch. For DescribeConfigs, the ACL operation
should be
“DESCRIBE_CONFIGS”. For AlterConfigs, the ACL operation
should be
“ALTER” (not “WRITE” as it said). The checks are made on
the
CLUSTER
resource.

Thanks for the detailed review.

Thanks,
Andrew


110. Another potential motivation is the multiple clients
support.
Some of
the places may not have good monitoring support for
non-java
clients.

111. OpenTelemetry Naming: We replace '-' with '.' for
metric name
and
replace '-' with '_' for attributes. Why the
inconsistency?

112. OTLP specification: Page is not found from the link.

113. "Defining standard and required metrics makes the
monitoring and
troubleshooting of clients from various client types ":
Incomplete
sentence.

114. standard/required metrics
114.1 Do other clients need to implement those metrics
with
the exact
same
names?
114.2 What happens if some of those metrics are missing
from
a
client?

115. "KafkaConsumer: both the "heart beat" and
application
threads":
We
have an ongoing effort to refactor the consumer threading
model (





https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design
).
Once this is done, RPC requests will only be made from
the
background
thread. Should this KIP follow the new model only?

116. 'The metrics should contain the reason for the
client
termination
by
including the client.terminating metric with the label
“reason” ...'.
Hmm,
are we introducing a new metric client.terminating? If
so,
that needs
to be
explicitly listed.

117. "As the metrics plugin may need to add additional
metrics on top
of
this the generic metrics receiver in the broker will not
add
these
labels
but rely on the plugins to do so," The sentence doesn't
read
well.

118. "it is possible for the client to send at most one
accepted
out-of-profile per connection before the rate-limiter
kicks
in": If
we
do
this, do we still need the Terminating flag in
PushTelemetryRequestV0?

119. "If there is no client metrics receiver plugin
configured on the
broker, it will respond to
GetTelemetrySubscriptionsRequest
with
RequestedMetrics set to Null and a -1 SubscriptionId. The
client
should
send a new GetTelemetrySubscriptionsRequest after the
PushIntervalMs
has
expired. This allows the metrics receiver to be enabled
or
disabled
without
having to restart the broker or reset the client
connection."
"no client metrics receiver plugin configured" is defined
by
no
metric
reporter implementing the ClientTelemetry interface,
right?
In that
case,
it would be useful to avoid the clients sending
GetTelemetrySubscriptionsRequest periodically to preserve
the current
behavior.

120. GetTelemetrySubscriptionsResponseV0 and
PushTelemetryRequestV0:
Could
we list error codes for each?

121. "ClientTelemetryReceiver.ClientTelemetryReceiver
This
method may
be
called from the request handling thread": Where else can
this method
be
called?

122. DescribeConfigs/AlterConfigs already exist. Are we
changing the
ACL?

Thanks,

Jun

On Mon, Jul 31, 2023 at 4:33 AM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

Hi Milind,
Thanks for your question.

On reflection, I agree that INVALID_RECORD is most
likely
to be
caused by a
problem in the serialization in the client. I have
changed
the
client
action in this case
to “Log an error and stop pushing metrics”.

I have updated the KIP text accordingly.

Thanks,
Andrew

On 31 Jul 2023, at 12:09, Milind Luthra
<milut...@confluent.io.INVALID>
wrote:

Hi Andrew,
Thanks for the clarifications.

About 2b:
In case a client has a bug while serializing, it might
be
difficult
for
the
client to recover from that without code changes. In
that,
it might
be
good
to just log the INVALID_RECORD as an error, and treat
the
error as
fatal
for the client (only fatal in terms of sending the
metrics, the
client
can
keep functioning otherwise). What do you think?

Thanks
Milind

On Mon, Jul 24, 2023 at 8:18 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

Hi Milind,
Thanks for your questions about the KIP.

1) I did some archaeology and looked at historical
versions of the
KIP.
I
think this is
just a mistake. 5 minutes is the default metric push
interval. 30
minutes
is a mystery
to me. I’ve updated the KIP.

2) I think there are two situations in which
INVALID_RECORD might
occur.
a) The client might perhaps be using a content-type
that
the
broker
does
not support.
The KIP mentions content-type as a future extension,
but
there’s
only
one
supported
to start with. Until we have multiple content-types,
this
seems
out
of
scope. I think a
future KIP would add another error code for this.
b) The client might perhaps have a bug which means the
metrics
payload
is
malformed.
Logging a warning and attempting the next metrics push
on
the push
interval seems
appropriate.

UNKNOWN_SUBSCRIPTION_ID would indeed be handled by
making
an
immediate
GetTelemetrySubscriptionsRequest.

UNSUPPORTED_COMPRESSION_TYPE seems like either a
client
bug or
perhaps
a situation in which a broker sends a compression type
in
a
GetTelemetrySubscriptionsResponse
which is subsequently not supported when it's used
with a
PushTelemetryRequest.
We do want the client to have the opportunity to get
an
up-to-date
list
of
supported
compression types. I think an immediate
GetTelemetrySubscriptionsRequest
is appropriate.

3) If a client attempts a subsequent handshake with a
Null
ClientInstanceId, the
receiving broker may not already know the client's
existing
ClientInstanceId. If the
receiving broker knows the existing ClientInstanceId,
it
simply
responds with the existing
value to the client. If it does not know the existing
existing
ClientInstanceId, it will create
a new client instance ID and respond with that.

I will update the KIP with these clarifications.

Thanks,
Andrew

On 17 Jul 2023, at 14:21, Milind Luthra
<milut...@confluent.io.INVALID

wrote:

Hi Andrew, thanks for this KIP.

I had a few questions regarding the "Error handling"
section.

1. It mentions that "The 5 and 30 minute retries are
to
eventually
trigger
a retry and avoid having to restart clients if the
cluster
metrics
configuration is disabled temporarily, e.g., by
operator
error,
rolling
upgrades, etc."
But this 30 min interval isn't mentioned anywhere
else.
What is
it
referring to?

2. For the actual errors:
INVALID_RECORD : The action required is to "Log a
warning to the
application and schedule the next
GetTelemetrySubscriptionsRequest
to 5
minutes". Why is this 5 minutes, and not something
like
PushIntervalMs?
And
also, why are we scheduling a
GetTelemetrySubscriptionsRequest in
this
case, if the serialization is broken?
UNKNOWN_SUBSCRIPTION_ID ,
UNSUPPORTED_COMPRESSION_TYPE
:
just to
confirm,
the GetTelemetrySubscriptionsRequest needs to be
scheduled
immediately
after the PushTelemetry response, correct?

3. For "Subsequent GetTelemetrySubscriptionsRequests
must include
the
ClientInstanceId returned in the first response,
regardless of
broker":
Will a broker error be returned in case some
implementation of
this KIP
violates this accidentally and sends a request with
ClientInstanceId =
Null
even when it's been obtained already? Or will a new
ClientInstanceId be
returned without an error?

Thanks!

On Tue, Jun 13, 2023 at 8:38 PM Andrew Schofield <
andrew_schofield_j...@outlook.com> wrote:

Hi,
I would like to start a new discussion thread on
KIP-714: Client
metrics
and observability.









https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability

I have edited the proposal significantly to reduce
the
scope.
The
overall
mechanism for client metric subscriptions is
unchanged,
but the
KIP is now based on the existing client metrics,
rather
than
introducing
new metrics. The purpose remains helping cluster
operators
investigate performance problems experienced by
clients
without
requiring
changes to the client application code or
configuration.

Thanks,
Andrew







