Hi Matthias,

131) I also think that separating the main and restore consumers is excessively specific about the current implementation. So I think I'd prefer:

public class ClientInstanceIds {
    String adminInstanceId();
    Optional<String> globalConsumerInstanceId();
    Map<String, String> consumerInstanceIds();
    Map<String, String> producerInstanceIds();
}

I'm not sure whether it makes sense to combine the global consumer instance id too. I've got rid of the double plural :)

My 2 cents.

Thanks,
Andrew

> On 12 Oct 2023, at 17:29, Matthias J. Sax <mj...@apache.org> wrote:
>
> Thanks Sophie and Jun.
>
> `clientInstanceIds()` is fine with me -- was not sure about the double plural myself.
>
> Sorry if my comments were confusing. I was trying to say that adding an overload to `KafkaStreams` that does not take a timeout parameter does not make sense, because there is no `default.api.timeout.ms` config for Kafka Streams, so users always need to pass in a timeout. (Same for the producer.)
>
> For the implementation, I think KS would always call `client.clientInstanceId(timeout)` and never rely on `default.api.timeout.ms` though, so we can stay in control -- if a timeout is passed by the user, it would always overwrite `default.api.timeout.ms` on the consumer/admin, and thus we should follow the same semantics in Kafka Streams and overwrite it explicitly when calling `client.clientInstanceId()`.
>
> The proposed API also makes sense to me. I was just wondering if we want to extend it for client users -- for KS we won't need/use the timeout-less overloads.
>
>
> 130) My intent was to throw a TimeoutException if we cannot get all instanceIds, because it's the standard contract for timeouts. It would also be hard for a user to tell whether a full or partial result was returned (or we add a method `boolean isPartialResult()` to make it easier for users).
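Laid out as a compilable sketch, Andrew's combined `ClientInstanceIds` shape above would look roughly like this (only the four method signatures come from the proposal; the example implementation, ids, and the merged "-restore" key are purely illustrative):

```java
import java.util.Map;
import java.util.Optional;

// Sketch of the combined return type proposed above. Only the four method
// signatures are from the proposal; everything else here is illustrative.
interface ClientInstanceIds {
    String adminInstanceId();
    Optional<String> globalConsumerInstanceId();   // empty if there is no global thread
    Map<String, String> consumerInstanceIds();     // threadKey -> client instance id
    Map<String, String> producerInstanceIds();     // threadKey -> client instance id
}

public class ClientInstanceIdsSketch {
    // Hypothetical example instance for a single StreamThread, with the main
    // and restore consumers merged into one map via a "-restore" key suffix.
    public static ClientInstanceIds example() {
        return new ClientInstanceIds() {
            @Override public String adminInstanceId() { return "admin-1"; }
            @Override public Optional<String> globalConsumerInstanceId() { return Optional.empty(); }
            @Override public Map<String, String> consumerInstanceIds() {
                return Map.of("StreamThread-1", "c-1", "StreamThread-1-restore", "c-2");
            }
            @Override public Map<String, String> producerInstanceIds() {
                return Map.of("StreamThread-1", "p-1");
            }
        };
    }
}
```

Whether the global consumer id stays in its own `Optional`-returning method or is folded into `consumerInstanceIds()` is exactly the open question above.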
>
> If there are concerns/objections, I am also ok to return a partial result -- it would require a change to the newly added `ClientInstanceIds` return type -- for `adminInstanceId` we only return a `String` right now -- we might need to change this to `Optional<String>` so we are able to return a partial result?
>
>
> 131) Of course we could, but I am not sure what we would gain? In the end, implementation details would always leak, because if we change the number of consumers we use, we would return different keys in the `Map`. Atm, the proposal implies that the same key might be used for the "main" and "restore" consumer of the same thread -- but we can make the keys unique by adding a `-restore` suffix to the restore-consumer key if we merge both maps. -- Curious to hear what others think. I am very open to doing it differently than currently proposed.
>
>
> -Matthias
>
>
> On 10/12/23 8:39 AM, Jun Rao wrote:
>> Hi, Matthias,
>> Thanks for the reply.
>> 130. What would be the semantics? If the timeout has expired and only some of the client instance ids have been retrieved, does the call return the partial result or throw an exception?
>> 131. Could we group all consumer instances in a single method, since we are returning the key for each instance already? This probably also avoids exposing implementation details that could change over time.
>> Thanks,
>> Jun
>> On Thu, Oct 12, 2023 at 12:00 AM Sophie Blee-Goldman <sop...@responsive.dev> wrote:
>>> Regarding the naming, I personally think `clientInstanceId` makes sense for the plain clients -- especially if we might later introduce the notion of an `applicationInstanceId`.
>>>
>>> I'm not a huge fan of `clientsInstanceIds` for the Kafka Streams API, though; can we use `clientInstanceIds` instead?
>>> (The difference being the placement of the plural 's'.)
>>> I would similarly rename the class to just ClientInstanceIds.
>>>
>>>> we can also not have a timeout-less overload, because `KafkaStreams` does not have a `default.api.timeout.ms` config either
>>>
>>> With respect to the timeout for the Kafka Streams API, I'm a bit confused by the double/triple negative of Matthias' comment here, but I was thinking about this earlier and this was my take: with the current proposal, we would allow users to pass in an absolute timeout as a parameter that applies to the method as a whole. Meanwhile, within the method we would issue separate calls to each of the clients, using the default or user-configured value of their `default.api.timeout.ms` as the timeout parameter.
>>>
>>> So the API as proposed makes sense to me.
>>>
>>>
>>> On Wed, Oct 11, 2023 at 6:48 PM Matthias J. Sax <mj...@apache.org> wrote:
>>>
>>>> I can answer 130 and 131.
>>>>
>>>> 130) We cannot guarantee that all clients are already initialized, due to race conditions. We plan to not allow calling `KafkaStreams#clientsInstanceIds()` when the state is not RUNNING (or REBALANCING) though -- guess this slipped and should be added to the KIP? But because StreamThreads can be added dynamically (and producers might be created dynamically at runtime; cf. below), we still cannot guarantee that all clients are already initialized when the method is called. Of course, we assume that all clients are most likely initialized on the happy path, and blocking calls to `client.clientInstanceId()` should be rare.
>>>>
>>>> To address the worst case, we won't do a naive implementation that just loops over all clients, but fan out the call to the different StreamThreads (and the GlobalStreamThread if it exists), and use Futures to gather the results.
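A minimal sketch of that fan-out-and-gather idea (the names here, such as `InstanceIdFanOut` and the `Callable` map, are made up for illustration -- the real StreamThread plumbing would differ): each potentially blocking `clientInstanceId()` call is submitted as a task, and the results are collected under one shared deadline, so a TimeoutException surfaces if the overall budget is exhausted before every id has arrived.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.*;

public class InstanceIdFanOut {
    // Fan the potentially blocking clientInstanceId() calls out to separate
    // tasks, then gather all results under a single shared deadline.
    public static Map<String, String> gather(Map<String, Callable<String>> calls,
                                             Duration timeout) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, calls.size()));
        try {
            Map<String, Future<String>> futures = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : calls.entrySet()) {
                futures.put(e.getKey(), pool.submit(e.getValue()));
            }
            long deadline = System.nanoTime() + timeout.toNanos();
            Map<String, String> ids = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : futures.entrySet()) {
                // Future.get throws TimeoutException once the shared budget is
                // gone, matching the "all ids or TimeoutException" contract
                // discussed above.
                long remainingNs = deadline - System.nanoTime();
                ids.put(e.getKey(),
                        e.getValue().get(Math.max(0, remainingNs), TimeUnit.NANOSECONDS));
            }
            return ids;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A partial-result variant would instead catch the TimeoutException per future and return whatever was gathered, which is where the `Optional<String> adminInstanceId` question above comes from.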
>>>>
>>>> Currently, a `StreamThread` has 3 clients (if ALOS or EOSv2 is used), so we might do 3 blocking calls in the worst case (for EOSv1 we get a producer per task, and we might end up doing more blocking calls if the producers are not initialized yet). Note that EOSv1 is already deprecated, and we are also working on thread refactoring that will reduce the number of clients on a StreamThread to 2 -- and we have more refactoring planned to reduce the number of clients even further.
>>>>
>>>> Inside `KafkaStreams#clientsInstanceIds()` we might only do a single blocking call for the admin client (ie, `admin.clientInstanceId()`).
>>>>
>>>> I agree that we need to do some clever timeout management, but that seems to be more of an implementation detail?
>>>>
>>>> Do you have any particular concerns, or does the proposed implementation as sketched above address your question?
>>>>
>>>>
>>>> 131) If the Topology does not have a global state store, there won't be a GlobalThread and thus no global consumer. Thus, we return an Optional.
>>>>
>>>>
>>>>
>>>> Three related questions for Andrew:
>>>>
>>>> (1) Why is the method called `clientInstanceId()` and not just plain `instanceId()`?
>>>>
>>>> (2) Why do we return a `String` and not a UUID type? The added protocol request/response classes use UUIDs.
>>>>
>>>> (3) Would it make sense to have an overloaded `clientInstanceId()` method that does not take any parameter but uses the `default.api.timeout.ms` config (this config does not exist on the producer though, so we could only have it for consumer and admin at this point)? We could of course also add overloads like this later if users request them (and/or add `default.api.timeout.ms` to the producer, too).
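If such timeout-less overloads were ever added to the clients, they would presumably just delegate to the timed variant using the configured default. A hedged sketch (`TimedInstanceId` is an invented stand-in interface, not a real Kafka type; 60000 ms is the documented default of `default.api.timeout.ms` on consumer and admin clients):

```java
import java.time.Duration;

// Invented stand-in interface to illustrate how a timeout-less overload could
// delegate to the timed variant via default.api.timeout.ms (60000 ms default
// on consumer and admin clients; the producer has no such config today).
interface TimedInstanceId {
    String clientInstanceId(Duration timeout);

    default String clientInstanceId() {
        // In a real client this would read the resolved default.api.timeout.ms
        // config value rather than hard-coding its default.
        return clientInstanceId(Duration.ofMillis(60_000));
    }
}
```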
>>>>
>>>> Btw: For KafkaStreams, I think `clientsInstanceIds` still makes sense as a method name though, as `KafkaStreams` itself does not have an `instanceId` -- we can also not have a timeout-less overload, because `KafkaStreams` does not have a `default.api.timeout.ms` config either (and I don't think it makes sense to add one).
>>>>
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 10/11/23 5:07 PM, Jun Rao wrote:
>>>>> Hi, Andrew,
>>>>>
>>>>> Thanks for the updated KIP. Just a few more minor comments.
>>>>>
>>>>> 130. KafkaStreams.clientsInstanceId(Duration timeout): Does it wait for all consumer/producer/adminClient instances to be initialized? Are all those instances created during KafkaStreams initialization?
>>>>>
>>>>> 131. Why does globalConsumerInstanceId() return Optional<String> while other consumer instances don't return Optional?
>>>>>
>>>>> 132. ClientMetricsSubscriptionRequestCount: Do we need this, since we have a set of generic metrics (kafka.network:type=RequestMetrics,name=RequestsPerSec,request=*) that report the request rate for every request type?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jun
>>>>>
>>>>> On Wed, Oct 11, 2023 at 1:47 PM Matthias J. Sax <mj...@apache.org> wrote:
>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On 10/10/23 11:31 PM, Andrew Schofield wrote:
>>>>>>> Matthias,
>>>>>>> Yes, I think that's a sensible way forward and the interface you propose looks good. I'll update the KIP accordingly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Andrew
>>>>>>>
>>>>>>>> On 10 Oct 2023, at 23:01, Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>>>
>>>>>>>> Andrew,
>>>>>>>>
>>>>>>>> Yes, I would like to get this change into KIP-714 right away. Seems to be important, as we don't know if/when a follow-up KIP for Kafka Streams would land.
>>>>>>>>
>>>>>>>> I was also thinking (and discussed with a few others) how to expose it, and we would propose the following:
>>>>>>>>
>>>>>>>> We add a new method to the `KafkaStreams` class:
>>>>>>>>
>>>>>>>>   public ClientsInstanceIds clientsInstanceIds(Duration timeout);
>>>>>>>>
>>>>>>>> The returned object is like below:
>>>>>>>>
>>>>>>>>   public class ClientsInstanceIds {
>>>>>>>>     // we only have a single admin client per KS instance
>>>>>>>>     String adminInstanceId();
>>>>>>>>
>>>>>>>>     // we only have a single global consumer per KS instance (if any)
>>>>>>>>     // Optional<> because we might not have a global-thread
>>>>>>>>     Optional<String> globalConsumerInstanceId();
>>>>>>>>
>>>>>>>>     // return a <threadKey -> ClientInstanceId> mapping
>>>>>>>>     // for the underlying (restore-)consumers/producers
>>>>>>>>     Map<String, String> mainConsumerInstanceIds();
>>>>>>>>     Map<String, String> restoreConsumerInstanceIds();
>>>>>>>>     Map<String, String> producerInstanceIds();
>>>>>>>>   }
>>>>>>>>
>>>>>>>> For the `threadKey`, we would use a pattern like this:
>>>>>>>>
>>>>>>>>   [Stream|StateUpdater]Thread-<threadIdx>
>>>>>>>>
>>>>>>>>
>>>>>>>> Would this work from your POV?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/9/23 2:15 AM, Andrew Schofield wrote:
>>>>>>>>> Hi Matthias,
>>>>>>>>> Good point. Makes sense to me.
>>>>>>>>> Is this something that can also be included in the proposed Kafka Streams follow-on KIP, or would you prefer that I add it to KIP-714? I have a slight preference for the former, to put all of the KS enhancements into a separate KIP.
>>>>>>>>> Thanks,
>>>>>>>>> Andrew
>>>>>>>>>> On 7 Oct 2023, at 02:12, Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks Andrew. SGTM.
>>>>>>>>>>
>>>>>>>>>> One point you did not address is the idea to add a method to `KafkaStreams` similar to the proposed `clientInstanceId()` that will be added to the consumer/producer/admin clients.
>>>>>>>>>>
>>>>>>>>>> Without addressing this, Kafka Streams users won't have a way to get the assigned `instanceId` of the internally created clients, and thus it would be very difficult for them to know which metrics that the broker receives belong to a Kafka Streams app. It seems they would only find the `instanceIds` in the log4j output if they enable client logging?
>>>>>>>>>>
>>>>>>>>>> Of course, because there are multiple clients inside Kafka Streams, the return type cannot be a single `String`, but must be some complex data structure -- we could either add a new class, or return a `Map<String, String>` using a client key that maps to the `instanceId`.
>>>>>>>>>>
>>>>>>>>>> For example, we could use the following key:
>>>>>>>>>>
>>>>>>>>>>   [Global]StreamThread[-<threadIndex>][-restore][consumer|producer]
>>>>>>>>>>
>>>>>>>>>> (Of course, only the valid combinations.)
>>>>>>>>>>
>>>>>>>>>> Or maybe even better, we might want to return a `Future`, because collecting all the `instanceId`s might be a blocking call on each client? I already have a few ideas how it could be implemented, but I don't think it must be discussed on the KIP, as it's an implementation detail.
>>>>>>>>>>
>>>>>>>>>> Thoughts?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>> On 10/6/23 4:21 AM, Andrew Schofield wrote:
>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>> Thanks for your comments. I agree that a follow-up KIP for Kafka Streams makes sense. This KIP currently has made a bit of an effort to embrace KS, but it's not enough by a long way.
>>>>>>>>>>> I have removed `application.id`.
>>>>>>>>>>> This should be done properly in the follow-up KIP. I don't believe there's a downside to removing it from this KIP.
>>>>>>>>>>> I have reworded the statement about temporality. In practice, the implementation of this KIP that's going on while the voting progresses happens to use delta temporality, but that's an implementation detail. Supporting clients must support both temporalities.
>>>>>>>>>>> I thought about exposing the client instance ID as a metric, but non-numeric metrics are not usual practice and tools do not universally support them. I don't think the KIP is improved by adding one now.
>>>>>>>>>>> I have also added constants for the various Config classes for ENABLE_METRICS_PUSH_CONFIG, including to StreamsConfig. It's best to be explicit about this.
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Andrew
>>>>>>>>>>>> On 2 Oct 2023, at 23:47, Matthias J. Sax <mj...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I did not pay attention to this KIP in the past; seems it was on hold for a while.
>>>>>>>>>>>>
>>>>>>>>>>>> Overall it sounds very useful, and I think we should extend this with a follow-up KIP for Kafka Streams. What is unclear to me at this point is the statement:
>>>>>>>>>>>>
>>>>>>>>>>>>> Kafka Streams applications have an application.id configured and this identifier should be included as the application_id metrics label.
>>>>>>>>>>>>
>>>>>>>>>>>> The `application.id` is currently only used as the (main) consumer's `group.id` (and is part of an auto-generated `client.id` if the user does not set one).
>>>>>>>>>>>>
>>>>>>>>>>>> This comment relates to:
>>>>>>>>>>>>
>>>>>>>>>>>>> The following labels should be added by the client as appropriate before metrics are pushed.
>>>>>>>>>>>>
>>>>>>>>>>>> Given that Kafka Streams uses the consumer/producer/admin clients as "black boxes", a client does at this point not know that it's part of a Kafka Streams application, and thus it won't be able to attach any such label to the metrics it sends. (Also, producer and admin don't even know the value of `application.id` -- only the (main) consumer, indirectly via `group.id`; the restore and global consumers don't know it either, because they don't have `group.id` set.)
>>>>>>>>>>>>
>>>>>>>>>>>> While I am totally in favor of the proposal, I am wondering how we intend to implement it in a clean way. Would it be ok to have some internal client APIs that KS can use to "register" itself with the client?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> While clients must support both temporalities, the broker will initially only send GetTelemetrySubscriptionsResponse.DeltaTemporality=True
>>>>>>>>>>>>
>>>>>>>>>>>> Not sure if I can follow. How do we make the decision about DELTA or CUMULATIVE metrics? Should the broker-side plugin not decide what metrics it wants to receive in which form? So what does "initially" mean -- the broker won't ship with a default plugin implementation?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The following method is added to the Producer, Consumer, and Admin client interfaces:
>>>>>>>>>>>>
>>>>>>>>>>>> Should we add anything to Kafka Streams to expose the underlying clients' assigned client-instance-ids programmatically? I am also wondering if clients should report their assigned client-instance-ids as metrics themselves (in that case, Kafka Streams won't need to do anything, because we already expose all client metrics).
>>>>>>>>>>>>
>>>>>>>>>>>> If we add anything programmatic, we need to make it simple, given that Kafka Streams has many clients per `StreamThread` and may have multiple threads.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> enable.metrics.push
>>>>>>>>>>>> It might be worth adding this to `StreamsConfig`, too? If set via StreamsConfig, we would forward it to all clients automatically.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 9/29/23 5:45 PM, David Jacot wrote:
>>>>>>>>>>>>> Hi Andrew,
>>>>>>>>>>>>> Thanks for driving this one. I haven't read all of the KIP yet, but I already have an initial question. In the Threading section, it is written "KafkaConsumer: the "background" thread (based on the consumer threading refactor which is underway)". If I understand this correctly, it means that KIP-714 won't work if the "old consumer" is used. Am I correct?
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> David
>>>>>>>>>>>>> On Fri, Sep 22, 2023 at 12:18 PM Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
>>>>>>>>>>>>>> Hi Philip,
>>>>>>>>>>>>>> No, I do not think it should actively search for a broker that supports the new RPCs. In general, either all of the brokers or none of the brokers will support it. In the window where the cluster is being upgraded or client telemetry is being enabled, there might be a mixed situation. I wouldn't put too much effort into this mixed scenario. As the client finds brokers which support the new RPCs, it can begin to follow the KIP-714 mechanism.
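Andrew's suggestion amounts to a per-broker capability check rather than an active search. A hedged sketch of that decision (everything here is invented for illustration; `supportsTelemetryRpcs` stands in for whatever the client records from each broker's ApiVersionsResponse):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TelemetryTargets {
    // A broker is an eligible push target only if its ApiVersionsResponse
    // advertised the new telemetry RPCs; brokers that do not are simply
    // skipped rather than actively searched around.
    public static List<String> pushTargets(Map<String, Boolean> supportsTelemetryRpcs) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : supportsTelemetryRpcs.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) {
                eligible.add(e.getKey());
            }
        }
        return eligible; // empty during a mixed upgrade => no metrics pushed yet
    }
}
```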
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Andrew
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 22 Sep 2023, at 20:01, Philip Nee <philip...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Andrew -
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Question on top of your answers: Do you think the client should actively search for a broker that supports this RPC? As previously mentioned, the client uses the leastLoadedNode to find its first connection (am I correct?), and what if that broker doesn't support the metric push?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> P
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Sep 22, 2023 at 10:20 AM Andrew Schofield <andrew_schofield_j...@outlook.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Kirk,
>>>>>>>>>>>>>>>> Thanks for your question. You are correct that the presence or absence of the new RPCs in the ApiVersionsResponse tells the client whether to request the telemetry subscriptions and push metrics.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is of course tricky in practice. It would be conceivable, as a cluster is upgraded to AK 3.7 or as a client metrics receiver plugin is deployed across the cluster, that a client connects to some brokers that support the new RPCs and some that do not.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here's my suggestion:
>>>>>>>>>>>>>>>> * If a client is not connected to any brokers that support the new RPCs, it cannot push metrics.
>>>>>>>>>>>>>>>> * If a client is only connected to brokers that support the new RPCs, it will use the new RPCs in accordance with the KIP.
>>>>>>>>>>>>>>>> * If a client is connected to some brokers that support the >>>> new >>>>>> RPCs and >>>>>>>>>>>>>>>> some that do not, it will >>>>>>>>>>>>>>>> use the new RPCs with the supporting subset of brokers in >>>>>> accordance >>>>>>>>>>>>>> with >>>>>>>>>>>>>>>> the KIP. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Comments? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Andrew >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 22 Sep 2023, at 16:01, Kirk True <k...@kirktrue.pro> >>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Andrew/Jun, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I want to make sure I understand question/comment #119… In >>>> the >>>>>> case >>>>>>>>>>>>>>>> where a cluster without a metrics client receiver is later >>>>>> reconfigured >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> restarted to include a metrics client receiver, do we want >>> the >>>>>> client to >>>>>>>>>>>>>>>> thereafter begin pushing metrics to the cluster? From >>> Andrew’s >>>>>> response >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> question #119, it sounds like we’re using the >>> presence/absence >>>>>> of the >>>>>>>>>>>>>>>> relevant RPCs in ApiVersionsResponse as the >>>>>> to-push-or-not-to-push >>>>>>>>>>>>>>>> indicator. Do I have that correct? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> Kirk >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Sep 21, 2023, at 7:42 AM, Andrew Schofield < >>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi Jun, >>>>>>>>>>>>>>>>>> Thanks for your comments. I’ve updated the KIP to clarify >>>>>> where >>>>>>>>>>>>>>>> necessary. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 110. Yes, agree. The motivation section mentions this. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 111. The replacement of ‘-‘ with ‘.’ for metric names and >>>> the >>>>>>>>>>>>>>>> replacement of >>>>>>>>>>>>>>>>>> ‘-‘ with ‘_’ for attribute keys is following the OTLP >>>>>> guidelines. 
I >>>>>>>>>>>>>>>> think it’s a bit >>>>>>>>>>>>>>>>>> of a debatable point. OTLP makes a distinction between a >>>>>> namespace >>>>>>>>>>>>>> and a >>>>>>>>>>>>>>>>>> multi-word component. If it was “client.id” then “client” >>>>>> would be a >>>>>>>>>>>>>>>> namespace with >>>>>>>>>>>>>>>>>> an attribute key “id”. But “client_id” is just a key. So, >>> it >>>>>> was >>>>>>>>>>>>>>>> intentional, but debatable. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 112. Thanks. The link target moved. Fixed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 113. Thanks. Fixed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 114.1. If a standard metric makes sense for a client, it >>>>>> should use >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> exact same >>>>>>>>>>>>>>>>>> name. If a standard metric doesn’t make sense for a >>> client, >>>>>> then it >>>>>>>>>>>>>> can >>>>>>>>>>>>>>>> omit that metric. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For a required metric, the situation is stronger. All >>>> clients >>>>>> must >>>>>>>>>>>>>>>> implement these >>>>>>>>>>>>>>>>>> metrics with these names in order to implement the KIP. >>> But >>>>>> the >>>>>>>>>>>>>>>> required metrics >>>>>>>>>>>>>>>>>> are essentially the number of connections and the request >>>>>> latency, >>>>>>>>>>>>>>>> which do not >>>>>>>>>>>>>>>>>> reference the underlying implementation of the client >>> (which >>>>>>>>>>>>>>>> producer.record.queue.time.max >>>>>>>>>>>>>>>>>> of course does). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I suppose someone might build a producer-only client that >>>>>> didn’t have >>>>>>>>>>>>>>>> consumer metrics. >>>>>>>>>>>>>>>>>> In this case, the consumer metrics would conceptually have >>>>>> the value 0 >>>>>>>>>>>>>>>> and would not >>>>>>>>>>>>>>>>>> need to be sent to the broker. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 114.2. If a client does not implement some metrics, they >>>> will >>>>>> not be >>>>>>>>>>>>>>>> available for >>>>>>>>>>>>>>>>>> analysis and troubleshooting. 
It just makes the ability to >>>>>> combine >>>>>>>>>>>>>>>> metrics from lots >>>>>>>>>>>>>>>>>> different clients less complete. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 115. I think it was probably a mistake to be so specific >>>> about >>>>>>>>>>>>>>>> threading in this KIP. >>>>>>>>>>>>>>>>>> When the consumer threading refactor is complete, of >>> course, >>>>>> it would >>>>>>>>>>>>>>>> do the appropriate >>>>>>>>>>>>>>>>>> equivalent. I’ve added a clarification and massively >>>>>> simplified this >>>>>>>>>>>>>>>> section. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 116. I removed “client.terminating”. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 117. Yes. Horrid. Fixed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 118. The Terminating flag just indicates that this is the >>>>>> final >>>>>>>>>>>>>>>> PushTelemetryRequest >>>>>>>>>>>>>>>>>> from this client. Any subsequent request will be >>> rejected. I >>>>>> think >>>>>>>>>>>>>> this >>>>>>>>>>>>>>>> flag should remain. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 119. Good catch. This was actually contradicting another >>>> part >>>>>> of the >>>>>>>>>>>>>>>> KIP. The current behaviour >>>>>>>>>>>>>>>>>> is indeed preserved. If the broker doesn’t have a client >>>>>> metrics >>>>>>>>>>>>>>>> receiver plugin, the new RPCs >>>>>>>>>>>>>>>>>> in this KIP are “turned off” and not reported in >>>>>> ApiVersionsResponse. >>>>>>>>>>>>>>>> The client will not >>>>>>>>>>>>>>>>>> attempt to push metrics. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 120. The error handling table lists the error codes for >>>>>>>>>>>>>>>> PushTelemetryResponse. I’ve added one >>>>>>>>>>>>>>>>>> but it looked good to me. GetTelemetrySubscriptions >>> doesn’t >>>>>> have any >>>>>>>>>>>>>>>> error codes, since the >>>>>>>>>>>>>>>>>> situation in which the client telemetry is not supported >>> is >>>>>> handled by >>>>>>>>>>>>>>>> the RPCs not being offered >>>>>>>>>>>>>>>>>> by the broker. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 121. 
Again, I think it’s probably a mistake to be specific >>>>>> about >>>>>>>>>>>>>>>> threading. Removed. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 122. Good catch. For DescribeConfigs, the ACL operation >>>>>> should be >>>>>>>>>>>>>>>>>> “DESCRIBE_CONFIGS”. For AlterConfigs, the ACL operation >>>>>> should be >>>>>>>>>>>>>>>>>> “ALTER” (not “WRITE” as it said). The checks are made on >>> the >>>>>> CLUSTER >>>>>>>>>>>>>>>>>> resource. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks for the detailed review. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Andrew >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 110. Another potential motivation is the multiple clients >>>>>> support. >>>>>>>>>>>>>>>> Some of >>>>>>>>>>>>>>>>>>> the places may not have good monitoring support for >>>> non-java >>>>>> clients. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 111. OpenTelemetry Naming: We replace '-' with '.' for >>>>>> metric name >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> replace '-' with '_' for attributes. Why is the >>>>>> inconsistency? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 112. OTLP specification: Page is not found from the link. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 113. "Defining standard and required metrics makes the >>>>>> monitoring and >>>>>>>>>>>>>>>>>>> troubleshooting of clients from various client types ": >>>>>> Incomplete >>>>>>>>>>>>>>>> sentence. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 114. standard/required metrics >>>>>>>>>>>>>>>>>>> 114.1 Do other clients need to implement those metrics >>> with >>>>>> the exact >>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>>> names? >>>>>>>>>>>>>>>>>>> 114.2 What happens if some of those metrics are missing >>>> from >>>>>> a >>>>>>>>>>>>>> client? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 115. 
"KafkaConsumer: both the "heart beat" and >>> application >>>>>> threads": >>>>>>>>>>>>>> We >>>>>>>>>>>>>>>>>>> have an ongoing effort to refactor the consumer threading >>>>>> model ( >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>> >>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+threading+refactor+design >>>>>>>>>>>>>>>> ). >>>>>>>>>>>>>>>>>>> Once this is done, PRC requests will only be made from >>> the >>>>>> background >>>>>>>>>>>>>>>>>>> thread. Should this KIP follow the new model only? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 116. 'The metrics should contain the reason for the >>> client >>>>>>>>>>>>>> termination >>>>>>>>>>>>>>>> by >>>>>>>>>>>>>>>>>>> including the client.terminating metric with the label >>>>>> “reason” ...'. >>>>>>>>>>>>>>>> Hmm, >>>>>>>>>>>>>>>>>>> are we introducing a new metric client.terminating? If >>> so, >>>>>> that needs >>>>>>>>>>>>>>>> to be >>>>>>>>>>>>>>>>>>> explicitly listed. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 117. "As the metrics plugin may need to add additional >>>>>> metrics on top >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>> this the generic metrics receiver in the broker will not >>>> add >>>>>> these >>>>>>>>>>>>>>>> labels >>>>>>>>>>>>>>>>>>> but rely on the plugins to do so," The sentence doesn't >>>> read >>>>>> well. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 118. "it is possible for the client to send at most one >>>>>> accepted >>>>>>>>>>>>>>>>>>> out-of-profile per connection before the rate-limiter >>> kicks >>>>>> in": If >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> do >>>>>>>>>>>>>>>>>>> this, do we still need the Terminating flag in >>>>>>>>>>>>>> PushTelemetryRequestV0? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 119. "If there is no client metrics receiver plugin >>>>>> configured on the >>>>>>>>>>>>>>>>>>> broker, it will respond to >>> GetTelemetrySubscriptionsRequest >>>>>> with >>>>>>>>>>>>>>>>>>> RequestedMetrics set to Null and a -1 SubscriptionId. 
The >>>>>> client >>>>>>>>>>>>>> should >>>>>>>>>>>>>>>>>>> send a new GetTelemetrySubscriptionsRequest after the >>>>>> PushIntervalMs >>>>>>>>>>>>>>>> has >>>>>>>>>>>>>>>>>>> expired. This allows the metrics receiver to be enabled >>> or >>>>>> disabled >>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>>> having to restart the broker or reset the client >>>> connection." >>>>>>>>>>>>>>>>>>> "no client metrics receiver plugin configured" is defined >>>> by >>>>>> no >>>>>>>>>>>>>> metric >>>>>>>>>>>>>>>>>>> reporter implementing the ClientTelemetry interface, >>> right? >>>>>> In that >>>>>>>>>>>>>>>> case, >>>>>>>>>>>>>>>>>>> it would be useful to avoid the clients sending >>>>>>>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest periodically to preserve >>>>>> the current >>>>>>>>>>>>>>>>>>> behavior. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 120. GetTelemetrySubscriptionsResponseV0 and >>>>>> PushTelemetryRequestV0: >>>>>>>>>>>>>>>> Could >>>>>>>>>>>>>>>>>>> we list error codes for each? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 121. "ClientTelemetryReceiver.ClientTelemetryReceiver >>> This >>>>>> method may >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>>> called from the request handling thread": Where else can >>>>>> this method >>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>>> called? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> 122. DescribeConfigs/AlterConfigs already exist. Are we >>>>>> changing the >>>>>>>>>>>>>>>> ACL? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Jun >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Jul 31, 2023 at 4:33 AM Andrew Schofield < >>>>>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Hi Milind, >>>>>>>>>>>>>>>>>>>> Thanks for your question. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On reflection, I agree that INVALID_RECORD is most >>> likely >>>>>> to be >>>>>>>>>>>>>>>> caused by a >>>>>>>>>>>>>>>>>>>> problem in the serialization in the client. 
I have >>> changed >>>>>> the >>>>>>>>>>>>>> client >>>>>>>>>>>>>>>>>>>> action in this case >>>>>>>>>>>>>>>>>>>> to “Log an error and stop pushing metrics”. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I have updated the KIP text accordingly. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> Andrew >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 31 Jul 2023, at 12:09, Milind Luthra >>>>>>>>>>>>>>>> <milut...@confluent.io.INVALID> >>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Hi Andrew, >>>>>>>>>>>>>>>>>>>>> Thanks for the clarifications. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> About 2b: >>>>>>>>>>>>>>>>>>>>> In case a client has a bug while serializing, it might >>> be >>>>>> difficult >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>> client to recover from that without code changes. In >>>> that, >>>>>> it might >>>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>>>> good >>>>>>>>>>>>>>>>>>>>> to just log the INVALID_RECORD as an error, and treat >>> the >>>>>> error as >>>>>>>>>>>>>>>> fatal >>>>>>>>>>>>>>>>>>>>> for the client (only fatal in terms of sending the >>>>>> metrics, the >>>>>>>>>>>>>>>> client >>>>>>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>>>>>> keep functioning otherwise). What do you think? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks >>>>>>>>>>>>>>>>>>>>> Milind >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Jul 24, 2023 at 8:18 PM Andrew Schofield < >>>>>>>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi Milind, >>>>>>>>>>>>>>>>>>>>>> Thanks for your questions about the KIP. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1) I did some archaeology and looked at historical >>>>>> versions of the >>>>>>>>>>>>>>>> KIP. >>>>>>>>>>>>>>>>>>>> I >>>>>>>>>>>>>>>>>>>>>> think this is >>>>>>>>>>>>>>>>>>>>>> just a mistake. 5 minutes is the default metric push >>>>>> interval. 
30 >>>>>>>>>>>>>>>>>>>> minutes >>>>>>>>>>>>>>>>>>>>>> is a mystery >>>>>>>>>>>>>>>>>>>>>> to me. I’ve updated the KIP. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2) I think there are two situations in which >>>>>> INVALID_RECORD might >>>>>>>>>>>>>>>> occur. >>>>>>>>>>>>>>>>>>>>>> a) The client might perhaps be using a content-type >>> that >>>>>> the >>>>>>>>>>>>>> broker >>>>>>>>>>>>>>>> does >>>>>>>>>>>>>>>>>>>>>> not support. >>>>>>>>>>>>>>>>>>>>>> The KIP mentions content-type as a future extension, >>> but >>>>>> there’s >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>>> supported >>>>>>>>>>>>>>>>>>>>>> to start with. Until we have multiple content-types, >>>> this >>>>>> seems >>>>>>>>>>>>>> out >>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>> scope. I think a >>>>>>>>>>>>>>>>>>>>>> future KIP would add another error code for this. >>>>>>>>>>>>>>>>>>>>>> b) The client might perhaps have a bug which means the >>>>>> metrics >>>>>>>>>>>>>>>> payload >>>>>>>>>>>>>>>>>>>> is >>>>>>>>>>>>>>>>>>>>>> malformed. >>>>>>>>>>>>>>>>>>>>>> Logging a warning and attempting the next metrics push >>>> on >>>>>> the push >>>>>>>>>>>>>>>>>>>>>> interval seems >>>>>>>>>>>>>>>>>>>>>> appropriate. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> UNKNOWN_SUBSCRIPTION_ID would indeed be handled by >>>> making >>>>>> an >>>>>>>>>>>>>>>> immediate >>>>>>>>>>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> UNSUPPORTED_COMPRESSION_TYPE seems like either a >>> client >>>>>> bug or >>>>>>>>>>>>>>>> perhaps >>>>>>>>>>>>>>>>>>>>>> a situation in which a broker sends a compression type >>>> in >>>>>> a >>>>>>>>>>>>>>>>>>>>>> GetTelemetrySubscriptionsResponse >>>>>>>>>>>>>>>>>>>>>> which is subsequently not supported when its used >>> with a >>>>>>>>>>>>>>>>>>>>>> PushTelemetryRequest. 
>>>>>>>>>>>>>>>>>>>>>> We do want the client to have the opportunity to get >>> an >>>>>> up-to-date >>>>>>>>>>>>>>>> list >>>>>>>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>>>>>> supported >>>>>>>>>>>>>>>>>>>>>> compression types. I think an immediate >>>>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest >>>>>>>>>>>>>>>>>>>>>> is appropriate. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 3) If a client attempts a subsequent handshake with a >>>> Null >>>>>>>>>>>>>>>>>>>>>> ClientInstanceId, the >>>>>>>>>>>>>>>>>>>>>> receiving broker may not already know the client's >>>>>> existing >>>>>>>>>>>>>>>>>>>>>> ClientInstanceId. If the >>>>>>>>>>>>>>>>>>>>>> receiving broker knows the existing ClientInstanceId, >>> it >>>>>> simply >>>>>>>>>>>>>>>> responds >>>>>>>>>>>>>>>>>>>>>> the existing >>>>>>>>>>>>>>>>>>>>>> value back to the client. If it does not know the >>>> existing >>>>>>>>>>>>>>>>>>>>>> ClientInstanceId, it will create >>>>>>>>>>>>>>>>>>>>>> a new client instance ID and respond with that. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I will update the KIP with these clarifications. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> Andrew >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On 17 Jul 2023, at 14:21, Milind Luthra >>>>>>>>>>>>>>>> <milut...@confluent.io.INVALID >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Andrew, thanks for this KIP. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I had a few questions regarding the "Error handling" >>>>>> section. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 1. 
It mentions that "The 5 and 30 minute retries are >>> to >>>>>>>>>>>>>> eventually >>>>>>>>>>>>>>>>>>>>>> trigger >>>>>>>>>>>>>>>>>>>>>>> a retry and avoid having to restart clients if the >>>>>> cluster >>>>>>>>>>>>>> metrics >>>>>>>>>>>>>>>>>>>>>>> configuration is disabled temporarily, e.g., by >>>> operator >>>>>> error, >>>>>>>>>>>>>>>> rolling >>>>>>>>>>>>>>>>>>>>>>> upgrades, etc." >>>>>>>>>>>>>>>>>>>>>>> But this 30 min interval isn't mentioned anywhere >>> else. >>>>>> What is >>>>>>>>>>>>>> it >>>>>>>>>>>>>>>>>>>>>>> referring to? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 2. For the actual errors: >>>>>>>>>>>>>>>>>>>>>>> INVALID_RECORD : The action required is to "Log a >>>>>> warning to the >>>>>>>>>>>>>>>>>>>>>>> application and schedule the next >>>>>>>>>>>>>> GetTelemetrySubscriptionsRequest >>>>>>>>>>>>>>>> to 5 >>>>>>>>>>>>>>>>>>>>>>> minutes". Why is this 5 minutes, and not something >>> like >>>>>>>>>>>>>>>> PushIntervalMs? >>>>>>>>>>>>>>>>>>>>>> And >>>>>>>>>>>>>>>>>>>>>>> also, why are we scheduling a >>>>>> GetTelemetrySubscriptionsRequest in >>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>> case, if the serialization is broken? >>>>>>>>>>>>>>>>>>>>>>> UNKNOWN_SUBSCRIPTION_ID , >>> UNSUPPORTED_COMPRESSION_TYPE >>>> : >>>>>> just to >>>>>>>>>>>>>>>>>>>> confirm, >>>>>>>>>>>>>>>>>>>>>>> the GetTelemetrySubscriptionsRequest needs to be >>>>>> scheduled >>>>>>>>>>>>>>>> immediately >>>>>>>>>>>>>>>>>>>>>>> after the PushTelemetry response, correct? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> 3. 
For "Subsequent GetTelemetrySubscriptionsRequests >>>>>> must include >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>> ClientInstanceId returned in the first response, >>>>>> regardless of >>>>>>>>>>>>>>>> broker": >>>>>>>>>>>>>>>>>>>>>>> Will a broker error be returned in case some >>>>>> implementation of >>>>>>>>>>>>>>>> this KIP >>>>>>>>>>>>>>>>>>>>>>> violates this accidentally and sends a request with >>>>>>>>>>>>>>>> ClientInstanceId = >>>>>>>>>>>>>>>>>>>>>> Null >>>>>>>>>>>>>>>>>>>>>>> even when it's been obtained already? Or will a new >>>>>>>>>>>>>>>> ClientInstanceId be >>>>>>>>>>>>>>>>>>>>>>> returned without an error? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks! >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 13, 2023 at 8:38 PM Andrew Schofield < >>>>>>>>>>>>>>>>>>>>>>> andrew_schofield_j...@outlook.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>>>>>>>> I would like to start a new discussion thread on >>>>>> KIP-714: Client >>>>>>>>>>>>>>>>>>>> metrics >>>>>>>>>>>>>>>>>>>>>>>> and observability. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>> >>>> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I have edited the proposal significantly to reduce >>> the >>>>>> scope. >>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>>>>>> overall >>>>>>>>>>>>>>>>>>>>>>>> mechanism for client metric subscriptions is >>>> unchanged, >>>>>> but the >>>>>>>>>>>>>>>>>>>>>>>> KIP is now based on the existing client metrics, >>>> rather >>>>>> than >>>>>>>>>>>>>>>>>>>> introducing >>>>>>>>>>>>>>>>>>>>>>>> new metrics. 
The purpose remains helping cluster >>>>>> operators >>>>>>>>>>>>>>>>>>>>>>>> investigate performance problems experienced by >>>> clients >>>>>> without >>>>>>>>>>>>>>>>>>>>>> requiring >>>>>>>>>>>>>>>>>>>>>>>> changes to the client application code or >>>> configuration. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>>> Andrew >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>
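[Editor's note] Pulling together the outcome of the error-handling discussion above (INVALID_RECORD treated as fatal for metrics pushing only; UNKNOWN_SUBSCRIPTION_ID and UNSUPPORTED_COMPRESSION_TYPE triggering an immediate re-subscribe; everything else pushing again on the interval), the client-side dispatch could be sketched roughly as follows. All names here (`TelemetryErrorPolicy`, `NextAction`, `PushErrorCode`) are invented for illustration; this is not the actual Kafka client implementation.

```java
// Hypothetical sketch of the client action after a PushTelemetryResponse,
// per the discussion in this thread. Names are invented for illustration.
enum PushErrorCode {
    NONE,
    INVALID_RECORD,
    UNKNOWN_SUBSCRIPTION_ID,
    UNSUPPORTED_COMPRESSION_TYPE
}

enum NextAction {
    PUSH_ON_INTERVAL,        // schedule next push after PushIntervalMs
    RESUBSCRIBE_IMMEDIATELY, // send GetTelemetrySubscriptionsRequest now
    STOP_PUSHING             // log an error; stop pushing metrics
}

final class TelemetryErrorPolicy {
    static NextAction onPushTelemetryResponse(PushErrorCode code) {
        switch (code) {
            case INVALID_RECORD:
                // Serialization bug in the client: without a code change a
                // retry cannot succeed, so log an error and stop pushing
                // metrics. The client keeps functioning otherwise.
                return NextAction.STOP_PUSHING;
            case UNKNOWN_SUBSCRIPTION_ID:
            case UNSUPPORTED_COMPRESSION_TYPE:
                // The subscription or supported-compression list is stale:
                // fetch fresh subscriptions immediately.
                return NextAction.RESUBSCRIBE_IMMEDIATELY;
            default:
                // Success: push again when the interval next expires.
                return NextAction.PUSH_ON_INTERVAL;
        }
    }
}
```

This is only one way to arrange the dispatch; the KIP itself defines the per-error actions, and the table there is authoritative.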