Thanks for the KIP, Kirk.

I did not yet find time to read the KIP in full detail, but I would like to follow up on Mickael's question about Streams and Connect. I would also like to ask about the relationship to KIP-1313 (and KIP-714, KIP-1076).

Starting with Connect, I am frankly not sure if I agree with Mickael. While Connect is a "client" from a broker POV, Connect is still its own cluster. So I would assume that a Connect cluster is run by the same operator that also runs the broker cluster, and users would submit their Connectors into the cluster. Thus, I am wondering if any such config monitoring should go into the Connect framework directly? -- Sending configs to the brokers is the only way (and to some extent a workaround IMHO) to do this monitoring for "actual clients" (producer/consumer/streams), but Connect might be different, and it might actually be much simpler to do the monitoring directly inside the Connect framework?


I think it's great that we include Streams in the KIP right away. For KIP-714, this did not happen, and we needed to do KIP-1076 as a follow-up. However, the KIP currently only mentions at a high level that Streams would get a new config to enable/disable the config push. It's not clear how this push would actually happen. Atm, Streams treats the clients more or less as black boxes (with the exception of KIP-1071...). For KIP-714, we extended the client API (via KIP-1076), allowing Streams to push custom metrics via the underlying clients. For metrics collection, we actually use a mix of consumer and admin client inside Streams to send the different metrics. The disadvantage is still that the Streams metrics use the same clientInstanceId as the consumer/admin client we are piggybacking on. This might be fine for metrics: the subscription is dynamic, and thus any plugin would accumulate all reported values. For collecting configs, however, it's a single handshake between the client and the broker, so this seems different.

While we could maybe do a similar thing as for KIP-714/KIP-1076, we would still need a public API (on the admin client) to allow Streams to hand its own configs to the admin client for a push -- but if the admin client used its own clientInstanceId, we would get two `PushConfigRequest`s on the same connection, which seems to go against the design of the KIP. I am also worried about "confusing" the broker-side plugin: would a second push on the same clientInstanceId overwrite the previous one? -- Alternatively, we would need to give the Streams config to the admin client before it connects to the broker, which also seems tricky to get right. Also, we would get a config mix on the same clientInstanceId, which does not sound ideal either.

With regard to KIP-1313, I was actually thinking whether we should introduce a "logical client" concept, and allow a "logical client" like Streams (ie, one that doesn't open its own network connection) to still get its own clientInstanceId? If we follow KIP-1313 and clients generate their own UUIDs, it would be easy for Streams to also generate its own clientInstanceId. This might allow us to much more easily send two `PushConfigRequest`s with two different clientInstanceIds. -- Btw: I would believe that other "logical clients" / third-party frameworks like Apache Flink or Apache Spark might also benefit?
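
To make the idea concrete, here is a rough sketch of what I have in mind (everything below is hypothetical -- `ConfigPush` only stands in for a KIP-1324 `PushConfigRequest` payload; none of these names are real Kafka APIs -- I am just illustrating that KIP-1313-style local UUID generation would let a logical client carry its own ID):

```java
import java.util.Map;
import java.util.UUID;

public class LogicalClientSketch {

    // Hypothetical stand-in for a KIP-1324 PushConfigRequest payload.
    record ConfigPush(UUID clientInstanceId, Map<String, String> configs) { }

    public static void main(String[] args) {
        // KIP-1313 style: each client generates its UUID locally, so a
        // "logical client" like Streams can do the same without opening
        // its own network connection.
        UUID adminInstanceId = UUID.randomUUID();    // embedded admin client
        UUID streamsInstanceId = UUID.randomUUID();  // logical Streams client

        ConfigPush adminPush = new ConfigPush(adminInstanceId,
                Map.of("client.id", "my-app-admin"));
        ConfigPush streamsPush = new ConfigPush(streamsInstanceId,
                Map.of("num.stream.threads", "4"));

        // Both pushes could travel over the same connection, but they carry
        // distinct IDs, so a broker-side plugin can keep them apart.
        System.out.println(adminPush.clientInstanceId()
                .equals(streamsPush.clientInstanceId()));  // prints false
    }
}
```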

Furthermore, if we also change KIP-714 and let the client generate its UUID for metrics push, it would be good to use the same UUID for both pushing the Streams metrics and the Streams config. KIP-1331 from Lucas could also benefit from a unique logical clientInstanceId for Streams. Of course (this is more a KIP-1313 discussion), if we go this route, we might need a few more changes. But if we get a better overall solution, and can unify different concepts, it seems to be worth it.


-Matthias




On 5/12/26 2:18 PM, Apoorv Mittal wrote:
Hi Kirk,
Thanks for the KIP. I have some questions:

AM1. The KIP's request-response flow diagram mentions, "Validate
sensitive configs were excluded". Can you please help me understand what
happens if sensitive configs are somehow detected by the
ClientConfigPolicy? Is the client notified, or what's the expected
behaviour next?

AM2. The KIP defines default configuration keys for which data will be
transmitted. Also there is a client configuration that can override which
keys to send. If a new client configuration is added in the future, we need
to decide whether to include it in the default set, correct? I am
confirming to determine whether a deny list instead of an allow list would
be helpful, however, the allow list seems less error-prone when
transmitting client-related configurations.

AM3. Unlike KIP-714, where the broker determines which metrics the client
reports and the subscription can change dynamically, this KIP has a
default set present in the client. Should we therefore mention why some
configs like connections.max.idle.ms, max.block.ms, max.request.size,
metadata.max.age.ms, etc. are not in the default set for the Java
producer? Similarly, this applies to the Consumer and Share Consumer
segments. Expecting clients to define all configuration keys for
transmission in properties is cumbersome; therefore, I suggest using a
broader default list.

AM4. The KIP defines a new exception, ClientConfigTooLargeException, for
when the configuration size is exceeded. What will the client do when this
exception is encountered? Also, the KIP mentions "But as a backup means of
preventing the client from sending too much data, the broker checks the new
configuration client.config.max.bytes prior to invoking the policy". Since
the check happens on the broker, the client has already sent the large
payload, so how do we prevent "the client from sending too much data"?

AM5. The KIP defines IsDefault as a parameter for Config. It's unclear
why this is required since the broker will get the respective config value
anyway. Can you please help me understand?

AM6. The KIP defines instance-count as a client metric. Can you please
detail what value this metric will provide, as I expect the client will
invoke PushConfigRequest only once in its instantiated lifetime, i.e.,
what is the use of the metric?

AM7. Do you think it's reasonable for the broker to also know which client
type is pushing the respective configuration (e.g., producer, consumer,
streams, etc.)? I understand the operator can determine this by looking at
different metrics or RPC calls invoked by client id but if
ClientConfigPolicy wishes to enforce or validate some configurations, the
source might be relevant. Just writing this down for further discussion.

Regards,
Apoorv Mittal


On Thu, May 7, 2026 at 5:06 PM Mickael Maison <[email protected]>
wrote:

Hi,

MM1: Thanks

MM2: The javadoc for ClientConfigPolicy still says "An interface for
intercepting and enforcing client configuration".

MM3: It does not quite answer my question. Are Streams applications
configurations sent to brokers? If it is then we should do it for
Connect too.

MM4: That seems fine.

Thanks,
Mickael

On Tue, May 5, 2026 at 9:33 PM Andrew Schofield <[email protected]>
wrote:

Hi Kirk,
Thanks for your response.

AS11: I tend to agree that admin clients have less interesting
configuration. But the KIP is a specification of the behaviour, so I think
it should say so, especially since it does make a few references to admin
clients, such as having the config to disable config push. If you do choose
to have some default configs for admin clients, client.id is the obvious
one because it's pushed by all of the other kinds of clients.

AS12: Building on MM3, maybe we should have a compoundClientId for Kafka
Streams, Kafka Connect, and so on. I avoided suggesting clientGroupId and
groupInstanceId because of the opportunity for confusion with existing
concepts :)

Thanks,
Andrew

On 2026/05/05 15:09:46 Kirk True wrote:
Hi Mickael,

On Mon, May 4, 2026, at 12:10 PM, Mickael Maison wrote:
Hi,

Thanks for the KIP, I have a few questions:

MM1: Can we have the definition for the new Exception classes? I'm
particularly interested in the exception for INVALID_CONFIG; how are
the failures returned to the client? From PushConfigResponse it seems
it's just a string. Should it be a nullable collection with a specific
message for each issue?

I've added the definition for the ConfigTooLargeException which is now
the only exception added.

MM2: The new broker class is presented as a policy, but if I
understand correctly, in case the process method throws and
PushConfigResponse has a non-NONE error code, the client still
continues. This is different from the other policy classes,
AlterConfigPolicy and CreateTopicPolicy, which prevent the action in
case of a violation. Are we aiming for validation, observability, or
both?

We are focused on observability in this KIP, but as you point out, the
policy moniker doesn't make sense in this context.

MM3: It seems you're treating Streams as a separate client instead of
an aggregation of Producers and Consumers (I see StreamsConfig in
Configuration Payload Size Enforcement). What about Connect?

Clients such as Kafka Streams, Connect, etc. are tricky because it's
important to provide context for the embedded clients. For example, a
consumer used in a Connect sink would benefit from having that context so
that downstream filtering, aggregation, etc. can take it into account.

LMK if that answers your question or not.

MM4: Regarding sensitive configurations, the current criteria would
treat custom configurations as non-sensitive. For example, if my
producer has a custom serializer it may have its own custom
configurations too. Is that the behavior you wanted? If so, let's make
it clear. As a precaution, I'd lean towards considering custom configs
as sensitive as the client has no idea what they are.

The intention of the KIP is for any custom configuration (custom
meaning not known by the client implementation) to be omitted by default.
If the user overrides config.push.allowed.keys with a custom configuration,
however, the KIP would allow that to be included in the configuration that
is pushed to the broker. Does that seem sensible, or no?
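
For example, the override might look something like this in the client properties (property name as used above; the custom key below is purely a made-up example):

```properties
# Explicitly allow a custom serializer config to be included in the push.
# "my.custom.serializer.setting" is a hypothetical, user-defined key.
config.push.allowed.keys=my.custom.serializer.setting
```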

Thanks for the feedback!

Kirk


Thanks,
Mickael

On Sun, May 3, 2026 at 5:24 PM Muralidhar Basani via dev
<[email protected]> wrote:

Hi Kirk,

Thanks for this thoughtful kip.

I have a few additional points.

mb-1 : Would there be any recommendation for a reference
implementation to store the client configs? Maybe in a compacted
topic, etc.? Otherwise, everyone will come up with their own impl,
and it could be re-invented again and again.
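
To make mb-1 concrete, here is roughly the kind of plugin I have in mind (the interface shape below is guessed from this discussion, not the actual KIP API, and the id/values are made up):

```java
import java.util.Map;
import java.util.stream.Collectors;

public class CompactedTopicPolicySketch {

    // Guessed shape of the broker-side plugin interface; NOT the KIP's API.
    interface ClientConfigPolicy {
        void process(String clientInstanceId, Map<String, String> configs);
    }

    // Serialize the pushed configs into a deterministic record value.
    static String toValue(Map<String, String> configs) {
        return configs.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // In a real plugin, this record would be produced to a compacted
        // topic keyed by clientInstanceId, so the latest push per client
        // wins and reconnect duplicates (mb-2) compact away naturally.
        ClientConfigPolicy policy = (id, configs) ->
                System.out.println(id + " -> " + toValue(configs));

        policy.process("some-client-instance-id",
                Map.of("acks", "all", "linger.ms", "5"));
    }
}
```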

mb-2 : When a client reconnects, there might be duplicates of client
configs pushed. Should this be handled by the plugin, maybe based on
uuid/clientid? (I am not sure: if a client restarts and there is a new
uuid, would there be stale entries indefinitely?)

mb-3 : In your reply to Hector, you noted "it's difficult to
ensure that client is marked as 'in violation' across all brokers in a
cluster." Does the same challenge apply to the observability goal? If
the PushConfig lands on a single randomly chosen broker, only that
broker's ClientConfigPolicy sees it, so the plugin effectively must
fan out to shared storage for the feature to be useful cluster-wide?

mb-4 : If AdminClient configs are not handled, how about dynamic
config changes: would something like AdminClient.alterClientConfigs
also work like a hot reload? Or would any explicit re-push mechanism
be possible?

mb-5 : Regarding configs.push.allowed.keys, would it help if there
were a deny list which can override the allow list, to avoid any
sensitive keys just in case?

mb-6 : In general, if the RPC fails for one random node, will it
retry on a different node?


Thanks,
Murali

On Sun, May 3, 2026 at 1:37 PM Kirk True <[email protected]>
wrote:

Hi Andrew,

On Tue, Apr 28, 2026, at 6:28 PM, Andrew Schofield wrote:
Hi Kirk,
Thanks for the KIP. Inevitably, I have some comments.

AS1: I see a few mysterious mentions of "profile". I suppose
that these
are evidence of a concept which did not eventually see the light
of day in
the KIP.

That's a vestige from a previous design. I will remove them ASAP.

AS2: It seems that the default configurations pushed for a
share
consumer are incorrect (because some of them are group configs
not client
configs). I suggest:

* client.id
* group.id
* share.acknowledgement.mode
* share.acquire.mode
* max.poll.records
* max.poll.interval.ms
* fetch.min.bytes
* fetch.max.wait.ms

Thanks for the feedback!

AS3: Is there any reason why you need to send the configuration
type information to the broker? I wonder if it would be simpler just
to leave everything as strings. I see that you've got an enum for
ClientConfigType which is awfully similar to Config.Type, and you're
going to end up mapping between these enums.

Most of the configurations we're collecting by default end up being
numeric, so defaulting to strings is less efficient. Also, the thought
process was that implementors of the ClientConfigPolicy plugin may
benefit from/need the type information to achieve their goal.

Agreed that the mapping between ClientConfigType and Config.Type seems
unnecessary. ClientConfigType is a strict subset of Config.Type
(specifically, there is no PASSWORD type), so I wanted to provide some
compile-time support to prevent its usage. Maybe I was being too
zealous in this regard?
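
To illustrate the compile-time support I mean (this is only a sketch; the actual ClientConfigType constants may differ from what's below):

```java
// An enum that simply omits a PASSWORD constant, so sensitive values
// cannot even be represented in a pushed config. Constants mirror the
// non-sensitive members of Config.Type, but this is just a sketch.
enum ClientConfigType { BOOLEAN, STRING, INT, SHORT, LONG, DOUBLE, LIST, CLASS }

public class TypeSafetySketch {
    static String describe(ClientConfigType type) {
        // The switch is exhaustive without a PASSWORD case, so there is
        // nothing to forget to handle.
        return switch (type) {
            case BOOLEAN, INT, SHORT, LONG, DOUBLE -> "numeric/boolean";
            case STRING, LIST, CLASS -> "textual";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(ClientConfigType.LONG)); // prints numeric/boolean
        // ClientConfigType.PASSWORD does not compile -- that's the point.
    }
}
```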

AS4: ClientConfigUnknownProfileException has a couple of
problems. We
tend to use Unknown at the start of the error codes and
exception class
names, not the middle. Profile is not a thing. I suggest it
should be
UnknownClientConfigSomethingException, but I am not qualified to
say what
the Something is.

I will propose a change in the next revision of the KIP. Thanks
for
catching that.

AS5: You should include a list of the exceptions and error
codes you are
introducing to the protocol. I've seen a few new exceptions and
they
generally have error codes which correspond 1:1.

* ClientConfigUnknownProfileException - Should this just be
InvalidConfigurationException (exists) which maps to
INVALID_CONFIG error
code?
* ClientConfigTooLargeException - I would expect this to map
to the
CLIENT_CONFIG_TOO_LARGE error code (new)
* ClientConfigPolicyException - See AS4. Policy exceptions are
usually
PolicyViolationException, but this policy doesn't validate, it
just
processes.

I wasn't certain where on the specificity vs. generality
spectrum to land
on creating or reusing existing errors. In my next KIP revision
I'll look
to reuse existing error codes where possible.

AS6: You've tended to use Config (singular) not Configs
(plural).
However, in the configurations for the configurations, you've
used plural,
such as client.configs.policy.class.name. I would err on the
side of
consistency.

Agreed. I'll make this more consistent.

AS7: We are going to have to resolve the relationship between
this KIP
and KIP-1313. The latter introduces client instance ID on all
RPCs, and
will simplify your KIP if it is accepted first.

Yes, this KIP and KIP-1313 are racing in this regard. I see KIP-1313
has entered the voting phase, so I will likely end up removing most of
the client instance ID references from my KIP, leaving only those that
are more pertinent to the KIP's specific needs.

AS8: Please confirm whether there are any timing
considerations between
PushConfigs and GetTelemetrySubscriptions/PushTelemetry RPCs. I
suppose
that a client could overlap these RPCs, or even send them to
different
randomly selected brokers, as part of its initial connection
setup.

The intention was for the PushConfig RPC to execute before the
telemetry
RPCs. However, this will likely change based on other feedback,
moving
toward more of an "overlap" sequencing. From the perspective of
this KIP,
it shouldn't matter if they're sent to different brokers. I
wonder if the
ClientConfigPolicy plugin implementors would see that
differently?

AS9: Please add some broker metrics for this new feature. I
suggest
looking at KIP-714 for inspiration.

Will do. I intentionally omitted metrics on the broker, leaving
them for
the specific implementations. But I will look at the KIP-714
metrics, as
suggested.

AS10: Why does the client block while the config handshake is
being performed? The handshake is not validating the configurations
and the client doesn't throw an exception even if the PushConfigs
response contains an error code. Doesn't this unnecessarily slow down
the initial connection, which is arguably already too long with Kafka
clients?

The approach will likely move from blocking to a background,
best-effort approach. I will update the KIP to reflect this.

AS11: Which are the default configurations sent for the Apache
Kafka
Java admin client?

In initial discussions, the admin client was deemed too uninteresting
to warrant sending configuration since it doesn't affect correctness
or performance of the critical produce/consume path. Do you disagree?

Thanks for the thorough feedback!

Kirk

Thanks,
Andrew

On 2026/04/23 17:59:11 Kirk True wrote:
Hi all,

I would like to start a discussion on KIP-1324: Support
client
configuration observability:



https://cwiki.apache.org/confluence/display/KAFKA/KIP-1324%3A+Support+client+configuration+observability

Thanks,
Kirk







