Thanks for the KIP, Kirk.

I did not yet find time to read the KIP in full detail, but I would like to follow up on Mickael's question about Streams and Connect. I would also like to ask about the relationship to KIP-1313 (and KIP-714, KIP-1076).

Starting with Connect, I am frankly not sure if I agree with Mickael. While Connect is a "client" from a broker POV, Connect is still its own cluster. So I would assume that a Connect cluster is run by the same operator that also runs the broker cluster, and users would submit their Connectors into the cluster. Thus, I am wondering if any such config monitoring should go into the Connect framework directly? -- Sending configs to the brokers is the only way (and to some extent a workaround IMHO) to do this monitoring for "actual clients" (producer/consumer/streams), but Connect might be different, and it might actually be much simpler to do the monitoring directly inside the Connect framework?


I think it's great that we include Streams in the KIP right away. For KIP-714, this did not happen, and we needed to do KIP-1076 as a follow-up. However, the KIP currently only mentions at a high level that Streams would get a new config to enable/disable the config push. It's not clear how this push would actually happen. Atm, Streams treats the clients more or less as black boxes (with the exception of KIP-1071...). For KIP-714, we extended the client API (via KIP-1076), allowing Streams to push custom metrics via the underlying clients. For metrics collection, we actually use a mix of consumer and admin client inside Streams to send the different metrics. The disadvantage is still that the Streams metrics use the same clientInstanceId as the consumer/admin client we are piggybacking on. This might be fine for metrics: the subscription is dynamic, and thus any plugin would accumulate all reported values. For collecting configs, however, it's a single handshake between the client and the broker, so this seems different.

While we could maybe do a similar thing as for KIP-714/KIP-1076, we would still need a public API (on the admin client) to allow Streams to hand its own configs to the admin client for a push -- but if the admin client used its own clientInstanceId, we would get two `PushConfigRequest`s on the same connection, which seems to go against the design of the KIP. I am also worried about "confusing" the broker-side plugin: would a second push on the same clientInstanceId overwrite the previous one? -- Alternatively, we would need to give the Streams config to the admin client before it connects to the broker, which also seems tricky to get right. Also, we would get a config mix on the same clientInstanceId, which does not sound ideal either.

With regard to KIP-1313, I was actually thinking whether we should introduce a "logical client" concept, and allow a "logical client" like Streams (ie, one that doesn't open its own network connection) to still get its own clientInstanceId? If we follow KIP-1313 and clients generate their own UUIDs, it would be easy for Streams to also generate its own clientInstanceId. This might allow us to much more easily send two `PushConfigRequest`s with two different clientInstanceIds. -- Btw: I would believe that other "logical clients" / third-party frameworks like Apache Flink or Apache Spark might also benefit?
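
To make the idea concrete, here is a rough sketch of what I have in mind (everything below is hypothetical -- `ConfigPush` only stands in for a KIP-1324 `PushConfigRequest` payload; none of these names are real Kafka APIs -- I am just illustrating that KIP-1313-style local UUID generation would let a logical client carry its own ID):

```java
import java.util.Map;
import java.util.UUID;

public class LogicalClientSketch {

    // Hypothetical stand-in for a KIP-1324 PushConfigRequest payload.
    record ConfigPush(UUID clientInstanceId, Map<String, String> configs) { }

    public static void main(String[] args) {
        // KIP-1313 style: each client generates its UUID locally, so a
        // "logical client" like Streams can do the same without opening
        // its own network connection.
        UUID adminInstanceId = UUID.randomUUID();    // embedded admin client
        UUID streamsInstanceId = UUID.randomUUID();  // logical Streams client

        ConfigPush adminPush = new ConfigPush(adminInstanceId,
                Map.of("client.id", "my-app-admin"));
        ConfigPush streamsPush = new ConfigPush(streamsInstanceId,
                Map.of("num.stream.threads", "4"));

        // Both pushes could travel over the same connection, but they carry
        // distinct IDs, so a broker-side plugin can keep them apart.
        System.out.println(adminPush.clientInstanceId()
                .equals(streamsPush.clientInstanceId()));  // prints false
    }
}
```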

Furthermore, if we also change KIP-714 and let the client generate its UUID for metrics push, it would be good to use the same UUID for both pushing the Streams metrics and the Streams config. KIP-1331 from Lucas could also benefit from a unique logical clientInstanceId for Streams. Of course (this is more a KIP-1313 discussion), if we go this route, we might need a few more changes. But if we get a better overall solution, and can unify different concepts, it seems to be worth it.


-Matthias




On 5/12/26 2:18 PM, Apoorv Mittal wrote:
Hi Kirk,
Thanks for the KIP. I have some questions:

AM1. The KIP's request-response flow diagram mentions, "Validate
sensitive configs were excluded". Can you please help me understand what
happens if sensitive configs are somehow detected by the
ClientConfigPolicy? Is the client notified, or what's the expected
behaviour next?

AM2. The KIP defines default configuration keys for which data will be
transmitted. Also there is a client configuration that can override which
keys to send. If a new client configuration is added in the future, we need
to decide whether to include it in the default set, correct? I am
confirming to determine whether a deny list instead of an allow list would
be helpful, however, the allow list seems less error-prone when
transmitting client-related configurations.

AM3. Unlike KIP-714, where the broker determines which metrics the client
reports and the subscription can change dynamically, this KIP has a
default set present in the client. Should we therefore mention why some
configs like connections.max.idle.ms, max.block.ms, max.request.size,
metadata.max.age.ms, etc. are not in the default set for the Java
producer? Similarly, this applies to the Consumer and Share Consumer
segments. Expecting clients to define all configuration keys for
transmission in properties is cumbersome; therefore, I suggest using a
broader default list.

AM4. The KIP defines a new exception, ClientConfigTooLargeException, for
when the configuration size is exceeded. What will the client do when this
exception is encountered? Also, the KIP mentions "But as a backup means of
preventing the client from sending too much data, the broker checks the new
configuration client.config.max.bytes prior to invoking the policy". Since
the check happens on the broker, the client has already sent the large
payload, so how do we prevent "the client from sending too much data"?

AM5. The KIP defines IsDefault as a parameter for Config. It's unclear
why this is required since the broker will get the respective config value
anyway. Can you please help me understand?

AM6. The KIP defines instance-count as a client metric. Can you please
detail what value this metric will provide, as I expect the client will
invoke PushConfigRequest only once in its instantiated lifetime, i.e.,
what is the use of the metric?

AM7. Do you think it's reasonable for the broker to also know which client
type is pushing the respective configuration (e.g., producer, consumer,
streams, etc.)? I understand the operator can determine this by looking at
different metrics or RPC calls invoked by client id but if
ClientConfigPolicy wishes to enforce or validate some configurations, the
source might be relevant. Just writing this down for further discussion.

Regards,
Apoorv Mittal


On Thu, May 7, 2026 at 5:06 PM Mickael Maison <[email protected]>
wrote:

Hi,

MM1: Thanks

MM2: The javadoc for ClientConfigPolicy still says "An interface for
intercepting and enforcing client configuration".

MM3: It does not quite answer my question. Are Streams applications
configurations sent to brokers? If it is then we should do it for
Connect too.

MM4: That seems fine.

Thanks,
Mickael

On Tue, May 5, 2026 at 9:33 PM Andrew Schofield <[email protected]>
wrote:

Hi Kirk,
Thanks for your response.

AS11: I tend to agree that admin clients have less interesting
configuration. But the KIP is a specification of the behaviour, so I think
it should say so, especially since it does make a few references to admin
clients, such as having the config to disable config push. If you do choose
to have some default configs for admin clients, client.id is the obvious
one because it's pushed by all of the other kinds of clients.

AS12: Building on MM3, maybe we should have a compoundClientId for Kafka
Streams, Kafka Connect, and so on. I avoided suggesting clientGroupId and
groupInstanceId because of the opportunity for confusion with existing
concepts :)

Thanks,
Andrew

On 2026/05/05 15:09:46 Kirk True wrote:
Hi Mickael,

On Mon, May 4, 2026, at 12:10 PM, Mickael Maison wrote:
Hi,

Thanks for the KIP, I have a few questions:

MM1: Can we have the definition for the new Exception classes? I'm
particularly interested in the exception for INVALID_CONFIG; how are
the failures returned to the client? From PushConfigResponse it seems
it's just a string. Should it be a nullable collection with a specific
message for each issue?

I've added the definition for the ConfigTooLargeException which is now
the only exception added.

MM2: The new broker class is presented as a policy, but if I
understand correctly, in case the process method throws and
PushConfigResponse has a non-NONE error code, the client still
continues. This is different from the other policy classes,
AlterConfigPolicy and CreateTopicPolicy, which prevent the action in
case of a violation. Are we aiming for validation, observability, or
both?

We are focused on observability in this KIP, but as you point out, the
policy moniker doesn't make sense in this context.

MM3: It seems you're treating Streams as a separate client instead of
an aggregation of Producers and Consumers (I see StreamsConfig in
Configuration Payload Size Enforcement). What about Connect?

Clients such as Kafka Streams, Connect, etc. are tricky because it's
important to provide context for the embedded clients. For example, a
consumer used in a Connect sink would benefit from having that context so
that downstream filtering, aggregation, etc. can take it into account.

LMK if that answers your question or not.

MM4: Regarding sensitive configurations, the current criteria would
treat custom configurations as non-sensitive. For example, if my
producer has a custom serializer it may have its own custom
configurations too. Is that the behavior you wanted? If so, let's make
it clear. As a precaution, I'd lean towards considering custom configs
as sensitive as the client has no idea what they are.

The intention of the KIP is for any custom configuration (custom
meaning not known by the client implementation) to be omitted by default.
If the user overrides config.push.allowed.keys with a custom configuration,
however, the KIP would allow that to be included in the configuration that
is pushed to the broker. Does that seem sensible, or no?
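
For example, the override might look something like this in the client properties (property name as used above; the custom key below is purely a made-up example):

```properties
# Explicitly allow a custom serializer config to be included in the push.
# "my.custom.serializer.setting" is a hypothetical, user-defined key.
config.push.allowed.keys=my.custom.serializer.setting
```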

Thanks for the feedback!

Kirk


Thanks,
Mickael

On Sun, May 3, 2026 at 5:24 PM Muralidhar Basani via dev
<[email protected]> wrote:

Hi Kirk,

Thanks for this thoughtful kip.

I have a few additional points.

mb-1 : Would there be any recommendation for a reference
implementation to store the client configs? Maybe in a compacted
topic, etc.? Otherwise, everyone will come up with their own impl,
and it could be re-invented again and again.
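
To make mb-1 concrete, here is roughly the kind of plugin I have in mind (the interface shape below is guessed from this discussion, not the actual KIP API, and the id/values are made up):

```java
import java.util.Map;
import java.util.stream.Collectors;

public class CompactedTopicPolicySketch {

    // Guessed shape of the broker-side plugin interface; NOT the KIP's API.
    interface ClientConfigPolicy {
        void process(String clientInstanceId, Map<String, String> configs);
    }

    // Serialize the pushed configs into a deterministic record value.
    static String toValue(Map<String, String> configs) {
        return configs.entrySet().stream()
                .sorted(Map.Entry.comparingByKey())
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // In a real plugin, this record would be produced to a compacted
        // topic keyed by clientInstanceId, so the latest push per client
        // wins and reconnect duplicates (mb-2) compact away naturally.
        ClientConfigPolicy policy = (id, configs) ->
                System.out.println(id + " -> " + toValue(configs));

        policy.process("some-client-instance-id",
                Map.of("acks", "all", "linger.ms", "5"));
    }
}
```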

mb-2 : When a client reconnects, there might be duplicates of client
configs pushed. Should this be handled by the plugin, maybe based on
uuid/clientid? (I am not sure: if a client restarts and there is a new
uuid, would there be stale entries indefinitely?)

mb-3 : In your reply to Hector, you noted "it's difficult to
ensure that client is marked as 'in violation' across all brokers in a
cluster." Does the same challenge apply to the observability goal? If
the PushConfig lands on a single randomly chosen broker, only that
broker's ClientConfigPolicy sees it, so the plugin effectively must
fan out to shared storage for the feature to be useful cluster-wide?

mb-4 : If AdminClient configs are not handled, how about dynamic
config changes: would something like AdminClient.alterClientConfigs
also work like a hot reload? Or would any explicit re-push mechanism
be possible?

mb-5 : Regarding configs.push.allowed.keys, would it help if there
were a deny list which can override the allow list, to avoid any
sensitive keys just in case?

mb-6 : In general, if the RPC fails for one random node, will it
retry on a different node?


Thanks,
Murali

On Sun, May 3, 2026 at 1:37 PM Kirk True <[email protected]>
wrote:

Hi Andrew,

On Tue, Apr 28, 2026, at 6:28 PM, Andrew Schofield wrote:
Hi Kirk,
Thanks for the KIP. Inevitably, I have some comments.

AS1: I see a few mysterious mentions of "profile". I suppose
that these
are evidence of a concept which did not eventually see the light
of day in
the KIP.

That's a vestige from a previous design. I will remove them ASAP.

AS2: It seems that the default configurations pushed for a
share
consumer are incorrect (because some of them are group configs
not client
configs). I suggest:

* client.id
* group.id
* share.acknowledgement.mode
* share.acquire.mode
* max.poll.records
* max.poll.interval.ms
* fetch.min.bytes
* fetch.max.wait.ms

Thanks for the feedback!

AS3: Is there any reason why you need to send the configuration
type information to the broker? I wonder if it would be simpler just
to leave everything as strings. I see that you've got an enum for
ClientConfigType which is awfully similar to Config.Type, and you're
going to end up mapping between these enums.

Most of the configurations we're collecting by default end up being
numeric, so defaulting to strings is less efficient. Also, the thought
process was that implementors of the ClientConfigPolicy plugin may
benefit from/need the type information to achieve their goal.

Agreed that the mapping between ClientConfigType and Config.Type seems
unnecessary. ClientConfigType is a strict subset of Config.Type
(specifically, there is no PASSWORD type), so I wanted to provide some
compile-time support to prevent its usage. Maybe I was being too
zealous in this regard?
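
To illustrate the compile-time support I mean (this is only a sketch; the actual ClientConfigType constants may differ from what's below):

```java
// An enum that simply omits a PASSWORD constant, so sensitive values
// cannot even be represented in a pushed config. Constants mirror the
// non-sensitive members of Config.Type, but this is just a sketch.
enum ClientConfigType { BOOLEAN, STRING, INT, SHORT, LONG, DOUBLE, LIST, CLASS }

public class TypeSafetySketch {
    static String describe(ClientConfigType type) {
        // The switch is exhaustive without a PASSWORD case, so there is
        // nothing to forget to handle.
        return switch (type) {
            case BOOLEAN, INT, SHORT, LONG, DOUBLE -> "numeric/boolean";
            case STRING, LIST, CLASS -> "textual";
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(ClientConfigType.LONG)); // prints numeric/boolean
        // ClientConfigType.PASSWORD does not compile -- that's the point.
    }
}
```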

AS4: ClientConfigUnknownProfileException has a couple of
problems. We
tend to use Unknown at the start of the error codes and
exception class
names, not the middle. Profile is not a thing. I suggest it
should be
UnknownClientConfigSomethingException, but I am not qualified to
say what
the Something is.

I will propose a change in the next revision of the KIP. Thanks
for
catching that.

AS5: You should include a list of the exceptions and error
codes you are
introducing to the protocol. I've seen a few new exceptions and
they
generally have error codes which correspond 1:1.

* ClientConfigUnknownProfileException - Should this just be
InvalidConfigurationException (exists) which maps to
INVALID_CONFIG error
code?
* ClientConfigTooLargeException - I would expect this to map
to the
CLIENT_CONFIG_TOO_LARGE error code (new)
* ClientConfigPolicyException - See AS4. Policy exceptions are
usually
PolicyViolationException, but this policy doesn't validate, it
just
processes.

I wasn't certain where on the specificity vs. generality
spectrum to land
on creating or reusing existing errors. In my next KIP revision
I'll look
to reuse existing error codes where possible.

AS6: You've tended to use Config (singular) not Configs
(plural).
However, in the configurations for the configurations, you've
used plural,
such as client.configs.policy.class.name. I would err on the
side of
consistency.

Agreed. I'll make this more consistent.

AS7: We are going to have to resolve the relationship between
this KIP
and KIP-1313. The latter introduces client instance ID on all
RPCs, and
will simplify your KIP if it is accepted first.

Yes, this KIP and KIP-1313 are racing in this regard. I see KIP-1313
has entered the voting phase, so I will likely end up removing most of
the client instance ID references from my KIP, leaving only those that
are more pertinent to the KIP's specific needs.

AS8: Please confirm whether there are any timing
considerations between
PushConfigs and GetTelemetrySubscriptions/PushTelemetry RPCs. I
suppose
that a client could overlap these RPCs, or even send them to
different
randomly selected brokers, as part of its initial connection
setup.

The intention was for the PushConfig RPC to execute before the
telemetry
RPCs. However, this will likely change based on other feedback,
moving
toward more of an "overlap" sequencing. From the perspective of
this KIP,
it shouldn't matter if they're sent to different brokers. I
wonder if the
ClientConfigPolicy plugin implementors would see that
differently?

AS9: Please add some broker metrics for this new feature. I
suggest
looking at KIP-714 for inspiration.

Will do. I intentionally omitted metrics on the broker, leaving
them for
the specific implementations. But I will look at the KIP-714
metrics, as
suggested.

AS10: Why does the client block while the config handshake is
being performed? The handshake is not validating the configurations
and the client doesn't throw an exception even if the PushConfigs
response contains an error code. Doesn't this unnecessarily slow down
the initial connection, which is arguably already too long with Kafka
clients?

The approach will likely move from blocking to a background,
best-effort approach. I will update the KIP to reflect this.

AS11: Which are the default configurations sent for the Apache
Kafka
Java admin client?

In initial discussions, the admin client was deemed too uninteresting
to warrant sending configuration since it doesn't affect correctness
or performance of the critical produce/consume path. Do you disagree?

Thanks for the thorough feedback!

Kirk

Thanks,
Andrew

On 2026/04/23 17:59:11 Kirk True wrote:
Hi all,

I would like to start a discussion on KIP-1324: Support
client
configuration observability:



https://cwiki.apache.org/confluence/display/KAFKA/KIP-1324%3A+Support+client+configuration+observability

Thanks,
Kirk







