Thanks for this KIP, Andrew.
I have been following the discussion for a while, and it seems there
are a lot of details to consider. So I feel a little bad here, as I
might make matters worse.
The KIP does touch on Kafka Streams briefly, but I am wondering if we
should make a larger change. At this point, Kafka Streams treats the
producer/consumer/admin clients as black boxes, and I would describe
Kafka Streams as a "logical client", as it does not maintain its own
network connections but reuses the network connections of the
underlying clients.
To make KIP-714 work for Kafka Streams, we extended the client APIs to
allow registering additional custom metrics (KIP-1076) to be sent to
the broker. However, these metrics are reported using the same
`clientInstanceId` as the physical client. As a matter of fact, Kafka
Streams uses a mix of the admin client and consumer clients (one per
thread), and thus reports its metrics using multiple different
`clientInstanceId`s for the same Kafka Streams instance.
With KIP-1324 and KIP-1331, we also want to collect client-side configs,
including Kafka Streams configs, and topology information, and we face a
similar challenge.
Thus, I was wondering whether we should assign each Kafka Streams client
its own clientInstanceId, even though it is not a physical client. If
clientInstanceIds are generated by the clients themselves anyway, a Kafka
Streams instance can just generate its own ID (as a matter of fact, we
already have a `processId`, which is also a UUID).
Having such a UUID at hand, we could use it to send all these different
things (metrics, config, topology) under the same UUID, which would make
it much simpler to correlate them on the broker side. Of course, such a
logical client UUID could not go into the request header, as the
underlying physical client has its own UUID. This might still work, as
the KIP already says SHOULD, not MUST.
To make this work for KIP-714/KIP-1076, we would need to change the
client APIs a little and allow Kafka Streams to pass its own UUID
into the consumer/admin client, to be used when pushing the Streams
metrics. We would also need to expose the Streams clientInstanceId via
the `ClientInstanceIds` interface.
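
To illustrate the correlation benefit described above, here is a minimal
sketch. Everything in it is hypothetical and invented for illustration:
`LogicalClientCorrelation`, `Report`, `Kind`, and `groupByLogicalId` are
not Kafka APIs, and the KIP does not define any such payload. It only
shows that items sent through different physical clients fall under one
key once they carry a shared Streams-level UUID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration only: none of these types exist in Kafka.
final class LogicalClientCorrelation {

    // The kinds of information mentioned in the thread.
    enum Kind { METRICS, CONFIG, TOPOLOGY }

    // One reported item, tagged with the physical client's ID (the one that
    // would travel in the request header) and the Streams-level logical ID.
    static final class Report {
        final UUID physicalClientInstanceId;
        final UUID logicalClientInstanceId;
        final Kind kind;

        Report(UUID physicalClientInstanceId, UUID logicalClientInstanceId, Kind kind) {
            this.physicalClientInstanceId = physicalClientInstanceId;
            this.logicalClientInstanceId = logicalClientInstanceId;
            this.kind = kind;
        }
    }

    // Receiver-side view: group reports by the logical ID, so that reports
    // sent via different physical clients end up under a single key.
    static Map<UUID, List<Report>> groupByLogicalId(List<Report> reports) {
        Map<UUID, List<Report>> grouped = new HashMap<>();
        for (Report r : reports) {
            grouped.computeIfAbsent(r.logicalClientInstanceId, k -> new ArrayList<>()).add(r);
        }
        return grouped;
    }
}
```

With only the physical IDs available, the same three reports would be
spread across as many keys as there are underlying clients.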
For KIP-1324 and KIP-1331, we would need similar APIs.
I don't think we can (atm) unify this with the KIP-848/KIP-1071 member
ID, though, as a Kafka Streams client has multiple threads and each
thread has its own consumer, i.e., there are multiple members per
instance. However, we do have plans to refactor the Kafka Streams
threading model and would like to reduce the number of
consumers/group members per instance to one. When we make this change,
we could also unify with the member ID.
Thoughts?
-Matthias
On 5/12/26 10:48 AM, Andrew Schofield wrote:
Hi Jun,
Thanks for the reply and digging into the details.
JR23: Correct. The client telemetry component will use UUID-B as the client
instance ID.
JR23.1: Yes, I agree. It's not ideal. When I was drawing up the tables, I was
thinking that this might be a possibility, but I'm less convinced now. I think
that I should mandate that if a client specifies header.ClientInstanceId on
GetTelemetrySubscriptions request, then request.ClientInstanceId must either be
zero or equal to header.ClientInstanceId.
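
The rule proposed in JR23.1 can be sketched as a small check. This is
only an illustration under stated assumptions: `TelemetryIdCheck` and
`isValid` are hypothetical names, `java.util.UUID` stands in for Kafka's
own UUID type, and an empty `Optional` models a request header without
the tagged client instance ID field.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical illustration only: not part of the Kafka code base.
final class TelemetryIdCheck {

    // All-zero UUID, standing in for "no client instance ID assigned yet".
    static final UUID ZERO = new UUID(0L, 0L);

    // Returns true when a GetTelemetrySubscriptions request satisfies the
    // proposed rule: if the header carries an ID, the request-body ID must
    // be zero or equal to the header ID.
    static boolean isValid(Optional<UUID> headerId, UUID requestId) {
        if (!headerId.isPresent()) {
            return true; // the rule only constrains requests that carry a header ID
        }
        return requestId.equals(ZERO) || requestId.equals(headerId.get());
    }
}
```

Under this check, the combination header.ClientInstanceId=UUID-H with
request.ClientInstanceId=UUID-B (where the two differ) would be
rejected, which removes the two-identifier case raised in JR23.1.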
JR23.2: This is perhaps the interesting one. From its original intent, it
should be UUID-B (the telemetry UUID), but then that contradicts the change in
signature to remove the timeout. Unless I make the change above, in which case
it will be UUID-H.
Thanks,
Andrew
On 2026/05/12 17:23:58 Jun Rao via dev wrote:
Hi, Andrew,
Thanks for the reply.
JR23. In the new client -> old broker case, we have
header.ClientInstanceId=UUID-H
request.ClientInstanceId=UUID-B
response.ClientInstanceId=0
On the server side, I guess the telemetry component will use UUID-B as the
clientInstanceId? This has a couple of implications.
JR23.1 On the server side, we have two different clientInstanceIds used in
different places, UUID-H for request logging and UUID-B in telemetry. This
seems confusing since we can't uniquely identify a client on the server
side.
JR23.2 On the client side, what UUID does clientInstanceId(Duration
timeout) return? If it returns UUID-H, it will be confusing since it
doesn't match the ID used for telemetry on the server.
Jun
On Tue, May 12, 2026 at 12:58 AM Andrew Schofield <[email protected]>
wrote:
Hi Jun,
Thanks for your response.
JR20: I have improved (I hope) the wording. The client sends
request.clientInstanceId = 0 and header.clientInstanceId = UUID-H, and the
broker responds response.clientInstanceId=UUID-H. In this way, the broker
will have taken the UUID-H from the header, and told the client to use it
for client telemetry also.
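
A rough sketch of the assignment described in JR20, as it might look on
the broker side. The names `TelemetryIdAssignment` and
`responseClientInstanceId` are hypothetical, and `java.util.UUID` stands
in for Kafka's own UUID type. The branch for a non-zero request ID
(respond with zero) and the fallback of generating a fresh UUID when no
header ID is present are assumptions based on how KIP-714 describes the
existing GetTelemetrySubscriptions behavior, not something stated in
this message.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical illustration only: not part of the Kafka code base.
final class TelemetryIdAssignment {

    // All-zero UUID, meaning "please assign me a client instance ID" in a
    // request and "nothing newly assigned" in a response.
    static final UUID ZERO = new UUID(0L, 0L);

    // Computes response.clientInstanceId for GetTelemetrySubscriptions.
    static UUID responseClientInstanceId(Optional<UUID> headerId, UUID requestId) {
        if (!requestId.equals(ZERO)) {
            // Assumption: the client already has an ID, so nothing is assigned.
            return ZERO;
        }
        // The client asked for an ID: reuse the header ID when present so the
        // request header and telemetry align; otherwise generate a new one.
        return headerId.orElseGet(UUID::randomUUID);
    }
}
```

With request.clientInstanceId = 0 and header.clientInstanceId = UUID-H,
this returns UUID-H, matching the case described in JR20.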
JR21: Done. Look for "henceforth".
JR22: Summary table added.
Thanks,
Andrew
On 2026/05/11 19:18:24 Jun Rao via dev wrote:
Hi, Andrew,
Thanks for the reply.
JR20. "If the client requests a new client instance ID on its initial
GetTelemetrySubscriptions request and it sends a client instance ID in the
request header, the broker will send back that client instance ID rather
than generating a new UUID. This will automatically align the UUID in the
request headers and client telemetry."
This seems inconsistent with what's in the table. In the table, for
example, if the client has the following:
GetTelemetrySubscriptions v0
header.ClientInstanceId = UUID-H
request.ClientInstanceId = UUID-H
or
GetTelemetrySubscriptions v0
header.ClientInstanceId = UUID-H
request.ClientInstanceId = UUID-R
the broker returns
response.ClientInstanceId = 0.
JR21. It will be useful to document what the new client does with the
returned response.ClientInstanceId. Note that the return value may or may
not be 0.
JR22. It's probably clearer if we could populate the table with 4
combinations: old/new clients with old/new brokers.
Jun
On Fri, May 8, 2026 at 2:49 AM Andrew Schofield <[email protected]>
wrote:
Hi Jun and Chia-Ping,
I've overhauled part of the KIP to do with alignment of the request header
client instance ID, client telemetry client instance ID and group protocol
member IDs. The alignment is by convention, not mandate (SHOULD not MUST).
It would be possible to go around the existing RPCs such as
ConsumerGroupHeartbeat and GetTelemetrySubscriptions, and remove the fields
containing the existing identifiers which are intended to be aligned. Doing
so would be a bad idea though, because we would then have RPC versions
which essentially depend upon the presence of a tagged field in the request
header. This is a protocol-compatibility nightmare.
I have removed the new versions of GetTelemetrySubscriptions and
PushTelemetry. I have also explained the behavior of
GetTelemetrySubscriptions in the presence and absence of a client instance
ID in the request header.
Let me know what you think.
Thanks,
Andrew
On 2026/05/07 15:09:31 Andrew Schofield wrote:
Hi Jun and Chia-Ping,
I've been thinking and discussing the changes to the KIP-714 RPCs. There
are too many combinations for my liking at the moment. I want to take
another pass at this area and will make an update in a few days.
I intend to start a new vote once we have consensus because the spec has
changed somewhat since the earliest votes.
Thanks,
Andrew
On 2026/05/06 17:28:27 Chia-Ping Tsai wrote:
hi Andrew
chia_0: If the consensus is to remove the "duplicate" field from the
RPC payloads, the tagged field in the header will essentially become a
required field. This means the broker needs to handle the edge case where
both the header and the request body have no ClientInstanceId, right? If
so, would you mind clarifying the expected broker behavior in the KIP?
Best,
Chia-Ping
On 2026/04/03 16:17:37 Andrew Schofield wrote:
Hi,
I would like to start the discussion on KIP-1313. This adds a unique
client instance ID to the request header of all Kafka protocol requests to
give a unique identifier which can be used to correlate the requests from
each client for the purposes of problem determination.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1313%3A+Client+instance+ID+in+all+request+headers
Thanks,
Andrew