Hi all,

I'd like to raise a discussion around
[KAFKA-19070](https://issues.apache.org/jira/browse/KAFKA-19070), which
proposes a small but impactful change to how Kafka Connect handles the
`client.id` configuration when set explicitly by the user.

*Problem*

Currently, if a user sets a custom `client.id` via the connector
configuration (e.g., `client.id=custom-id`), this value is inherited
**as-is** by **all tasks** of the connector. While this does not break
functionality, it leads to a few critical issues:

- *Metric registration conflicts*: Since Kafka metrics (like
`consumer-metrics`, `fetch-manager-metrics`, etc.) use `client.id` as part
of their identity, using the same ID across multiple tasks causes metrics
to collide or get overwritten.
- *Observability and debugging challenges*: Logs and metrics are merged
across tasks, making it harder to trace task-specific behavior.

*Proposal*

The proposed change (PR [#19341](https://github.com/apache/kafka/pull/19341))
appends the *task number* to the user-provided `client.id` to ensure
uniqueness per task.

For example:
- User configures: `client.id=my-sink`
- Task 2 uses: `client.id=my-sink-2`
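
The naming rule in the example above can be sketched as a small helper. This is only an illustration: `TaskClientIds` and `forTask` are hypothetical names, not part of the Connect API or of the PR, and the `-` separator is taken from the `my-sink-2` example.

```java
import java.util.Objects;

// Hypothetical helper for illustration; not part of the Kafka Connect API.
final class TaskClientIds {
    private TaskClientIds() {}

    // Derives a per-task client.id by appending "-<taskNumber>" to the
    // user-provided value, e.g. ("my-sink", 2) -> "my-sink-2".
    static String forTask(String userClientId, int taskNumber) {
        Objects.requireNonNull(userClientId, "userClientId");
        if (taskNumber < 0) {
            throw new IllegalArgumentException("taskNumber must be non-negative");
        }
        return userClientId + "-" + taskNumber;
    }
}
```

With this rule, two tasks of the same connector never share an ID, since their task numbers differ.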

*This approach:*
- Respects the user's intent
- Guarantees uniqueness
- Brings Connect behavior in line with Kafka Streams and other components
that generate per-client IDs

*Another approach:*

An alternative way to ensure a unique `client.id` for each task is as
follows.

The configuration we provide through the POST/PUT API is the connector-level
config. However, the task-level config is generated by the Connect
framework via the `taskConfigs` method, declared at line 124 of
connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java
(commit 2a7457f):
<https://github.com/apache/kafka/blob/2a7457f2dd95f0732562ae0708b5162e8c4a3a6d/connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java#L124>

`public abstract List<Map<String, String>> taskConfigs(int maxTasks);`

This method returns a list of configs, one per task, which are then used to
instantiate the tasks.

Since we want each task to have a unique `client.id`, we can modify the
value at this point (inside `taskConfigs(...)`) by appending the task number.
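
A rough sketch of that idea, written against plain `java.util` types so it stands alone. `TaskConfigClientIds` and `withUniqueClientIds` are hypothetical names, and the assumption that the value sits under a literal `client.id` key in each task config is mine; the actual key and the place where the framework would apply this may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing step applied to the result of taskConfigs(...);
// not an existing Connect API.
final class TaskConfigClientIds {
    // Assumed key; the real override key may carry a prefix.
    static final String CLIENT_ID = "client.id";

    private TaskConfigClientIds() {}

    // Returns copies of the task configs in which any explicit client.id
    // has "-<task index>" appended. Configs without a client.id are copied as-is.
    static List<Map<String, String>> withUniqueClientIds(List<Map<String, String>> taskConfigs) {
        List<Map<String, String>> result = new ArrayList<>(taskConfigs.size());
        for (int task = 0; task < taskConfigs.size(); task++) {
            Map<String, String> copy = new HashMap<>(taskConfigs.get(task));
            String clientId = copy.get(CLIENT_ID);
            if (clientId != null) {
                copy.put(CLIENT_ID, clientId + "-" + task);
            }
            result.add(copy);
        }
        return result;
    }
}
```

Copying each map rather than mutating it keeps the connector's own config objects untouched, and leaving configs without an explicit `client.id` alone preserves today's default behavior for users who never set one.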

*So, as part of this approach, we can provide a default implementation of
the `taskConfigs` method that appends the task number to avoid `client.id`
collisions.*

*Potential Concern*

One concern raised is that some users may rely on `client.id` for
**authorization** or **quotas**. In such cases, appending the task number
might introduce unexpected behavior, even though `client.id` is not used
for partitioning, delivery, or offset tracking.

*Questions*

- Would this change qualify as a *bug fix*, or does it require a *KIP* due
to its potential impact?
- Are there known real-world scenarios where modifying `client.id` breaks
compatibility?
- Would it be better to make this opt-in via a config flag?

Happy to revise the implementation depending on community feedback. Thanks
in advance for your thoughts!

Best regards,
Pritam Kumar Mishra
